Forum: Ruby Yahoo API and Ruby

rb (Guest)
on 2007-01-01 22:52
(Received via mailing list)
I'm working on a couple of large sites that aren't sending the correct
response codes for missing pages.  I want to use the Yahoo API to
search Yahoo's cache for clues about which pages are sending bad
responses.  So I need to get all 1000 available results from the Yahoo
cache and write them to a spreadsheet.  Then I can sort the URL data
and response codes in the spreadsheet.

The first problem is paging: I can only request 100 results at a time,
with a different "start" number for each request.  The first request
would be start=1, the second start=101, and so on.
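
A minimal sketch of the paging arithmetic described above, assuming 1000
results fetched 100 at a time.  The helper name page_starts is
hypothetical and is not part of the Yahoo API or of the script below; it
only lists the "start" value to send with each request.

```ruby
# Hypothetical helper: list the "start" value for each request,
# assuming results are numbered from 1.
def page_starts(total = 1000, per_page = 100)
  (1..total).step(per_page).to_a
end

# One line per request that would be sent.
page_starts.each { |start| puts "start=#{start}" }
```

Each value could then be put into the 'start' entry of the POST
arguments for one request.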

The other problem is that I can't get the response codes.  I just get
this unhelpful error message:
c:/ruby/lib/ruby/1.8/net/http.rb:1467:in `initialize': HTTP request
path is empty (ArgumentError)
        from c:/ruby/lib/ruby/1.8/net/http.rb:1585:in `initialize'
        from hpricot_test.rb:32:in `new'
        from hpricot_test.rb:32:in `get_headers'
        from hpricot_test.rb:80:in `generate_workbook'
        from hpricot_test.rb:70:in `each'
        from hpricot_test.rb:70:in `generate_workbook'
        from hpricot_test.rb:94

Here is the code:

#!/usr/bin/ruby -w

require 'net/http'
require 'uri'
require 'hpricot'
require 'spreadsheet/excel'
include Spreadsheet

  def get_cache
    # set variables for POST request
    appid = 'yahooAPI-key' # a Yahoo API key goes here
    query = 'http://www.example.com' # a Web site to check goes here

    # this gets the first 100 results, but I want to loop through
    #  it 10 times with a different "start" number to get all 1000
    #  available results
    results = 100
    start = 1

    post_args = {
      'appid' => appid,
      'query' => query,
      'results' => results,
      'start' => start
    }
    url = URI.parse('http://search.yahooapis.com/SiteExplorerService/V1...')

    # send post request
    @resp, @data = Net::HTTP.post_form(url, post_args)

    # read XML
    @doc = Hpricot(@data)
  end

  def get_headers(url)
    # This gets the response code for the page to see if it exists
    #   (200, 301, 404, etc.)
    page = URI.parse(url)
    req = Net::HTTP::Get.new(page.path)
    res = Net::HTTP.start(page.host, page.port) { |http|
      http.request(req)
    }
    return res.code
  end

  def generate_workbook
    # create new workbook and worksheet
    workbook = Spreadsheet::Excel.new("yahoo_cache.xls")
    worksheet = workbook.add_worksheet('Yahoo Cache')

    # set variables
    current_row = 2
    format_nil = nil
    format_header = Format.new(
      :color => 'white',
      :bg_color => 'gray',
      :bold  => true
    )
    workbook.add_format(format_header)
    workbook.add_format(format_nil)

    # Add header row
    worksheet.write(0,0,"Yahoo's Cache for Site", format_nil)
    worksheet.write(1,0,"TITLE",format_header)
    worksheet.write(1,1,"URL", format_header)
    # worksheet.write(1,2,"CODE", format_header)
    # worksheet.write(1,3,"LOCATION", format_header) # coming soon

    # Add xml_data to worksheet
    (@doc/"result").each do |el|
      result_title = (el/"title").text
      result_url   = (el/"url").text
      worksheet.write(current_row, 0, result_title, format_nil)
      worksheet.write(current_row, 1, result_url, format_nil)

      # get response codes -- this is causing an error with
      #   "result_url" -- maybe it isn't a URL in a string?
      # see error message at top of this post
      # response_code ||= 0
      # response_code = get_headers(result_url)  # this works if I put
      #   a URL here, but not with the result_url variable
      # worksheet.write(current_row, 2, response_code, format_nil)

      # move to the next row in the spreadsheet before going to the
      #   next XML item
      current_row += 1
    end

    # finished, close the workbook
    workbook.close
  end
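
One possible cause of the error, offered as a guess rather than a
confirmed diagnosis: the ArgumentError is raised inside
Net::HTTP::Get.new, and URI.parse returns an empty path for a URL with
no trailing slash (such as 'http://www.example.com').  The sketch below
falls back to "/" in that case.  The helper name request_path is
hypothetical, and the strip call only guards against stray whitespace
in the text pulled from the XML.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build the path for the GET request, falling
# back to "/" when the URL has no path, and keeping any query string.
def request_path(url)
  page = URI.parse(url.strip)
  path = page.path.to_s.empty? ? '/' : page.path
  page.query ? "#{path}?#{page.query}" : path
end

# Variant of get_headers that uses the helper above.
# It returns the response code as a string ("200", "404", ...).
def get_headers(url)
  page = URI.parse(url.strip)
  req = Net::HTTP::Get.new(request_path(url))
  res = Net::HTTP.start(page.host, page.port) { |http| http.request(req) }
  res.code
end

puts request_path('http://www.example.com')        # prints "/"
puts request_path('http://www.example.com/a?b=1')  # prints "/a?b=1"
```

With this change, the commented-out get_headers(result_url) call in
generate_workbook could be re-enabled to see whether the error goes
away.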


====
The above code works (except the part that gets response codes).  The
following code is from a previous version where I tried to loop through
all 1000 results.  (It was using xmlsimple.)  I couldn't figure out how
to store each set of XML -- each response is an entire XML file.  I
tried @pass[count], but it wasn't working.  Any ideas about a good way
to store each response?

    # prepare to loop through all 1000 results, 100 at a time
    count = 1
    start = 1

    # pass[] = each of the 10 requests to Yahoo
    @pass = []

    # perform the loop
    while count < 11 do
      post_args = {
        'appid' => appid,
        'query' => query,
        'results' => results,
        'start' => start
      }

      # send post request
      @resp, @data = Net::HTTP.post_form(url, post_args)

      # read XML
      xml_data = XmlSimple.xml_in(@data)
      @pass[count] = xml_data
      # puts "Count: #{count}"
      # print @pass[count]

      # puts "Start: #{start}"
      # puts
      count += 1
      start += 100
    end
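
A minimal sketch of one way to store the results of every request,
assuming each response has already been parsed into an array of
results.  The method name collect_results and the block that stands in
for the Net::HTTP.post_form call plus XML parsing are hypothetical.
Note that in the loop above, count starts at 1, so @pass[0] is left as
nil; appending to the array avoids that.

```ruby
# Collect every page of results into one flat array.
# The block receives (start, per_page) and must return an array of
# results for that page; it is a stand-in for the real request.
def collect_results(per_page = 100, pages = 10)
  all = []
  pages.times do |i|
    start = 1 + i * per_page
    batch = yield(start, per_page)
    break if batch.empty?   # stop early when a page comes back empty
    all.concat(batch)
  end
  all
end

# Stubbed usage: fake URLs instead of a real Yahoo request.
urls = collect_results do |start, n|
  (start...(start + n)).map { |k| "http://www.example.com/page#{k}" }
end
puts urls.length
```

In the real script, the block would send the request with the given
start value and return the parsed results for that page, so the
spreadsheet code could iterate over one array.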