Forum: Ruby Yahoo API and Ruby

rb (Guest)
on 2007-01-01 23:52
(Received via mailing list)
I'm working on a couple of large sites that aren't sending the correct
response codes for missing pages.  I want to use the Yahoo API to
search Yahoo's cache for clues about which pages are sending bad
responses.  So I need to get all 1000 results from the Yahoo cache and
write them to a spreadsheet.  Then I can sort the URL data and
response codes in the spreadsheet.

I can only request 100 results at a time and can set a different
"start" number for each request.  The first request would be start=1,
the second, start=101 and so on.
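The paging arithmetic described above can be sketched as follows. This is only an illustration: the constant names and the `start_offsets` helper are made up for the example, and the 100-per-request and 1000-total figures are taken from the description in this post.

```ruby
# Sketch: compute the "start" value for each of the paged requests.
RESULTS_PER_REQUEST = 100   # per-request limit described in the post
TOTAL_RESULTS       = 1000  # total number of results the post wants

# Hypothetical helper: returns 1, 101, 201, ... up to the last page start.
def start_offsets(total = TOTAL_RESULTS, per_request = RESULTS_PER_REQUEST)
  1.step(total, per_request).to_a
end

start_offsets.each do |start|
  puts "request with start=#{start}, results=#{RESULTS_PER_REQUEST}"
end
```

Each value could then be passed as the 'start' argument of one request.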

The other problem is that it won't get the response codes.  I just get
this unhelpful error message:
c:/ruby/lib/ruby/1.8/net/http.rb:1467:in `initialize': HTTP request path is empty (ArgumentError)
        from c:/ruby/lib/ruby/1.8/net/http.rb:1585:in `initialize'
        from hpricot_test.rb:32:in `new'
        from hpricot_test.rb:32:in `get_headers'
        from hpricot_test.rb:80:in `generate_workbook'
        from hpricot_test.rb:70:in `each'
        from hpricot_test.rb:70:in `generate_workbook'
        from hpricot_test.rb:94
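One possible cause, offered as a guess rather than a confirmed diagnosis: the traceback shows a Net::HTTP request object raising ArgumentError in `initialize`, which happens when the path passed to it is an empty string. `URI.parse` returns an empty path for a URL with nothing after the host name, so any bare-domain result would trigger it. The sketch below shows that behavior and one way around it; `request_path` is a hypothetical helper and example.com is a placeholder.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: return a request path that is never empty.
# Stripping whitespace is a precaution in case the parsed text has padding.
def request_path(url)
  page = URI.parse(url.to_s.strip)
  page.path.empty? ? '/' : page.path
end

# A URL with nothing after the host parses to an empty path.
puts URI.parse('http://example.com').path.inspect

puts request_path('http://example.com')               # falls back to "/"
puts request_path('http://example.com/missing.html')  # path kept as-is

# Building a request with an empty path raises ArgumentError.
begin
  Net::HTTP::Head.new('')
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

If this is the cause, building the request from `request_path(url)` instead of `page.path` would avoid the error for bare-domain URLs.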

Here is the code:

#!/usr/bin/ruby -w

require 'net/http'
require 'uri'
require 'hpricot'
require 'spreadsheet/excel'
include Spreadsheet

  def get_cache
    # set variables for POST request
    appid = 'yahooAPI-key' # a Yahoo API key goes here
    query = '' # a Web site to check goes here

    # this gets the first 100 results, but I want to loop through
    #  it 10 times with a different "start" number to get all 1000
    #  available results
    results = 100
    start = 1

    post_args = {
      'appid' => appid,
      'query' => query,
      'results' => results,
      'start' => start
    }
    url = URI.parse('') # the Yahoo API URL goes here

    # send post request
    @resp, @data = Net::HTTP.post_form(url, post_args)

    # read XML
    @doc = Hpricot(@data)
  end

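  def get_headers(url)
    # This gets the response code for the page to see if it exists (200, 301, 404, etc.)
    page = URI.parse(url)
    # NOTE: the request class and the host argument were lost from this
    # listing; Net::HTTP::Head and page.host are assumed here
    req = Net::HTTP::Head.new(page.path)
    res = Net::HTTP.start(page.host, page.port) { |http|
      http.request(req)
    }
    return res.code
  end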

  def generate_workbook
    # create new workbook and worksheet
    workbook = Excel.new("yahoo_cache.xls")
    worksheet = workbook.add_worksheet('Yahoo Cache')

    # set variables
    current_row = 2
    format_nil = nil
    format_header = Format.new(
      :color => 'white',
      :bg_color => 'gray',
      :bold  => true
    )

    # Add header row
    worksheet.write(0,0,"Yahoo's Cache for Site", format_nil)
    worksheet.write(1,1,"URL", format_header)
    # worksheet.write(1,2,"CODE", format_header)
    # worksheet.write(1,3,"LOCATION", format_header) # coming soon

    # Add xml_data to worksheet
    (@doc/"result").each do |el|
      result_title = (el/"title").text
      result_url   = (el/"url").text
      worksheet.write(current_row, 0, result_title, format_nil)
      worksheet.write(current_row, 1, result_url, format_nil)

      # get response codes -- this is causing an error with "result_url" --
      # maybe it isn't a URL in a string?
      # see error message at top of this post
      # response_code ||= 0
      # response_code = get_headers(result_url)  # this works if I put a URL here, but not with the result_url variable
      # worksheet.write(current_row, 2, response_code, format_nil)

      # move to the next row in the spreadsheet before going to the next XML item
      current_row += 1
    end

    # finished, close the workbook
    workbook.close
  end

The above code works (except the part that gets response codes).  The
following code is a previous version where I tried to loop through all
1000 results.  (It was using xmlsimple.)  I couldn't figure out how to
store each set of XML -- each request is an entire XML file.  I tried
@pass[count], but it wasn't working.  Any ideas about a good way to
store each request?

    # prepare to loop through 100 results
    count = 1
    start = 1

    # pass[] = each of the 10 requests to Yahoo
    @pass = []

    # perform the loop
    while count < 11 do
      post_args = {
        'appid' => appid,
        'query' => query,
        'results' => results,
        'start' => start
      }

      # send post request
      @resp, @data = Net::HTTP.post_form(url, post_args)

      # read XML
      xml_data = XmlSimple.xml_in(@data)
      @pass[count] = xml_data
      # puts "Count: #{count}"
      # print @pass[count]

      # puts "Start: #{start}"
      # puts
      count += 1
      start += 100
    end
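One way to store each request, sketched under assumptions: rather than keeping ten separate XML documents in @pass[count] (which, with count starting at 1, also leaves @pass[0] as nil), append the parsed results from each page to a single flat array. The names `collect_results` and `fake_fetch` are made up for this example, and the block stands in for the real request-and-parse step, so nothing here touches the network.

```ruby
# Sketch: collect results from several paged requests into one flat array.
# The block receives (start, per_request) and should return an array of
# results for that page; in real use it would send the request and parse XML.
def collect_results(total = 1000, per_request = 100)
  all_results = []
  1.step(total, per_request) do |start|
    page = yield(start, per_request)
    break if page.nil? || page.empty?  # stop early once results run out
    all_results.concat(page)
  end
  all_results
end

# Stand-in fetcher for illustration only: it fabricates placeholder URLs
# and pretends that only 250 results exist.
fake_fetch = lambda do |start, count|
  last = [start + count - 1, 250].min
  (start..last).map { |n| "http://example.com/page#{n}" }
end

urls = collect_results { |start, count| fake_fetch.call(start, count) }
puts urls.length
```

With everything in one array, the spreadsheet loop can iterate over it once instead of walking ten separate documents.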