Need to parse an HTML table into an Array/hash

I am new to ruby; having a lot of fun, but currently I am stuck with the
following.

I want to read this page

http://finance.yahoo.com/q/op?s=AAPL&k=95.00&m=2007-12

Into an array or hash.

I found the gem yahoo-finance, but it does not do what I need.
I need to find the symbol for (in this example) the
AAPL call option with strike 95 expiring Dec-07.

I am able to read the page:
require 'open-uri'
conn = open("http://finance.yahoo.com/q/op?s=AAPL&m=2008-04")
lines = conn.read

The main question is how to parse the table that sits in the middle of
the HTML.

Any help is appreciated!!

You probably want to look at mechanize and hpricot, or maybe scrubyt.

- donald

I use hpricot for a similar task.

I'm kind of new too, so this might be a little sloppy by more
experienced standards, but I think this gets you partially there.

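require "open-uri"

# Fetch the page source (on Ruby 3.0 and later, call URI.open instead of open)
page = open("http://finance.yahoo.com/q/op?s=AAPL&m=2008-04").read
# Grab every <td>...</td> cell that sits on a single line
table_data = page.scan(/<td.+?td>/)
# Strip the tags so only the cell text is left
table_data_refined = table_data.map { |data| data.gsub(/<.+?>/, "") }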

You probably need to tweak the regexes to isolate the specific table you
want (like plugging in “class=xxx”).
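
To get from a flat list of cells to the array of hashes the original question asks for, the same regex idea can be applied per row. This is only a rough sketch: SAMPLE_HTML is made-up markup standing in for the real page, and table_rows and table_to_hashes are hypothetical helper names, not part of any library.

```ruby
# Made-up markup standing in for the options table (not the real Yahoo! page).
SAMPLE_HTML = <<~HTML
  <table class="options">
    <tr><th>Strike</th><th>Symbol</th><th>Last</th></tr>
    <tr><td><a href="#">95.00</a></td><td>QAALS.X</td><td><b>86.60</b></td></tr>
    <tr><td><a href="#">100.00</a></td><td>QAALT.X</td><td><b>81.70</b></td></tr>
  </table>
HTML

# Turn every <tr> into an array of cell texts, with inner tags stripped.
def table_rows(html)
  html.scan(/<tr.*?>(.*?)<\/tr>/m).map do |(row)|
    row.scan(/<t[dh].*?>(.*?)<\/t[dh]>/m).map { |(cell)| cell.gsub(/<.+?>/, "").strip }
  end
end

# Use the first row as the header and pair it with each remaining row.
def table_to_hashes(html)
  header, *rows = table_rows(html)
  rows.map { |row| header.zip(row).to_h }
end

OPTIONS = table_to_hashes(SAMPLE_HTML)
# Look up the row for one strike and print its option symbol.
wanted = OPTIONS.find { |opt| opt["Strike"] == "95.00" }
puts wanted["Symbol"]
```

Regexes like these break easily on nested tables or unusual markup, which is why a real parser is usually the safer choice once the page gets complicated.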

I hope that helps.

On Nov 30, 2007, at 10:24 PM, Ernst T. wrote:

I am new to ruby; having a lot of fun, but currently I am stuck with
the following.

I want to read this page

http://finance.yahoo.com/q/op?s=AAPL&k=95.00&m=2007-12

scRUBYt! is especially well suited for this type of task.

Watch this:

==================================================
require 'rubygems'
require 'scrubyt'

Scrubyt.logger = Scrubyt::Logger.new
yahoo_data = Scrubyt::Extractor.define do

  fetch 'http://finance.yahoo.com/q/op?s=AAPL&k=95.00&m=2007-12'

  record do
    expires 'Dec 07'
    symbol 'QAALS.X'
    last '86.60'
    chg 'Down 3.00'
    bid '87.10'
    ask '87.50'
    vol '4'
    open_int '146'
  end.select_indices(1..6)
end

p yahoo_data.to_hash
yahoo_data.export(__FILE__)

If you want to get just the first 3 results, change it to select_indices(1..3).

output:

[{:chg=>"Dec 07QAALS.X86.60 3.0087.1087.504146",
:bid=>"87.10", :ask=>"87.50", :vol=>"4", :last=>"86.60",
:expires=>"Dec 07", :open_int=>"146", :symbol=>" QAALS.X"},
{:chg=>"Jan 08QAAAS.X88.10 0.9587.4587.901714,832",
:bid=>"87.45", :ask=>"87.90", :vol=>"17", :last=>"88.10",
:expires=>"Jan 08", :open_int=>"14,832", :symbol=>" QAAAS.X"},
{:chg=>"Apr 08QAADS.X88.35 2.1089.0089.4511973",
:bid=>"89.00", :ask=>"89.45", :vol=>"11", :last=>"88.35",
:expires=>"Apr 08", :open_int=>"973", :symbol=>" QAADS.X"},
{:chg=>"ExpiresSymbolLastChgBidAskVolOpen Int", :bid=>"Bid",
:ask=>"Ask", :vol=>"Vol", :open_int=>"Open Int"},
{:chg=>"Dec 07QAAXS.X0.01 0.00N/A0.034128", :bid=>"N/A",
:ask=>"0.03", :vol=>"4", :last=>"0.01", :expires=>"Dec 07",
:open_int=>"128", :symbol=>"QAAXS.X"},
{:chg=>"Jan 08QAAMS.X0.10 0.040.070.112,38619,345",
:bid=>"0.07", :ask=>"0.11", :vol=>"2,386", :last=>"0.10",
:expires=>"Jan 08", :open_int=>"19,345", :symbol=>" QAAMS.X"}]

The very last line also creates a second file, which looks like this:

==================================================
require 'rubygems'
require 'scrubyt'

yahoo_data = Scrubyt::Extractor.define do
  fetch("http://finance.yahoo.com/q/op?s=AAPL&k=95.00&m=2007-12")

  record("/html/body/div/div/table/tr/td/table/tr/td/table/tr",
         { :generalize => true }) do
    expires("/td[1]/b[1]/a[1]")
    symbol("/td[2]/a[1]")
    last("/td[3]/b[1]")
    chg()
    bid("/td[5]")
    ask("/td[6]")
    vol("/td[7]")
    open_int("/td[8]")
  end.select_indices((1..6))
end

yahoo_data.to_xml.write($stdout, 1)

which obviously works whatever the content of the page is.
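
To illustrate why positional XPaths keep working when the values change, here is a small sketch of the same idea using REXML from Ruby's standard distribution. It is a stand-in, not how scRUBYt! works internally: SAMPLE_XML is made-up, well-formed markup (REXML would reject typical real-world HTML), and FIELD_PATHS and extract_records are hypothetical names.

```ruby
require "rexml/document"

# Made-up, well-formed stand-in for two rows of the options table.
SAMPLE_XML = <<~XML
  <table>
    <tr><td><b><a href="#">Dec 07</a></b></td><td><a href="#">QAALS.X</a></td><td><b>86.60</b></td></tr>
    <tr><td><b><a href="#">Jan 08</a></b></td><td><a href="#">QAAAS.X</a></td><td><b>88.10</b></td></tr>
  </table>
XML

# One relative XPath per field, in the spirit of the exported extractor.
FIELD_PATHS = {
  expires: "td[1]/b[1]/a[1]",
  symbol:  "td[2]/a[1]",
  last:    "td[3]/b[1]"
}

# Build one hash per <tr>, reading each field from its cell position.
def extract_records(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//tr").map do |row|
    FIELD_PATHS.to_h do |field, path|
      node = REXML::XPath.first(row, path)
      [field, node && node.text]
    end
  end
end

RECORDS = extract_records(SAMPLE_XML)
p RECORDS
```

Because each field is addressed by its position in the row, the same paths pick up whatever text the cells hold on a later fetch.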

There is much more in scRUBYt!, this was just a very basic example, so
if you need some additional tweaking, just drop me a mail.

HTH,
Peter


http://www.rubyrailways.com
http://scrubyt.org

I should say, even if a bit late, that Yahoo! Finance provides data in
CSV format: you can specify a start and end date, and it gives you the
closings for each day in the range. It's easier than screen scraping.

Historical data can be obtained with the following URL:

http://ichart.finance.yahoo.com/table.csv?&s=[quote_name]&a=[start_month]&b=[start_day]&c=[start_year]&d=[end_month]&e=[end_day]&f=[end_year]&g=d&ignore=.csv

Substitute each bracketed placeholder with the corresponding value. Leave the
brackets out, of course.
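
As a rough sketch, the substitution could be wrapped in a small helper. HISTORY_BASE and historical_csv_url are hypothetical names, and the sketch simply passes Date#month through unchanged; whether the service counted months from 0 or from 1 is not stated here, so treat that as an assumption to verify.

```ruby
require "date"

HISTORY_BASE = "http://ichart.finance.yahoo.com/table.csv"

# Fill the a..f date parameters of the URL template from two Date objects.
# Assumption: month numbers are passed through exactly as Date#month gives them.
def historical_csv_url(symbol, from, to)
  params = {
    s: symbol,
    a: from.month, b: from.day, c: from.year,
    d: to.month,   e: to.day,   f: to.year,
    g: "d", ignore: ".csv"
  }
  "#{HISTORY_BASE}?" + params.map { |key, value| "#{key}=#{value}" }.join("&")
end

HISTORY_URL = historical_csv_url("AAPL", Date.new(2007, 11, 1), Date.new(2007, 12, 5))
puts HISTORY_URL
```

The resulting string can then be handed to open-uri in the same way as the page URLs earlier in the thread.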

You can get data for the current quote in:

http://finance.yahoo.com/d/quotes.csv?s=[quote_name]&f=sl1d1t1c1ohgv&e=.csv

Just put in the name of the thingie and you’ll be fine.
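
Once a line comes back, Ruby's CSV library can split it into fields. In this sketch the column names in QUOTE_FIELDS reflect my understanding of the f=sl1d1t1c1ohgv tags (an assumption, not something confirmed in this thread), SAMPLE_QUOTE is a made-up line, and parse_quote is a hypothetical helper.

```ruby
require "csv"

# Column names for f=sl1d1t1c1ohgv, as I understand the format tags (an assumption).
QUOTE_FIELDS = %i[symbol last date time change open high low volume]

# Pair each field name with the matching value from one CSV line.
def parse_quote(line)
  QUOTE_FIELDS.zip(CSV.parse_line(line)).to_h
end

# Made-up sample line in the general shape of a quotes.csv response.
SAMPLE_QUOTE = %("AAPL",185.50,"12/5/2007","4:00pm",+5.69,182.89,186.00,182.41,31871500)

QUOTE = parse_quote(SAMPLE_QUOTE)
puts "#{QUOTE[:symbol]} last traded at #{QUOTE[:last]}"
```

All values come back as strings, so convert with to_f or to_i where numbers are needed.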

-Vitor

Peter S. wrote:

Peter:

Thanks for your help.

Your method works nice and easy.

Installing scrubyt took a little effort, though.
I encountered some errors, but as advised on this page
http://agora.scrubyt.org/forums/3/topics/60?page=2
I commented out the line require 'parse_tree_reloaded'.

Thanks to everyone; the help I get on this forum makes programming Ruby
a very pleasant experience.

Ernst

On Dec 5, 2007 6:46 PM, Ernst T. wrote:

Thank you for your response.

Ernst

Oh, I'm sorry. I seem not to have paid enough attention when reading
your original message.

Thanks Vitor, but for that solution you need to know the symbol.
My problem is that I have the symbol of the underlying and the strike of
the option, but not the symbol of the option itself.
That's why I need to parse the option page to get the symbol and then
move on to that CSV file.
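
Roughly, the two steps could be chained like this. It is only a sketch: option_page_url, option_symbol and quote_csv_url are hypothetical helpers, RECORDS_SAMPLE is trimmed from the scRUBYt! output earlier in this thread, and picking the first record for an expiry assumes calls are listed before puts.

```ruby
# Build the option page URL in the shape used earlier in this thread.
def option_page_url(underlying, strike, year, month)
  format("http://finance.yahoo.com/q/op?s=%s&k=%.2f&m=%04d-%02d",
         underlying, strike, year, month)
end

# Return the symbol of the first record with the wanted expiry.
# Assumption: calls come before puts, as in the output shown earlier.
def option_symbol(records, expires)
  record = records.find { |r| r[:expires] == expires }
  record && record[:symbol].strip
end

# Plug the option symbol into the current-quote CSV URL.
def quote_csv_url(symbol)
  "http://finance.yahoo.com/d/quotes.csv?s=#{symbol}&f=sl1d1t1c1ohgv&e=.csv"
end

# Two records trimmed from the scRUBYt! output (only the fields needed here).
RECORDS_SAMPLE = [
  { expires: "Dec 07", symbol: " QAALS.X" },
  { expires: "Jan 08", symbol: " QAAAS.X" }
]

puts option_page_url("AAPL", 95, 2007, 12)
puts quote_csv_url(option_symbol(RECORDS_SAMPLE, "Dec 07"))
```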

Thank you for your response.

Ernst
