Need to parse an HTML table into an Array/hash

ernsttanaka · November 30, 2007, 10:24pm

I am new to ruby; having a lot of fun, but currently I am stuck with the
following.

I want to read this page

Into an array or hash.

I found the yahoo-finance gem, but it does not do what I need.
I need to find the Symbol for (in this example)
AAPL call option strike 95 Dec-07.

I am able to read the page:
conn = open("Apple Inc. (AAPL) Options Chain - Yahoo Finance")
lines = conn.read

The main question is: how do I parse the table which is in the middle of
the HTML?

Any help is appreciated!!

ernsttanaka · November 30, 2007, 10:27pm

I am able to read the page:
conn = open("Apple Inc. (AAPL) Options Chain - Yahoo Finance")
lines = conn.read

The main question is: how do I parse the table which is in the
middle of the HTML?

You probably want to look at mechanize and hpricot, or maybe scrubyt.

donald

ernsttanaka · November 30, 2007, 10:39pm

On Nov 30, 10:27 pm, “Ball, Donald A Jr (Library)”
[email protected] wrote:

I am able to read the page:
conn = open("Apple Inc. (AAPL) Options Chain - Yahoo Finance")
lines = conn.read

The main question is: how do I parse the table which is in the
middle of the HTML?

You probably want to look at mechanize and hpricot, or maybe scrubyt.

donald

I use hpricot for a similar task.

ernsttanaka · December 1, 2007, 12:48am

I’m kind of new too, so this might be a little sloppy by more experienced
standards, but I think this gets you partially there.

require "open-uri"

# Fetch the page and grab every <td>...</td> cell.
page = open("Apple Inc. (AAPL) Options Chain - Yahoo Finance").read
table_data = page.scan(/<td.+?td>/)
# Strip the tags, leaving only the text of each cell.
table_data_refined = table_data.map { |data| data.gsub(/<.+?>/, "") }

You will probably need to tweak the regexes to isolate the specific table
you want (for example by matching on "class=xxx").

I hope that helps.
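
The same regex idea can be pushed one step further to produce an array of hashes rather than a flat list of cells. What follows is only a minimal sketch using the standard library. The HTML snippet, the column names, the symbols in it and the helper table_to_hashes are all made up for illustration and do not reflect the real Yahoo page layout. Regexes like these break easily on nested tables, so a real HTML parser is usually the safer choice.

```ruby
# Hypothetical sample markup; the real page's structure will differ.
SAMPLE_HTML = <<~HTML
  <table class="options">
    <tr><th>Strike</th><th>Symbol</th><th>Last</th></tr>
    <tr><td>90.00</td><td><a href="#">SYM090.X</a></td><td>91.50</td></tr>
    <tr><td>95.00</td><td><a href="#">SYM095.X</a></td><td>86.60</td></tr>
  </table>
HTML

# Hypothetical helper: turn every <tr> into a hash keyed by the header cells.
def table_to_hashes(html)
  rows = html.scan(/<tr.*?>(.*?)<\/tr>/m).map do |(row)|
    # Grab the contents of each <th>/<td>, then strip any inner tags.
    row.scan(/<t[hd].*?>(.*?)<\/t[hd]>/m).map do |(cell)|
      cell.gsub(/<.+?>/, "").strip
    end
  end
  header, *body = rows
  keys = header.map { |name| name.downcase.to_sym }
  body.map { |cells| keys.zip(cells).to_h }
end

options = table_to_hashes(SAMPLE_HTML)
match = options.find { |row| row[:strike] == "95.00" }
puts match[:symbol]
```

Every value stays a string here, so the strike is compared as "95.00" rather than as a number.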

ernsttanaka · December 1, 2007, 10:08am

On Nov 30, 2007, at 10:24 PM, Ernst T. wrote:

I am new to ruby; having a lot of fun, but currently I am stuck with the
following.

I want to read this page

Apple Inc. (AAPL) Options Chain - Yahoo Finance

scRUBYt! is especially well suited for this type of task.

Watch this:

==================================================
require 'rubygems'
require 'scrubyt'

Scrubyt.logger = Scrubyt::Logger.new
yahoo_data = Scrubyt::Extractor.define do

  fetch 'Apple Inc. (AAPL) Options Chain - Yahoo Finance'

  record do
    expires 'Dec 07'
    symbol 'QAALS.X'
    last '86.60'
    chg 'Down 3.00'
    bid '87.10'
    ask '87.50'
    vol '4'
    open_int '146'
  end.select_indices(1..6)
end

p yahoo_data.to_hash
yahoo_data.export(__FILE__)

If you want to get just the first 3 results, change it to select_indices(1..3).

output:

[{:chg=>"Dec 07QAALS.X86.60 3.0087.1087.504146",
:bid=>"87.10", :ask=>"87.50", :vol=>"4", :last=>"86.60",
:expires=>"Dec 07", :open_int=>"146", :symbol=>" QAALS.X"},
{:chg=>"Jan 08QAAAS.X88.10 0.9587.4587.901714,832",
:bid=>"87.45", :ask=>"87.90", :vol=>"17", :last=>"88.10",
:expires=>"Jan 08", :open_int=>"14,832", :symbol=>" QAAAS.X"},
{:chg=>"Apr 08QAADS.X88.35 2.1089.0089.4511973",
:bid=>"89.00", :ask=>"89.45", :vol=>"11", :last=>"88.35",
:expires=>"Apr 08", :open_int=>"973", :symbol=>" QAADS.X"},
{:chg=>"ExpiresSymbolLastChgBidAskVolOpen Int", :bid=>"Bid",
:ask=>"Ask", :vol=>"Vol", :open_int=>"Open Int"},
{:chg=>"Dec 07QAAXS.X0.01 0.00N/A0.034128", :bid=>"N/A",
:ask=>"0.03", :vol=>"4", :last=>"0.01", :expires=>"Dec 07",
:open_int=>"128", :symbol=>"QAAXS.X"},
{:chg=>"Jan 08QAAMS.X0.10 0.040.070.112,38619,345",
:bid=>"0.07", :ask=>"0.11", :vol=>"2,386", :last=>"0.10",
:expires=>"Jan 08", :open_int=>"19,345", :symbol=>" QAAMS.X"}]

The very last line also creates a second file, which looks like this:

==================================================
require 'rubygems'
require 'scrubyt'

yahoo_data = Scrubyt::Extractor.define do
  fetch("Apple Inc. (AAPL) Options Chain - Yahoo Finance")

  record("/html/body/div/div/table/tr/td/table/tr/td/table/tr",
         { :generalize => true }) do
    expires("/td[1]/b[1]/a[1]")
    symbol("/td[2]/a[1]")
    last("/td[3]/b[1]")
    chg()
    bid("/td[5]")
    ask("/td[6]")
    vol("/td[7]")
    open_int("/td[8]")
  end.select_indices((1..6))
end

yahoo_data.to_xml.write($stdout, 1)

which works whatever the content of the page is, since it relies on the
XPaths rather than on the example values (as long as the page layout
stays the same).

There is much more in scRUBYt!, this was just a very basic example, so
if you need some additional tweaking, just drop me a mail.

HTH,
Peter

http://www.rubyrailways.com
http://scrubyt.org

ernsttanaka · December 5, 2007, 7:46pm

I should say, even if a bit late, that Yahoo! Finance provides data in CSV
format: you can specify a start and end date, and it gives you the
closings for each day in the range. It’s easier than screen scraping.

Historical data can be obtained with the following URL:

http://ichart.finance.yahoo.com/table.csv?&s=[quote_name]&a=[start_month]&b=[start_day]&c=[start_year]&d=[end_month]&e=[end_day]&f=[end_year]&g=d&ignore=.csv

Substitute each bracketed placeholder with the corresponding value. Leave
the brackets out, of course.

You can get data for the current quote in:

http://finance.yahoo.com/d/quotes.csv?s=[quote_name]&f=sl1d1t1c1ohgv&e=.csv

Just put in the name of the thingie and you’ll be fine.
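
Filling in the two URL templates above could look roughly like this. It is only a sketch: the helper names history_url and quote_url are made up for illustration, the numbers are inserted exactly as given with no validation, and the endpoints are the ones quoted in this post, which may no longer be served. The post does not say how the service numbers months, so check that before relying on the a= and d= values.

```ruby
# Base endpoints as quoted in the post; they may no longer be available.
HISTORY_BASE = "http://ichart.finance.yahoo.com/table.csv"
QUOTES_BASE  = "http://finance.yahoo.com/d/quotes.csv"

# Hypothetical helper: builds the historical-data URL from plain numbers.
def history_url(symbol, start_month, start_day, start_year,
                end_month, end_day, end_year)
  "#{HISTORY_BASE}?&s=#{symbol}&a=#{start_month}&b=#{start_day}&c=#{start_year}" \
    "&d=#{end_month}&e=#{end_day}&f=#{end_year}&g=d&ignore=.csv"
end

# Hypothetical helper: builds the current-quote URL for one symbol.
def quote_url(symbol)
  "#{QUOTES_BASE}?s=#{symbol}&f=sl1d1t1c1ohgv&e=.csv"
end

puts history_url("AAPL", 11, 1, 2007, 11, 30, 2007)
puts quote_url("AAPL")
```

The resulting strings can then be passed to open-uri in the same way as the page fetches earlier in the thread.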

-Vitor

ernsttanaka · December 3, 2007, 1:22am

Peter S. wrote:

Peter:

Thanks for your help.

Your method works nice and easy.

Installing scrubyt took a little effort, though.
I encountered some errors, but as advised on this website
http://agora.scrubyt.org/forums/3/topics/60?page=2
I commented out the line require 'parse_tree_reloaded'.

Thanks to everyone; the help I get on this forum makes programming Ruby
a very pleasant experience.

Ernst

ernsttanaka · December 5, 2007, 11:34pm

On Dec 5, 2007 6:46 PM, Ernst T. [email protected] wrote:

Thank you for your response.

Ernst

Oh, I’m sorry. I seem to have lacked in attention when reading your
original message.

ernsttanaka · December 5, 2007, 9:46pm

Thanks Vitor, but for that solution you need to know the symbol.
My problem is that I have the symbol of the underlying and the strike of
the option.
I don’t have the symbol for the option.
That is why I need to parse the option page to get the symbol and then
move on to that CSV file.

Thank you for your response.

Ernst
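
For the lookup step described here, the array of hashes that scRUBYt! printed earlier in the thread needs a little cleanup first: the header row comes through as a record without a :symbol, and some symbols carry leading whitespace. A rough sketch, using two records copied from that output plus the header row. The helper names clean_records and symbol_for are made up for illustration. The listing in this thread has no strike column, so this lookup keys on the expiry only.

```ruby
# Records copied from the scRUBYt! output in this thread
# (the garbled :chg field is left out).
RAW = [
  { bid: "87.10", ask: "87.50", vol: "4", last: "86.60",
    expires: "Dec 07", open_int: "146", symbol: " QAALS.X" },
  { bid: "87.45", ask: "87.90", vol: "17", last: "88.10",
    expires: "Jan 08", open_int: "14,832", symbol: " QAAAS.X" },
  { bid: "Bid", ask: "Ask", vol: "Vol", open_int: "Open Int" } # header row
]

# Hypothetical helper: drop the header row and normalise a few fields.
def clean_records(records)
  data_rows = records.select { |r| r[:symbol] } # the header row has no :symbol
  data_rows.map do |r|
    r.merge(
      symbol:   r[:symbol].strip,
      vol:      r[:vol].delete(",").to_i,
      open_int: r[:open_int].delete(",").to_i
    )
  end
end

# Hypothetical helper: first symbol with the given expiry, or nil.
def symbol_for(records, expires)
  hit = records.find { |r| r[:expires] == expires }
  hit && hit[:symbol]
end

cleaned = clean_records(RAW)
puts symbol_for(cleaned, "Jan 08")
```

The symbol returned this way is what would go into the quotes CSV URL.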
