There is an HTML file:
report_data | liu_asset | cash | finance_asset | note
2009-12-31 | 0 | 1,693,048,000,000 | 20,147,000,000 | 500
2009-09-30 | 0 | 1,777,512,000,000 | 24,977,000,000 | 700
How can I use an array to load this data with Ruby? Here is what I want:
array[0,0]=report_data
array[0,1]=liu_asset
array[0,2]=cash
array[0,3]=finance_asset
array[0,4]=note
array[1,0]=2009-12-31
array[1,1]=0
array[1,2]=1,693,048,000,000
array[1,3]=20,147,000,000
array[1,4]=500
array[2,0]=2009-09-30
array[2,1]=0
array[2,2]=1,777,512,000,000
array[2,3]=24,977,000,000
array[2,4]=700
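A note on the notation above: on a plain Ruby Array, array[0,1] means "start at index 0, take 1 element" (a slice), not "row 0, column 1". A table like this is usually held as an array of row arrays and read with array[row][col]. A minimal sketch, where the variable name table is made up for illustration and the values are copied from the listing above:

```ruby
# Nested arrays: one inner array per table row (values copied from the post).
table = [
  ["report_data", "liu_asset", "cash", "finance_asset", "note"],
  ["2009-12-31", "0", "1,693,048,000,000", "20,147,000,000", "500"],
  ["2009-09-30", "0", "1,777,512,000,000", "24,977,000,000", "700"]
]

header_cell = table[0][1]   # row 0, column 1
cash_cell   = table[1][2]   # row 1, column 2

# With two arguments, Array#[] takes (start, length) and returns a slice.
slice = table[0, 1]         # a one-element array containing row 0

puts header_cell
puts cash_cell
puts slice.length
```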
On Mon, Apr 5, 2010 at 10:33 PM, Pen T. wrote:
Posted via http://www.ruby-forum.com/.
You're missing a closing </tr> after the first row.
I added it in for you, and gave an example of how you can use Hpricot to load the data. Here is basically everything I know about Hpricot: you can get the text inside an Hpricot element by calling #innerHTML on it. Given an element, you can get a list of the elements inside it with a specific tag by using division (/), which is how I got all the td's from the row. You can get the first instance of a specific tag by using %, which is how I pull out the strong.
That is all I know about Hpricot, but it's all a problem like this requires.
require 'rubygems'
require 'hpricot'
require 'pp'

rows = Array.new
for row in Hpricot(DATA) % 'table' / 'tr'
  rows.push Array.new
  for data in row / 'td'
    # use the <strong> child if the cell has one, otherwise the cell itself
    rows.last.push((data % 'strong' || data).innerHTML)
  end
end

pp rows
__END__
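<table>
<tr>
<td><strong>report_data</strong></td>
<td><strong>liu_asset</strong></td>
<td> <strong>cash</strong></td>
<td><strong>finance_asset</strong></td>
<td> <strong>note</strong></td>
</tr>
<tr>
<td>2009-12-31</td>
<td>0 </td>
<td>1,693,048,000,000</td>
<td>20,147,000,000</td>
<td>500</td>
</tr>
<tr>
<td>2009-09-30</td>
<td>0 </td>
<td>1,777,512,000,000</td>
<td>24,977,000,000</td>
<td>700</td>
</tr>
</table>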
Here’s a similar way to do what Josh does, only using Nokogiri:
#!/usr/bin/ruby
require 'rubygems'
require 'nokogiri'
HTML = <<EOT
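<table>
<tr>
<td><strong>report_data</strong></td>
<td><strong>liu_asset</strong></td>
<td> <strong>cash</strong></td>
<td><strong>finance_asset</strong></td>
<td> <strong>note</strong></td>
</tr>
<tr>
<td>2009-12-31</td>
<td>0 </td>
<td>1,693,048,000,000</td>
<td>20,147,000,000</td>
<td>500</td>
</tr>
<tr>
<td>2009-09-30</td>
<td>0 </td>
<td>1,777,512,000,000</td>
<td>24,977,000,000</td>
<td>700</td>
</tr>
</table>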
EOT
doc = Nokogiri::HTML.parse(HTML)
array = []
doc.css('tr').each_with_index do |tr, tr_i|
  array[tr_i] = tr.css('td').map { |td| td.text }
end
array[0][0] # => "report_data"
array[0][1] # => "liu_asset"
array[0][2] # => " cash"
array[0][3] # => "finance_asset"
array[0][4] # => " note"
array[1][0] # => "2009-12-31"
array[1][1] # => "0 "
array[1][2] # => "1,693,048,000,000"
array[1][3] # => "20,147,000,000"
array[1][4] # => "500"
array[2][0] # => "2009-09-30"
array[2][1] # => "0 "
array[2][2] # => "1,777,512,000,000"
array[2][3] # => "24,977,000,000"
array[2][4] # => "700"
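Some of the cells above come back with stray whitespace (" cash", "0 "), and the amounts are still strings with thousands separators. If you want clean values, one possible follow-up is sketched below. It assumes the amounts are always non-negative integers with commas as thousands separators. The rows array and the clean_cell helper are made-up names for illustration, and the values are copied from the output above.

```ruby
# Sample rows as returned above, including the stray whitespace.
rows = [
  ["report_data", "liu_asset", " cash", "finance_asset", " note"],
  ["2009-12-31", "0 ", "1,693,048,000,000", "20,147,000,000", "500"],
  ["2009-09-30", "0 ", "1,777,512,000,000", "24,977,000,000", "700"]
]

# Hypothetical helper: strip whitespace, and turn digit strings (plain or
# comma-grouped) into Integers. Anything else (dates, labels) stays a String.
def clean_cell(text)
  stripped = text.strip
  if stripped =~ /\A\d{1,3}(,\d{3})*\z/ || stripped =~ /\A\d+\z/
    stripped.delete(",").to_i
  else
    stripped
  end
end

cleaned = rows.map { |row| row.map { |cell| clean_cell(cell) } }

p cleaned[0][2]  # header label without the leading space
p cleaned[1][2]  # amount as an Integer
```

Dates such as "2009-12-31" are left as strings here; Date.parse from the standard library could be applied to that column if real date objects are needed.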