Hi.
I’ve got a file that contains a table that looks like this:
column title a | column title b
row 1 a | row 1 b
row 2 a | row 2 b
row 3 a | row 3 b
row 4 a | row 4 b
I need to read the rows starting with the second table row, which
excludes the title of the column. Then I need to read each column’s
value.
How can I loop through each row to get each value that I need?
To read a row of values I used
(doc/"/html/body/table/tbody/tr[2]/td[1]").inner_html >> row 1 a
(doc/"/html/body/table/tbody/tr[2]/td[2]").inner_html >> row 1 b
I tried variations of each loops that would increment the table row, but
I don’t have the syntax correct for it to work.
Any ideas?
THANKS!
I found a superb example on how to do the above problem -
http://tinyurl.com/4zl6b9
But now I can’t figure out how to save my output to my table. I am
following this example exactly so that I can duplicate it for my
purposes. I am able to get the output correctly, but I can’t save it.
I get an “undefined method save” error.
See below for Steve’s code.
Where can I put my method save to have it save my values to my table?
I tried replacing puts g.to_csv with g.save, but that didn’t work. I
also tried replacing games << game with game.save, but that didn’t work
either.
I’m still new at figuring out how to convert examples like this to my
needs.
Any help is greatly appreciated!!!
— code from Steve —
require 'hpricot'
require 'open-uri'

def parse_games(doc)
  games = []
  doc.search("//table[@class='tablehead']//tr").each do |tr|
    # Header rows carry the week and date for the game rows that follow.
    @week = tr.search("/td/a").inner_html if(tr[:class] == 'stathead')
    @date = tr.at("td").inner_html if(tr[:class] == 'colhead')
    teams = []
    tr.at("td").search("a").each do |team|
      teams << team.inner_html
    end
    # A row with exactly two team links is treated as a game.
    if(teams.size == 2)
      @time = tr.search("td:eq(1)").inner_html
      game = Game.new()
      game.date = @date
      game.week = @week
      game.time = @time
      game.away_team = teams[0]
      game.home_team = teams[1]
      games << game
    end
  end
  games
end

games = parse_games(Hpricot(open("NFL Schedule - 2023 Season - ESPN")))
games.each do |g|
  puts g.to_csv
end
I need to read the rows starting with the second table row, which
I tried variations of each loops that would increment the table row, but
I don’t have the syntax correct for it to work.
I don’t know if hpricot’s result set supports slice, but you could do
something like this:
doc.search('table td').slice(1,99999999).each do |td_ele|
  …
end
I’m cheating by picking 999999 but it’s easier than figuring out how
many results there are. I’m sure there’s a method to simply remove
that first element in a chain, but I can’t think of it.
And you’d want to change that ‘search’ to match your table specifically.
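On the "method to simply remove that first element" point, a minimal sketch, assuming the result set behaves like a plain Ruby Array (the strings below are stand-ins for element objects, not real Hpricot output). Either an open-ended range or Array#drop should avoid the made-up upper bound:

```ruby
# Stand-in for a parser result set; plain strings instead of element objects.
cells = ['column title a', 'row 1 a', 'row 2 a', 'row 3 a', 'row 4 a']

# An open-ended range avoids guessing an upper bound such as 99999999.
without_header = cells.slice(1..-1)

# Array#drop(n) skips the first n elements and states the intent directly.
also_without_header = cells.drop(1)

without_header.each { |cell| puts cell }
```

Both calls return a new array and leave the original untouched, so the header is still available if it is needed later.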
Philip H. wrote:
I don’t know if hpricot’s result set supports slice, but you could do
something like this:
doc.search('table td').slice(1,99999999).each do |td_ele|
  …
end
Thanks, but I’ve actually been able to move past the point of parsing
out my variables. Now I just need to figure out how to save my results.
Any thoughts on that would be very much appreciated.
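One possible reading of the "undefined method save" error, offered as a guess since the thread never shows how Game is defined: save is not built into Ruby objects or arrays. It exists only if the class defines it, or inherits it (in a Rails app, typically from ActiveRecord::Base). The sketch below uses a hypothetical stand-in Game class with an in-memory "table" and made-up sample values, just to show that save has to live on the record class and be called once per record:

```ruby
# Hypothetical stand-in for a database-backed model. In a Rails app this role
# would usually be played by a class inheriting from ActiveRecord::Base.
class Game
  ATTRIBUTES = [:week, :date, :time, :away_team, :home_team]
  attr_accessor(*ATTRIBUTES)

  # In-memory "table" used only for this illustration.
  def self.table
    @table ||= []
  end

  # Store this record's values as one row.
  def save
    self.class.table << ATTRIBUTES.map { |name| send(name) }
    true
  end
end

# Made-up sample values, not taken from the real schedule page.
game = Game.new
game.week = 'Week 1'
game.date = 'Sun, Sep 7'
game.time = '1:00 PM'
game.away_team = 'Team A'
game.home_team = 'Team B'
games = [game]

# games.save would raise NoMethodError, because Array has no save method.
# Saving happens per record instead:
games.each { |g| g.save }
```

In the parse_games loop, the equivalent spot would be the final games.each block, with g.save in place of puts g.to_csv, provided Game really does respond to save.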
Mark T. wrote:
Who needs loops when you have XPath? You can grab an entire column in
one fell swoop.
require 'xml'
doc = XML::Parser.string(html).parse
column1 = doc.find('/table/tr[position()>1]/td[1]/text()')
puts column1.to_a
You’ll need libxml for that (gem install libxml-ruby). Hpricot is not
XPath compliant enough.
See my XPath article here:
markthomas.org
– Mark.
I looked over your XPath article. Thanks.
But given the table structure noted above, will XPath actually work to
take the variables that I need in each of the columns and insert those
into database fields? I need to locate 3 separate columns and grab each
of those by row and insert those into a table. So the html and the
database table will end up looking the same.
Thanks.
If you need a row at a time, then you just shorten your XPath a bit:
doc = XML::Parser.string(html).parse
doc.find('/table/tr[position()>1]').each do |row|
  my_db_insert(row.find('td/text()').to_a)
end
– Mark.
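For the "html and the database table end up looking the same" goal, here is a rough sketch of the row-at-a-time idea. It swaps in REXML (which ships with Ruby) for libxml, assumes the markup is well-formed, and uses a hypothetical my_db_insert that only collects records in an array instead of touching a real database. The title row supplies the field names:

```ruby
require 'rexml/document'

# Sample markup mirroring the table from the original question.
html = <<~HTML
  <table>
    <tr><td>column title a</td><td>column title b</td></tr>
    <tr><td>row 1 a</td><td>row 1 b</td></tr>
    <tr><td>row 2 a</td><td>row 2 b</td></tr>
  </table>
HTML

# Hypothetical insert helper: records are collected here rather than
# written to a database.
INSERTED = []
def my_db_insert(record)
  INSERTED << record
end

doc = REXML::Document.new(html)

# Turn every <tr> into an array of its cell texts.
rows = doc.root.get_elements('tr').map do |tr|
  tr.get_elements('td').map { |td| td.text.strip }
end

# Split the title row from the data rows.
header, *body = rows

# Pair each value with its column title and hand the record over.
body.each do |values|
  my_db_insert(header.zip(values).to_h)
end
```

Each record ends up as a hash keyed by column title, so the same loop should carry over to three or more columns without changes.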
On Oct 7, 6:01 pm, Becca G. wrote:
I need to read the rows starting with the second table row, which
excludes the title of the column. Then I need to read each column’s
value.
How can I loop through each row to get each value that I need?
Who needs loops when you have XPath? You can grab an entire column in
one fell swoop.
require 'xml'
doc = XML::Parser.string(html).parse
column1 = doc.find('/table/tr[position()>1]/td[1]/text()')
puts column1.to_a
You’ll need libxml for that (gem install libxml-ruby). Hpricot is not
XPath compliant enough.
See my XPath article here:
http://markthomas.org/2008/08/22/improve-your-XML-parsing-with-XPath/
– Mark.