Get data from html table

addis_a · October 3, 2014, 3:57am

I am new to Ruby. Can someone help me, please?

Below is my HTML structure. I want to get 'X01FJ65K0M' from the td.
There is no id on this table.

Thanks in advance

Request Type	Agent Type
X01FJ65K0M	07/03/2014 08:14:42

aparna · October 3, 2014, 9:20am

First, a question: do you want to retrieve data from HTML on a regular
basis, or often in the same program? In that case, use the Nokogiri gem:

------------------ untested ------------
require 'nokogiri'

# File.open returns the value of its block, so data is still set afterwards
data = File.open('htmlfile', 'r') do |infile|
  htmldoc = Nokogiri::HTML::Document.parse(infile)
  htmldoc.xpath('//td').first.content
end

If you need this just once, read the file line by line to detect the
table row with the right CSS class, then find the very first td in that
row.
See File.readlines. The procedure may depend on the actual formatting of
the HTML code. If the code is not "beautified", you will also need
something like String#match.
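A minimal sketch of that one-off approach using only the standard library. It assumes each td opens and closes on a single line and contains plain text without nested tags, and it skips the CSS-class check on the row for brevity. The method name first_td_text is hypothetical; with a real file you would call it as first_td_text(File.readlines('htmlfile')).

```ruby
# Matches one <td ...>...</td> on a single line, case-insensitively,
# capturing the cell's inner text (lazy, so it stops at the first </td>).
TD_PATTERN = %r{<td[^>]*>(.*?)</td>}i

# Scan lines (e.g. the array returned by File.readlines) and return the
# stripped inner text of the first <td>, or nil if no line contains one.
def first_td_text(lines)
  lines.each do |line|
    match = line.match(TD_PATTERN)
    return match[1].strip if match
  end
  nil
end

lines = [
  "<table>\n",
  "  <tr><th>Request Type</th><th>Agent Type</th></tr>\n",
  "  <tr><td>X01FJ65K0M</td><td>07/03/2014 08:14:42</td></tr>\n",
  "</table>\n"
]
puts first_td_text(lines) # prints X01FJ65K0M
```

This breaks as soon as a cell spans several lines or holds markup, which is why a real parser such as Nokogiri is the safer default.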

But I would use Nokogiri anyway.

aparna · October 14, 2014, 9:33am

Michael U. wrote in post #1158939:

First, a question: do you want to retrieve data from HTML on a regular
basis, or often in the same program? In that case, use the Nokogiri gem:

------------------ untested ------------
require 'nokogiri'

File.open('htmlfile', 'r') do |infile|
  htmldoc = Nokogiri::HTML::Document.parse(infile)
  data = htmldoc.xpath("//td").first.content
end

Thanks for this. It helped me partially. I am not parsing a file; it's a
table in a web page. I want to get the td from the tr. How do I
iterate?

This is my XPath: table/tr[2]/td[4].
det_page is the web page opened by Nokogiri.

How do I put this XPath in a loop? The code below does not execute.

p = 2
apptpath = det_page.xpath("//table/tr[p]/td[4]").each do |row|

  row.xpath('./tr[p]').each do |tr|

    row.xpath('./td[4]').each do |td|

      puts td.text

      p = p+1
    end
  end
end

aparna · October 14, 2014, 3:37pm

Aparna Kya wrote in post #1159965:

How do I put this XPath in a loop? The code below does not execute.

"Does not execute" is a rather general statement. Your screen should not
just stay empty or black; what exactly do you see?

1 p = 2
2 apptpath = det_page.xpath("//table/tr[p]/td[4]").each do |row|
3
4   row.xpath('./tr[p]').each do |tr|
5
6     row.xpath('./td[4]').each do |td|
7
8       puts td.text
9
10      p = p+1
11    end
12  end
13 end

Line 2: You iterate over all the cells in the fourth column of the third
and all later rows of a table. You do not iterate over rows.

Line 4: You iterate over all the rows in the third and later positions.
How many tables do you expect in the HTML file that you are handling?

Line 6 does not make much sense, as you look for a fourth cell inside
the cells that you are already iterating over in the outer loop.

is not valid HTML (maybe HTML5 to 99, but who knows what those so-called "standards" will ever look like...)

Line 10: You can probably drop the variable p once your calls to each
are sorted out.

To get a better understanding of what is going on, don't try to finish
your program with code you don't understand. Begin with some
experimentation. For example, puts the kind of nodes that your XPath
produces:

puts det_page.xpath("//table/tr[p]/td[4]").to_s
puts det_page.xpath("//table/tr[p]/td[4]").size
puts det_page.xpath("//table/tr[p]/td[4]")[0].name
puts det_page.xpath("//table/tr[p]/td[4]")[0].node_type

and so on. See the output of "ri Nokogiri::XML::Node" for things to try.
