Parsing HTML tables

ertai · March 31, 2009, 8:59pm

Hi everybody,

I’m looking for a way to write beautiful code that parses an HTML
table.

In fact, the table is dynamic.
It always has three columns but a varying number of rows.

In each “line” (<tr>) I want to extract the information inside the
columns (<td>), and then create a new object from that information.

I did it by splitting my HTML source with the split method and using
regexps to extract what I want. But this solution does not satisfy
me. It’s unmaintainable.

However, I’m pretty sure that I could write cleverer code…

Does anyone have an idea, a clue, a thought?

Thanks.
PS: English is not my native language…

ertai · March 31, 2009, 9:47pm

On Mar 31, 2:59 pm, Nicolas P. wrote:

> I did it by splitting my HTML source with the split method and using
> regexps to extract what I want. But this solution does not satisfy
> me. It’s unmaintainable.
>
> However, I’m pretty sure that I could write cleverer code…
>
> Does anyone have an idea, a clue, a thought?

Use a real parser. Example:

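#---
require 'nokogiri'

# Sample markup: one <td> per <tr>
html = <<eohtml
<table>
  <tr><td>One</td></tr>
  <tr><td>Two</td></tr>
  <tr><td>Three</td></tr>
</table>
eohtml

doc = Nokogiri::HTML(html)

# Visit each row and print the text of its cells
doc.search('//tr').each do |line|
  puts line.search('td/text()')
end

#---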
Output:
One
Two
Three

ertai · March 31, 2009, 10:35pm

> Use a real parser.

Hi,

Thanks for your help.
I ran some tests with Hpricot (already included in my Ruby installation)
and got good results. Great tool!

Thanks for your help!