How to parse an HTML document in Ruby?


#1

Hi,

I want to parse an HTML document using Ruby.
I tried REXML, but it failed to load the document because the HTML is
not well-formed.
Can you please suggest a good parser that I can use to parse HTML
pages in Ruby?

Thanks,
Karika.


#2

I’ve had good luck with Rubyful Soup:

http://www.crummy.com/software/RubyfulSoup/


#3

http://rubyforge.org/projects/tidy/
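
For context, here is a minimal sketch of the problem from the original question and why a cleanup step helps. It only uses REXML from Ruby's standard distribution. The sample markup, the constant names, and the try_rexml helper are hypothetical, and the "clean" string is written by hand as a stand-in for what a tool like Tidy might produce when asked for XHTML output; it is not actual Tidy output.

```ruby
require 'rexml/document'

# Hypothetical sample: typical "tag soup" with an unclosed <br> and <p>.
MESSY_HTML = '<html><body><p>Hello<br>world</body></html>'

# Hand-written well-formed equivalent (assumption: a cleanup tool would
# produce something along these lines; this is not real Tidy output).
CLEAN_XHTML = '<html><body><p>Hello<br/>world</p></body></html>'

# Returns the parsed document, or nil if REXML rejects the markup.
def try_rexml(markup)
  REXML::Document.new(markup)
rescue REXML::ParseException
  nil
end

# REXML is a strict XML parser, so the messy input is rejected.
puts "messy parsed?  #{!try_rexml(MESSY_HTML).nil?}"

# The well-formed version loads, and the tree can be queried with XPath.
doc = try_rexml(CLEAN_XHTML)
puts "clean parsed?  #{!doc.nil?}"
puts "first text in <p>: #{doc.elements['//p'].text}"
```

So the two suggestions in this thread seem to take different routes: either use a parser that tolerates broken markup, or clean the markup first and keep using an XML parser such as REXML.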