Forum: Ruby on Rails HTML Parsing libraries

Nara H. (Guest)
on 2006-06-05 13:25
Hi,

What is the best way to parse HTML?

Or is there a simple way to convert a table to an array?

I tried beautiful_soup and the built-in htmltools, but have trouble
getting them to run.

Any pointers?

Thanks, Hari
Scott Fortmann-Roe (Guest)
on 2006-06-05 13:36
(Received via mailing list)
My experience with Ruby's HTML parsing tools is that they are
generally badly documented and buggy. I have had more success piping
the HTML I need parsed to a Perl program that uses Perl's much, much
better HTML libraries and then reading the output back into my Ruby
program.

It's ugly, but it works.

Check out search.cpan.org to see what Perl modules are available.

-Scott
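
For anyone wanting to try the piping approach described above, here is a minimal sketch, assuming Ruby's standard Open3 library. The helper name parse_with_external and the script name parse_table.pl are hypothetical, and a one-line Ruby filter stands in for the Perl program so the example runs without Perl installed.

```ruby
require 'open3'
require 'rbconfig'

# Send +html+ to an external command on stdin and return whatever the
# command prints on stdout. +command+ is an array of program and arguments,
# e.g. ["perl", "parse_table.pl"] (hypothetical script name).
def parse_with_external(html, command)
  output, status = Open3.capture2(*command, stdin_data: html)
  unless status.success?
    raise "#{command.first} exited with status #{status.exitstatus}"
  end
  output
end

# Stand-in filter (an assumption for illustration only): a tiny Ruby
# script that upcases its input. Swap in the real parser command.
stand_in = [RbConfig.ruby, "-e", "print STDIN.read.upcase"]
result = parse_with_external("<td>a</td>", stand_in)
puts result
```

The external program decides the output format, so in practice it helps to have it print something easy to read back, such as one tab-separated row per line.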
Igor A. (Guest)
on 2006-06-05 13:39
(Received via mailing list)
I used html-tools:
http://rubyforge.org/projects/ruby-htmltools/
and found it to be a pretty simple to use but powerful HTML parser library.
Nara H. (Guest)
on 2006-06-05 13:42
Igor A. wrote:
> I used html-tools:
> http://rubyforge.org/projects/ruby-htmltools/
> and found it to be a pretty simple to use but powerful HTML parser library.

Thanks, Scott, for your reply. I am looking for a simple Ruby solution, as
the page I am trying to parse has only a few tables in it.

Igor, I tried to get the html-tools to run but couldn't succeed :( I
tried to run the ebaysearch.rb demo program and couldn't run it either.

I am using Ruby 1.8 with RoR 1.1.

Were you able to use html-tools? Can you share some simple sample
code?

Thanks, Hari
Igor A. (Guest)
on 2006-06-05 14:13
(Received via mailing list)
_______________________________________________
Rails mailing list
removed_email_address@domain.invalid
http://lists.rubyonrails.org/mailman/listinfo/rails
Nara H. (Guest)
on 2006-06-05 17:25

Hi Igor,

Have you quoted/attached something? I don't see it.

Thanks, Hari
unknown (Guest)
on 2006-06-05 21:00
(Received via mailing list)
BeautifulSoup has been ported to Ruby as RubyfulSoup.
    http://www.crummy.com/software/RubyfulSoup/

It really works wonders when one must screen-scrape.

cheers,
jean-pierre
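
On the original question of converting a table to an array: below is a minimal sketch in plain Ruby, using only regular expressions rather than any of the libraries mentioned in this thread. It assumes simple, well-formed tables with explicit closing tags and no nesting; real-world HTML will often need a proper parser. The method name tables_to_arrays and the sample data are made up for illustration.

```ruby
# Convert every <table> in an HTML string into an array of rows,
# where each row is an array of cell strings.
# Assumption: tables are not nested and all <tr>/<td>/<th> tags are closed.
def tables_to_arrays(html)
  html.scan(%r{<table\b[^>]*>(.*?)</table>}im).map do |(table_body)|
    table_body.scan(%r{<tr\b[^>]*>(.*?)</tr>}im).map do |(row_body)|
      row_body.scan(%r{<t[dh]\b[^>]*>(.*?)</t[dh]>}im).map do |(cell)|
        cell.gsub(/<[^>]+>/, "").strip # drop inline tags, trim whitespace
      end
    end
  end
end

# Made-up sample input.
html = <<~HTML
  <table>
    <tr><th>Name</th><th>Qty</th></tr>
    <tr><td>Apple</td><td> 3 </td></tr>
    <tr><td><b>Pear</b></td><td>5</td></tr>
  </table>
HTML

tables = tables_to_arrays(html)
p tables.first
```

The result is one array per table, so tables.first[1] is the second row of the first table.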