Forum: Ruby html to plain text

Colin Summers (Guest)
on 2007-06-24 21:41
(Received via mailing list)
Okay, I have played with Hpricot and I am a convert. Amazing stuff.

I am struggling to get up to speed and I can't find what must be a basic
function. I've scraped the FAA site and they store all their stuff
wrapped in td's, wrapped in tr's, wrapped in tables. Thank you
Hpricot.

Now that I have "<b>Manufacturer</b>" isn't there a simple call to get
rid of the last bit of html?

Thanks,
--Colin
Florian Aßmann (Guest)
on 2007-06-24 22:03
(Received via mailing list)
Hi Colin, consult api doc for Hpricot.inner_text:

require 'rubygems'
require 'hpricot'
require 'open-uri'
doc = open( 'http://www.google.com/ncr' ) { |io| Hpricot io }
doc.inner_text

Regards
Florian
Chris Shea (chrisshea)
on 2007-06-24 22:06
(Received via mailing list)
On Jun 24, 1:40 pm, "Colin Summers" <blade...@gmail.com> wrote:
> Thanks,
> --Colin

It looks like you're looking for the inner_text method.

HTH,
Chris
Todd Benson (Guest)
on 2007-06-24 22:17
(Received via mailing list)
On 6/24/07, Florian Aßmann <florian.assmann@email.de> wrote:
> Hi Colin, consult api doc for Hpricot.inner_text:
>
> require 'rubygems'
> require 'hpricot'
> require 'open-uri'
> doc = open( 'http://www.google.com/ncr' ) { |io| Hpricot io }
> doc.inner_text
^^^^^^^
This code (above) doesn't work on my system.

The following does:

require 'rubygems'
require 'hpricot'
html_string = '<b>Manufacturer</b>'
html_data = Hpricot html_string
html_element = html_data / "b"
puts html_element.inner_html

Todd
Todd Benson (Guest)
on 2007-06-24 22:32
(Received via mailing list)
On 6/24/07, Todd Benson <caduceass@gmail.com> wrote:
> The following does:
>
> require 'rubygems'
> require 'hpricot'
> html_string = '<b>Manufacturer</b>'
> html_data = Hpricot html_string
> html_element = html_data / "b"
> puts html_element.inner_html

Another "jump too soon" moment.

In the above code, I didn't point out that html_element should be
plural, since / returns a collection of matches.  It still works, but
the grammatically correct way would be:

require 'rubygems'
require 'hpricot'
html_string = '<b>Manufacturer</b>'
html_data = Hpricot html_string
html_elements = html_data / "b"                 # every matching <b> (a collection)
first_b_element = html_data.at "b"              # only the first match
first_b_element_also = (html_data / "b").first  # same element, via the collection
puts first_b_element.inner_html

Todd
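
For anyone reading this later without Hpricot installed, here is a minimal sketch of the same idea (keep the text, drop the markup) in plain Ruby. It is not how Hpricot's inner_text is implemented. strip_tags is a hypothetical helper name, and the regex rests on the crude assumption that every tag looks like <...>. It does not decode entities such as &amp; and will trip over a > inside an attribute value, so a real parser is still the better tool for scraped pages.

```ruby
# Crude stand-in for "give me the text without the tags", core Ruby only.
# strip_tags is a hypothetical helper, not part of Hpricot.
def strip_tags(html)
  html.gsub(/<[^>]*>/, '') # drop anything that looks like a tag
      .gsub(/\s+/, ' ')    # collapse runs of whitespace left behind
      .strip               # trim leading and trailing whitespace
end

puts strip_tags('<b>Manufacturer</b>')          # Manufacturer
puts strip_tags('<td><b>Manufacturer</b></td>') # Manufacturer
```

The second call is the case where the two Hpricot methods mentioned above differ: on the td element, inner_html would still contain the <b> tags, while inner_text would not.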