Forum: Ruby read html then output one line of html if found?

Mmcolli00 M. (Guest)
on 2009-03-12 22:38
Hi,
I have many RSpec reports showing the duration of several test examples.
These RSpec files are in HTML format. I want to grab the durations and the
HTML file names and write them to a text file. However, I cannot figure
out how to pull just that one line from the HTML. When I use puts, it shows
the whole HTML. Can you help me grab one line out of the HTML?

Here is my snippet.

require "fileutils"

# Scan every HTML report in the directory
Dir["C:/Respecs/*.html"].each do |htmlfile|
  readhtml = File.read(htmlfile)   # the whole file as one string
  if readhtml.include?("seconds")
    htmlbase = File.basename(htmlfile)
    puts htmlbase # <-- shows the full HTML file, not just the line that "seconds" is on
  end
end
F. Senault (Guest)
on 2009-03-13 10:20
(Received via mailing list)
On March 12 at 21:36, Mmcolli00 Mom wrote:

> Here is my snippet.
>
> require "fileutils"
>
> Dir["C:/Respecs/*.html"].each do |htmlfile|
>     readhtml = File.read(htmlfile)
>     if readhtml.include?("seconds") == true
>      htmlbase = File.basename(htmlfile)
>      puts htmlbase
>     end
> end

The easy way is to use readlines and grep:

>> h = File.readlines('f1.txt')
=> ["<h1>hhhhhh</h1>\n", "<h2>20 seconds</h2>\n", "<p>Blah.</p>\n", "\n"]
>> h.grep(/seconds/)
=> ["<h2>20 seconds</h2>\n"]
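What the original poster describes (file name plus matching line, written to a text file) can be sketched by combining that idea with the directory loop from the question. This is a minimal sketch, not a tested tool: `collect_durations` is a hypothetical helper name, and it assumes the duration text sits on the same physical line as the word "seconds". It uses File.foreach, which yields one line at a time, so no report has to be held in memory as a single string.

```ruby
# Sketch: collect every line containing "seconds" from each HTML report
# and write "filename: line" pairs to a plain-text summary file.
# collect_durations is a hypothetical helper; glob and output path are up to you.
def collect_durations(glob, out_path)
  File.open(out_path, "w") do |out|
    Dir.glob(glob).sort.each do |html_file|
      name = File.basename(html_file)
      # File.foreach yields one line at a time instead of reading the whole file
      File.foreach(html_file) do |line|
        out.puts "#{name}: #{line.strip}" if line.include?("seconds")
      end
    end
  end
end
```

Called as `collect_durations("C:/Respecs/*.html", "durations.txt")`, a report named report.html containing Fred's sample line would produce `report.html: <h2>20 seconds</h2>` in the output file. Reports with no matching line contribute nothing.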

For a more sophisticated (and time-consuming) approach, try an HTML parser
like Hpricot:

>> require "hpricot"
=> true
>> doc = Hpricot(File.read('f1.txt'))
=> #<Hpricot::Doc {elem <h1> "hhhhhh" </h1>} "\n" {elem <h2> "20 seconds" </h2>} "\n" {elem <p> "Blah." </p>} "\n\n">
>> doc.children.select { |e| e.inner_html =~ /seconds/ }
=> [{elem h2 "20 seconds" h2}]
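Whichever approach finds the line or element, the number itself still has to be pulled out of the matched text if only the duration is wanted. A minimal sketch, assuming the duration is written as a plain number followed by "second" or "seconds", as in the sample `<h2>20 seconds</h2>`; `extract_seconds` is a hypothetical helper name, and the regex will need adjusting for other report layouts.

```ruby
# Sketch: pull the numeric duration out of text such as "<h2>20 seconds</h2>".
# extract_seconds is a hypothetical helper; it assumes the number sits
# directly before the word "second(s)".
def extract_seconds(text)
  # String#[] with a regex and capture index returns that group, or nil if no match
  match = text[/(\d+(?:\.\d+)?)\s*seconds?/, 1]
  match && match.to_f
end
```

With Fred's irb session, `h.grep(/seconds/).map { |l| extract_seconds(l) }` gives `[20.0]`, and the same helper works on the Hpricot element's `inner_html` ("20 seconds").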

HTH.

Fred
Mmcolli00 M. (Guest)
on 2009-03-13 15:01
Thanks Fred - this is very helpful!