Read html then output one line of html if found?

mmcolli00 · March 12, 2009, 9:38pm

Hi
I have many RSpec reports showing the duration of several test examples.
These RSpec files are in HTML format. I want to grab the durations and
the HTML file names and write them to a text file. However, I cannot
figure out how to pull a single line from the HTML. When I use puts it
shows the whole HTML. Will you help me grab one line out of the HTML?

Here is my snippet.

require "fileutils"

Dir["C:/Respecs/*.html"].each do |htmlfile|
  readhtml = File.read(htmlfile)
  if readhtml.include?("seconds") == true
    htmlbase = File.basename(htmlfile)
    # <-- shows full html file, not just the line that "seconds" is located on.
    puts htmlbase
  end
end

mmcolli00 · March 13, 2009, 9:20am

On March 12 at 21:36, Mmcolli00 Mom wrote:

Here is my snippet.

require "fileutils"

Dir["C:/Respecs/*.html"].each do |htmlfile|
  readhtml = File.read(htmlfile)
  if readhtml.include?("seconds") == true
    htmlbase = File.basename(htmlfile)
    puts htmlbase
  end
end

For the easy way, try readlines and grep:

h = File.readlines('f1.txt')
=> ["hhhhhh\n", "<h2>20 seconds</h2>\n", "Blah.\n", "\n"]

h.grep(/seconds/)
=> ["<h2>20 seconds</h2>\n"]
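
Applied to the original loop, the readlines-and-grep idea could look roughly like the sketch below. The method names matching_lines and write_durations and the output file name are made up for illustration; the directory and the "seconds" pattern come from the question.

```ruby
# Returns every line of the file at `path` that matches `pattern`.
def matching_lines(path, pattern = /seconds/)
  File.readlines(path).grep(pattern)
end

# Scans each .html file in `dir` and writes "<file name>: <matching line>"
# to `out_path`, one entry per matching line.
# (Both method names are hypothetical helpers, not part of any library.)
def write_durations(dir, out_path, pattern = /seconds/)
  File.open(out_path, "w") do |out|
    Dir[File.join(dir, "*.html")].sort.each do |html_file|
      matching_lines(html_file, pattern).each do |line|
        out.puts "#{File.basename(html_file)}: #{line.strip}"
      end
    end
  end
end
```

Something like write_durations("C:/Respecs", "durations.txt") would then produce one line per match. Note that this only finds the duration if the report puts it on a single line of the HTML source.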

For a more sophisticated (and time-consuming) approach, try an HTML
parser like Hpricot:

require "hpricot"
=> true

doc = Hpricot(File.read('f1.txt'))
=> #<Hpricot::Doc {elem "hhhhhh"} "\n" {elem <h2> "20 seconds" </h2>} "\n" {elem "Blah."} "\n\n">

doc.children.select { |e| e.inner_html =~ /seconds/ }
=> [{elem <h2> "20 seconds" </h2>}]
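
If only the number is needed and pulling in a parser feels like too much, a plain-Ruby sketch along these lines might do for simple one-line matches. It swaps the parser for a regular expression, so it is not equivalent to Hpricot and will not cope with tags split across lines; strip_tags and duration_in_seconds are hypothetical helper names.

```ruby
# Removes anything that looks like a tag, then trims whitespace.
# This is a regex shortcut, not real HTML parsing.
def strip_tags(html_line)
  html_line.gsub(/<[^>]*>/, "").strip
end

# Returns the number in front of "seconds" as a Float,
# or nil when the line contains no such number.
def duration_in_seconds(html_line)
  match = strip_tags(html_line).match(/(\d+(?:\.\d+)?)\s*seconds/)
  match && match[1].to_f
end
```

Each line returned by grep can be passed through duration_in_seconds before it is written to the text file.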

HTH.

Fred

mmcolli00 · March 13, 2009, 2:01pm

Thanks Fred - this is very helpful!