Hi there, I’m using net/http to retrieve some html pages and now I
want to count the number of items in a list on the page. The
response.body is stored as a string.
The HTML looks something like this:
Section Heading I'm interested in:
foo
bar
So what I want to do is count the number of li’s in a particular div
section. In this case the answer is 2. It might be more, it might be
0.
I can find the section I want with a regex but I don’t know how to
iterate through the string looking for particular elements. I was
thinking about taking the section I’m interested in and saving it as
an array and then iterating through each array element (html line)
that way, but I thought there might be a quicker way to do it.
suggestions?
I’d use Nokogiri. Off the top of my head, it would be something like
(untested):
If you aren’t concerned with the content of each row, perhaps just
count the number of matching lines in the array? You would have to read
in the page line by line, but that isn’t too hard.
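A minimal sketch of that line-counting idea, using only core Ruby. The sample markup, the `count_item_lines` name, and the assumption that each li sits on its own line and the section ends at the next closing div are illustrative guesses, not taken from the real page:

```ruby
# Hypothetical sample page; the real markup may differ.
page = <<~HTML
  <div>
    <h2>Section Heading I'm interested in:</h2>
    <ul>
      <li>foo</li>
      <li>bar</li>
    </ul>
  </div>
  <div>
    <ul>
      <li>baz</li>
    </ul>
  </div>
HTML

# Walk the page line by line. Start counting once the heading text is seen
# and stop at the first closing </div> after it.
# Assumes one <li> per line and no nested <div> inside the section.
def count_item_lines(page, heading)
  in_section = false
  count = 0
  page.each_line do |line|
    in_section = true if line.include?(heading)
    next unless in_section
    break if line.include?("</div>")
    count += 1 if line.include?("<li")
  end
  count
end

puts count_item_lines(page, "Section Heading")  # prints 2
```

This only works when the HTML is laid out one element per line, so it is more fragile than a real parser.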
I’d reserve regex parsing of XML for very informal situations where I
just need a quick, non-rigorous solution (i.e. a one-time solution that
I plan to verify personally). I am pretty sure that it is not possible
to correctly parse XML with regex.
$html.scan(%r{<div.*first section.*</div>}m).to_s.scan(/<li>/).size
Thanks Steel. This worked fine. I just needed to make it a lazy
search with .*?
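For anyone finding this later, here is a rough core-library sketch of the lazy-match approach. The sample markup and the `count_items` helper are made up for illustration, and it assumes the section contains no nested div:

```ruby
# Hypothetical sample page; the real markup may differ.
html = <<~HTML
  <div id="first">
    <h2>Section Heading I'm interested in:</h2>
    <ul>
      <li>foo</li>
      <li>bar</li>
    </ul>
  </div>
  <div id="second">
    <ul>
      <li>baz</li>
    </ul>
  </div>
HTML

# Find the <div> that contains `heading`, then count the <li> tags inside it.
# (?:(?!</?div\b).)*? advances lazily but refuses to cross another div tag,
# so the heading must belong to the div where the match starts.
# The trailing .*? is lazy, so the match ends at the nearest </div>.
# Assumes no nested <div> inside the section of interest.
def count_items(html, heading)
  pattern = %r{<div\b[^>]*>(?:(?!</?div\b).)*?#{Regexp.escape(heading)}.*?</div>}m
  section = html[pattern]
  return 0 if section.nil?

  section.scan(/<li\b/i).size
end

puts count_items(html, "Section Heading")  # prints 2
```

With a greedy `.*` the match would run on to the last closing div on the page and count every li after the heading, which is why the lazy form matters here. A missing section returns 0 rather than raising.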
I’ve got nothing against Nokogiri or the other solutions but I was
hoping for a solution like this that just uses the core libraries for
portability.
Cheers! Paul.