Hpricot and regular expressions

Hi all,
I have an HTML fragment like the following:

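<a href="forumdisplay.php?f=131"><strong>Toon Zone News1</strong></a>
<a href="forumdisplay.php?f=132"><strong>Toon Zone News2</strong></a>
<a href="forumdisplay.php?f=133"><strong>Toon Zone News3</strong></a>
<a href="forumdisplay.php?f=abcd"><strong>Toon Zone News3</strong></a>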

I want to match only the first three anchor tags. I don't want the last
one, because its href's f parameter is abcd, which is not an integer. I
want an anchor only if the f request parameter is an integer, i.e. the
first three anchor tags.

I have the following code:

require 'hpricot'
require 'open-uri'

doc = Hpricot(open("http://0.0.0.0:3000/dh/list"))
fun = doc.search("//a[@href='forumdisplay.php?f=131']//strong").inner_html
puts fun

but it fetches only the first anchor tag's content. So I think I need to
use a regular expression to match the f parameter values 131, 132 and
133, but I don't know how to do that.
Any help would be appreciated.
Thanks,
dhanasekaran


Nothing can ever work if you think about everything it takes for it to
work.
-- Daniel Pennac

Here is a way (I guess it's not optimal; I'm really new here):

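#!/usr/bin/ruby
require 'hpricot'

html = <<EOS
<a href="forumdisplay.php?f=131"><strong>Toon Zone News1</strong></a>
<a href="forumdisplay.php?f=132"><strong>Toon Zone News2</strong></a>
<a href="forumdisplay.php?f=133"><strong>Toon Zone News3</strong></a>
<a href="forumdisplay.php?f=abcd"><strong>Toon Zone News3</strong></a>
EOS

doc = Hpricot(html)
result = []
# get each node
doc.search("//a").each do |chaque|
  # keep only those whose href attribute matches the regexp
  result << chaque.at("//strong").inner_html if chaque.get_attribute("href") =~ /f=13?/
end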

Hope this helps
Sylvain


Dhanasekaran V. wrote:

to get only if the request parameter is an integer, i.e. the first three
anchor tags.

Try

result = (Hpricot(html)/"a[@href]").map.reject { |elem| elem.attributes['href'] !~ /=\d+$/ }

HTH,
Peter
__
http://www.rubyrailways.com
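
A note on the /=\d+$/ pattern above: it only looks at whatever value comes
last in the href, so it would also accept something like
"forumdisplay.php?f=abcd&page=2". If the hrefs can carry more than one
parameter, one option is to parse the query string and test the f value on
its own. The following is only a sketch using Ruby's standard library; the
helper name integer_f_param? and the extra sample hrefs are made up for
illustration and are not part of Hpricot.

```ruby
require 'uri'

# Hypothetical helper (not part of Hpricot): returns true when the href's
# query string contains an f parameter consisting only of digits.
def integer_f_param?(href)
  query = href.split('?', 2)[1]            # text after the first "?"
  return false if query.nil?
  params = URI.decode_www_form(query).to_h # e.g. {"f" => "131"}
  !!(params['f'] =~ /\A\d+\z/)
end

# Sample hrefs: the first two follow the question, the last two are
# assumed extra cases with a second parameter.
samples = [
  "forumdisplay.php?f=131",
  "forumdisplay.php?f=abcd",
  "forumdisplay.php?f=abcd&page=2",
  "forumdisplay.php?f=133&sort=asc"
]

samples.each do |href|
  puts "#{href} -> #{integer_f_param?(href)}"
end
```

With Hpricot, the same check could be applied to elem.attributes['href']
inside a select or reject block, in the same way as the one-liner above.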

On 12/19/06, Sylvain T. wrote:
[snip]

#keep only those who have the correct value for attribute as set by the regexp
result << chaque.at("//strong").inner_html if chaque.get_attribute("href") =~ /f=13?/
end
[snip]

To match f= followed by an integer, use this regexp instead:

/\bf=\d+\b/
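
To see the difference between the two patterns, here is a small plain-Ruby
sketch (no Hpricot needed) that runs both over a list of hrefs. The first
four hrefs are assumed from the original question; the last two are
hypothetical extras added to show where /f=13?/ goes wrong.

```ruby
# First four hrefs are assumed from the question; the last two are
# hypothetical cases added for illustration.
hrefs = [
  "forumdisplay.php?f=131",
  "forumdisplay.php?f=132",
  "forumdisplay.php?f=133",
  "forumdisplay.php?f=abcd",
  "forumdisplay.php?f=1abc",
  "forumdisplay.php?f=245"
]

loose  = /f=13?/      # "f=1" followed by an optional "3"
strict = /\bf=\d+\b/  # "f=" followed by digits only, up to a word boundary

loose_hits  = hrefs.select { |h| h =~ loose }
strict_hits = hrefs.select { |h| h =~ strict }

puts "loose:  #{loose_hits.inspect}"
puts "strict: #{strict_hits.inspect}"
```

The loose pattern accepts f=1abc (it only needs "f=1") and misses f=245,
while the strict one keeps exactly the hrefs whose f value is all digits.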