Regex problem

flnative · November 27, 2007, 5:28pm

hi @all

I would like to scan a string of HTML tags. I need to extract all the
links (a-tags) from the string, but I only get the last one. What is
wrong? See the code below…

response = 'test1 - test2'
response.scan(/<a.*href="(.*?)"/) do |line|
  puts line
end

thanks for helping!

flnative · November 27, 2007, 5:45pm

The first Kleene star might need to be non-greedy. In other words, stop
at the first href it reaches, not the last.
/<a.*?href="(.*?)"/
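
To make the greedy vs. non-greedy difference concrete, here is a minimal sketch. The sample string, its example.com URLs, and the variable names are made up for illustration and are not taken from the original post.

```ruby
# Hypothetical sample input; the URLs are placeholders.
html = '<a href="http://example.com/1">test1</a> - <a href="http://example.com/2">test2</a>'

# Greedy: the first .* runs to the end of the string and backtracks
# only as far as the LAST href, so a single match swallows both links.
greedy = html.scan(/<a.*href="(.*?)"/)

# Non-greedy: .*? stops at the FIRST href it can reach, so scan
# finds each link in turn.
lazy = html.scan(/<a.*?href="(.*?)"/)

p greedy  # => [["http://example.com/2"]]
p lazy    # => [["http://example.com/1"], ["http://example.com/2"]]
```

Because the pattern contains one capture group, scan returns a one-element array per match.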

flnative · November 27, 2007, 6:01pm

On Nov 27, 2007 11:28 AM, K. R. wrote:

end

thanks for helping!

Posted via http://www.ruby-forum.com/.

Franco is right. You could fix it by using “a.*?href”. However, I
would change “a.*href” to “a\s+href”, since what you expect between
the “a” and the “href” is one or more whitespace characters.

response = 'test1 - test2'
# /m lets . match newlines (in Ruby, /s selects an encoding instead)
response.scan(/<a\s+href="(.*?)"/m) do |line|
  puts line
end
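
As a rough sketch of the \s+ variant: because \s+ can only consume whitespace, it cannot skip ahead to a later href, so no non-greedy quantifier is needed there. The sample markup, URLs, and variable names below are assumptions made for illustration.

```ruby
# Hypothetical sample input; the second tag is split across two lines
# to show that \s also matches a newline.
spaced_html = %Q{<a href="http://example.com/1">test1</a> - <a\n   href="http://example.com/2">test2</a>}

# flatten turns scan's [[url], [url]] result into a flat list of URLs.
spaced_links = spaced_html.scan(/<a\s+href="(.*?)"/m).flatten

spaced_links.each { |url| puts url }
# prints:
# http://example.com/1
# http://example.com/2
```

Note that this pattern only matches tags where href directly follows the tag name.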

flnative · December 1, 2007, 5:35pm

On Nov 27, 12:00 pm, Christian von Kleist wrote:

response = 'test1 - test2'
response.scan(/<a.*href="(.*?)"/) do |line|
  puts line
end

But what if href is not the first attribute of <a>?

flnative · December 2, 2007, 2:05pm

response.scan(/<a.*href="(.*?)"/) do |line|
but what if href is not the first attribute of <a>?

The order of the attributes does not matter, because the pattern allows any
sequence (.*) between the <a tag and href.
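
One possible way to allow other attributes before href while still collecting every link is to replace .* with [^>]*?, which cannot run past the end of the current tag. This is a swapped-in variation, not the pattern from the posts above, and the sample markup, URLs, and variable names are invented for illustration.

```ruby
# Hypothetical sample input: one anchor without href, one with href as
# the second attribute, one with href as the first attribute.
mixed_html = '<a name="top">top</a> ' \
             '<a class="nav" href="http://example.com/1">test1</a> - ' \
             '<a href="http://example.com/2" id="x">test2</a>'

# \s       requires whitespace right after "<a"
# [^>]*?   skips any other attributes but never crosses the closing ">"
mixed_links = mixed_html.scan(/<a\s[^>]*?href="(.*?)"/m).flatten

p mixed_links  # => ["http://example.com/1", "http://example.com/2"]
```

The anchor without an href is skipped, because [^>]*? reaches its closing ">" before finding href.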