Does anybody have suggestions on how to extract relative as well
as absolute links from an HTML page? It seems that URI.extract only
matches absolute URLs.
Any pointers or suggestions are appreciated.
Thanks,
Christian
[email protected] wrote:
Does anybody have suggestions on how to extract relative as well
as absolute links from an HTML page? It seems that URI.extract only
matches absolute URLs.
Roll your own, a classic piece of programming advice:
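#!/usr/bin/ruby -w
data = File.read("/path/page.html")
# scan yields an array of captures for each match, so |(item)| unpacks
# the single captured group into a plain string.
data.scan(/src\s*=\s*"(.*?)"/im) { |(item)|
  puts "src = #{item}"
}
data.scan(/href\s*=\s*"(.*?)"/im) { |(item)|
  puts "href = #{item}"
}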
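
If you also want the relative links turned into absolute ones, something along these lines might work. It is only a rough sketch: the extract_links name and the example.com/example.org URLs are made up for illustration, it assumes attribute values are wrapped in single or double quotes, and a regex will miss or mangle unusual markup that a real HTML parser would handle. URI.join comes from Ruby's standard uri library.

```ruby
require "uri"

# Hypothetical helper (not part of any library): collect href/src values
# from an HTML string and resolve each against base_url.
def extract_links(html, base_url)
  # Group 1 is the opening quote, group 2 the attribute value;
  # \1 requires the same kind of quote to close the value.
  html.scan(/(?:href|src)\s*=\s*(["'])(.*?)\1/im).map do |_quote, link|
    begin
      URI.join(base_url, link.strip).to_s
    rescue URI::Error
      nil # skip values that do not parse as a URI
    end
  end.compact.uniq
end

html = <<~HTML
  <a href="/about.html">About</a>
  <a href='http://example.org/x'>X</a>
  <img src="images/logo.png">
HTML

puts extract_links(html, "http://example.com/docs/index.html")
```

Absolute links pass through URI.join unchanged, while "/about.html" and "images/logo.png" are resolved against the base URL you pass in, which would normally be the address the page was fetched from.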