Extract relative links from an HTML page

Does anybody have suggestions on how to extract relative as well
as absolute links from an HTML page? It seems that URI.extract only
matches absolute URLs.
Any pointers or suggestions are appreciated.
Thanks,
Christian

Christian wrote:

Does somebody have any suggestions on how to extract relative as well
as absolute links from an html page. It seems like the URI.extract only
matches on absolute urls.

Roll your own, a classic piece of programming advice:

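#!/usr/bin/ruby -w

# Read the whole page into a single string.
data = File.read("/path/page.html")

# scan yields a one-element array per match because the regex has one
# capture group; |(item)| unpacks it so the bare string is printed.
data.scan(/src\s*=\s*"(.*?)"/im) { |(item)|
  puts "src = #{item}"
}

data.scan(/href\s*=\s*"(.*?)"/im) { |(item)|
  puts "href = #{item}"
}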
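
If you also want the relative links turned into full URLs, here is a rough sketch along the same lines. It assumes you know the URL the page was fetched from; extract_links and the example.com addresses are made-up names for illustration, not part of any library. It accepts single- as well as double-quoted attribute values and resolves each one with URI.join from the standard library. It is still regex-based, so unquoted attributes are missed and links inside comments or scripts are picked up too.

```ruby
require "uri"

# Hypothetical helper: collect href/src values from an HTML string and
# resolve each one against base_url (the address the page came from).
def extract_links(html, base_url)
  # One capture group for double-quoted values, one for single-quoted.
  pattern = /\b(?:href|src)\s*=\s*(?:"([^"]*)"|'([^']*)')/i
  html.scan(pattern).map do |double_quoted, single_quoted|
    raw = (double_quoted || single_quoted).strip
    begin
      # URI.join leaves absolute URLs alone and resolves relative ones.
      URI.join(base_url, raw).to_s
    rescue URI::Error
      raw # keep values that URI cannot parse as they are
    end
  end
end

html = <<~HTML
  <a href="/about.html">About</a>
  <a href='docs/intro.html'>Intro</a>
  <img src="http://example.org/logo.png">
HTML

puts extract_links(html, "http://example.com/site/index.html")
```

With that base URL, "/about.html" resolves against the host root and "docs/intro.html" against the /site/ directory, while the absolute image URL passes through unchanged. For anything beyond a quick script, a real HTML parser is likely the safer choice.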
