I’m trying to scrape images from a page. I’m using Hpricot to collect the
image URLs into an array, but I’ve run into a problem resolving the full
image paths.
Example:
The src of an image can take any of the following forms:
http://external.site.com/images/image.jpg (Full URL)
/images/image.jpg (Absolute Path)
../images/image.jpg (Relative Path)
images/image.jpg (Relative Path)
Is there a way to resolve these paths to the proper full URLs, so that I
can copy the images to my server or do whatever else I need with them?
Hope that makes sense.
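To make it concrete, here is a rough sketch of the behaviour I’m after. It uses URI.join from Ruby’s standard library, which may or may not be the best tool for this. The page URL (http://www.example.com/gallery/index.html) and the helper name resolve_src are made up purely for illustration, and the src list is written out by hand rather than scraped with Hpricot.

```ruby
require 'uri'

# Hypothetical URL of the page the images were scraped from (an assumption,
# not a real page).
PAGE_URL = 'http://www.example.com/gallery/index.html'

# Hypothetical helper: resolve one src value against the page URL.
# URI.join leaves full URLs alone and resolves relative references
# against the base.
def resolve_src(page_url, src)
  URI.join(page_url, src).to_s
end

# The four kinds of src value from the list above.
srcs = [
  'http://external.site.com/images/image.jpg', # full URL
  '/images/image.jpg',                         # path from the site root
  '../images/image.jpg',                       # relative, one directory up
  'images/image.jpg'                           # relative to the page's directory
]

resolved = srcs.map { |src| resolve_src(PAGE_URL, src) }
resolved.each { |url| puts url }
```

With that made-up page URL, I’d expect the last entry to end up under /gallery/ and the two before it under the site root, while the external URL stays untouched.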
Cheers,
Jim