Open URI & web scraping. (Sorry for crosspost)

jeannibee · November 13, 2007, 3:01pm

Hi

I originally posted this to the Ruby forum, but for lack of responses (aside
from one that was nice but didn’t solve my problem) I’m hoping the ‘web’
people here may know what I’m doing wrong.

I’m sorry, but this is NOT Rails specific… more “WEB” specific.

Thanks for understanding.

In a nutshell: if I use open-uri (and Hpricot) to download a web page and
‘scrape’ all the images to write them to my local disk, the dynamic images
always come out in an improper format (size 0), but static images are fine.
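
For anyone trying to reproduce this, here is a minimal sketch of the fetch-and-save step described above. It is not the actual script: the helper names fetch_image and save_image are hypothetical, and the idea of sending a Referer header is only an assumption about what a servlet might check before returning image data. It uses URI.open from current Ruby (older versions used a plain open), writes in binary mode, and refuses to write a file when the response body is empty, so a 0-byte result shows up as a warning rather than a broken .gif.

```ruby
require "open-uri"
require "fileutils"

# Hypothetical helper: fetch one image URL and return [bytes, content_type].
# The Referer header is an assumption; some servers return an empty body
# unless the request looks like it came from the page embedding the image.
def fetch_image(url, referer: nil)
  headers = {}
  headers["Referer"] = referer if referer
  URI.open(url, headers) do |io|
    [io.read, io.content_type]
  end
end

# Hypothetical helper: write the bytes in binary mode ("wb"), since text
# mode on Windows can corrupt image data.
# Returns the number of bytes written, or nil when the body is empty.
def save_image(bytes, path)
  return nil if bytes.nil? || bytes.empty?
  FileUtils.mkdir_p(File.dirname(path))
  File.open(path, "wb") { |f| f.write(bytes) }
end

# Usage (needs network access, so it is left commented out; the referer
# value and output path are placeholders):
# url = "http://myserver:8080/Someservlet?name=blah&param=value&etc=etc"
# bytes, type = fetch_image(url, referer: "http://myserver:8080/")
# unless save_image(bytes, "dumps/test.gif")
#   warn "Empty body from #{url} (Content-Type: #{type})"
# end
```

If the warning fires, the server really did send back nothing for that request, which would point at the request itself (headers, session, parameters) rather than at the code that writes the file.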

Example would be :

Whether I copy/paste this URL into another browser or use open-uri to
“get” the image, I get an error of:

XML Parsing Error: no element found
Location: http://myserver:8080/Someservlet?name=blah&param=value&etc=etc
Line Number 1, Column 1:

BUT, this image is displayed PERFECTLY in the html.

How can I get this image to download? (I suspect it’s the MIME type
being set on the server side, but I am not 100% sure.)

OUTPUT

[[URI information…]]
Fetched document:
http://myserver:8080/Someservlet?name=blah&param=value&etc=etc
Content Type: application/voicexml+xml
Charset:
Content-Encoding:
Last Modified:
IMAGE INFO!!! →
Writing to file ::
D:\project_x\trunk\dumps\1194882652_854.gif

Thanks for your help.