I'm trying to scrape a page that both the HTTP header and the HTML document claim is UTF-8, but when I use Mechanize/Hpricot to scrape some accented strings and save them to a local file, all special characters are replaced by question marks. I suspect the page is actually in ISO-8859-1, but I'm not sure. I have tried running with "ruby -Ku" and also setting $KCODE = 'u', without success. How can I force Mechanize to read the document as ISO-8859-1? I understand that Iconv can convert between encodings, but I can't see how to use it with Mechanize... Thanks, Marius
on 2008-11-22 23:33
on 2008-12-02 16:03
I have had exactly the same problem and the same question. It seems I solved it with $KCODE = 'UTF8'.
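If $KCODE alone does not help, the conversion asked about in the question can be sketched roughly as follows. This is only a sketch under some assumptions: it uses String#force_encoding and String#encode (Ruby 1.9 or later) rather than Iconv, the helper name latin1_to_utf8 is made up for illustration, and raw_body is a hard-coded stand-in for the bytes you would get from the fetched page (for example from Mechanize's page.body), so nothing here touches Mechanize itself.

```ruby
# Stand-in for the raw bytes of the fetched page (assumption: with Mechanize
# these would come from something like page.body).
# "café" in ISO-8859-1: the é is the single byte 0xE9.
raw_body = "caf\xE9".b

# Check the suspicion first: are these bytes valid UTF-8 at all?
looks_like_utf8 = raw_body.dup.force_encoding(Encoding::UTF_8).valid_encoding?
puts looks_like_utf8  # => false, so the page is not really UTF-8

# Hypothetical helper: relabel the bytes as ISO-8859-1, then transcode to UTF-8.
def latin1_to_utf8(bytes)
  # dup so the caller's string is left untouched;
  # force_encoding only changes the label, encode rewrites the bytes.
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

utf8_body = latin1_to_utf8(raw_body)
puts utf8_body        # prints "café" on a UTF-8 terminal
```

The converted string could then be handed to the parser or written to the local file. The valid_encoding? check is only a hint: bytes that fail it are definitely not UTF-8, but passing it does not prove the declared encoding is right.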