Zlib gzip Iconv, what is going on with UTF-8

Hi. I searched hard for an answer, but nothing I found works.

I'm using Ruby 1.8.6 (tested on Windows).

My problem appears when I try to receive a gzipped response from a site
whose content is not in UTF-8.

For instance, this request works correctly:
response = Net::HTTP.get_with_head('http://www.wp.pl/',
  {'Accept-Encoding' => 'gzip;q=1.0, identity;', 'Accept-Charset' => 'utf-8'})

When I unpack it, everything works fine:
if response['Content-Encoding']
  # the body is compressed: gunzip it, then convert it to UTF-8
  body_io = StringIO.new(response.body)
  html = Zlib::GzipReader.new(body_io).read()
  html = Iconv.conv('utf-8//IGNORE', encoding, html)
else
  #html = response.body
end

The problem APPEARS when the site I want to fetch uses a charset other
than UTF-8. Changing the first line to point at another server:
response = Net::HTTP.get_with_head('http://www.interia.pl/',
  {'Accept-Encoding' => 'gzip;q=1.0, identity;', 'Accept-Charset' => 'utf-8'})

gives me the page without ANY PROPER UTF-8 character.

When I comment out the Iconv line, my output has abnormal characters in
place of the non-ASCII ones, like "?" (every such character is replaced
with some sort of question mark).

Is this the server's fault (the way it gzips the content), or my fault
(am I unpacking it the wrong way)?
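
One way to rule gzip in or out is a round trip. The sketch below is only an illustration, not code from this thread: gzip_bytes and gunzip_bytes are hypothetical helper names, and the sample bytes are assumed to be ISO-8859-2 text. It compresses a non-UTF-8 byte string in memory, unpacks it the same way as the snippet above, and keeps both versions for a byte-by-byte comparison.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: gzip a string in memory and return the compressed bytes.
def gzip_bytes(data)
  io = StringIO.new
  gz = Zlib::GzipWriter.new(io)
  gz.write(data)
  gz.close              # flushes the gzip trailer into io
  io.string
end

# Hypothetical helper: unpack gzip data via StringIO and Zlib::GzipReader.
def gunzip_bytes(data)
  Zlib::GzipReader.new(StringIO.new(data)).read
end

# Assumed sample: "za" followed by four non-ASCII ISO-8859-2 bytes.
original  = "za\xBF\xF3\xB3\xE6"
roundtrip = gunzip_bytes(gzip_bytes(original))

# Compare raw byte values so that string encoding tags do not matter.
same_bytes = (original.unpack('C*') == roundtrip.unpack('C*'))
```

If same_bytes is true, compression left the bytes untouched, which would point at the charset conversion step rather than at the unpacking.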

Regards

I want to add that everything works fine when I turn gzip off,
so the following code works for every site:

response = Net::HTTP.get_with_head('http://www.interia.pl/',
  {'Accept-Charset' => 'utf-8'})

that = Nokogiri::HTML(Iconv.conv('utf-8//IGNORE', encoding, html))

and 'that' contains properly formed UTF-8 characters.

When gzip is enabled, they come out as ���

regards

Piotr Mąsior wrote:

Problem SOLVED. A faulty condition in my code caused it: I was always
passing "utf-8" to Iconv as the source encoding.
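
For anyone hitting the same thing, here is a minimal sketch of one way to choose the source encoding instead of hard-coding "utf-8". It is an assumption about how the check could look, not the code from my script: charset_from_content_type is a hypothetical helper, and the 'iso-8859-2' fallback is just a guess that suits many Polish sites.

```ruby
# Hypothetical helper: extract the charset from a Content-Type header value.
# Returns the fallback when the header is missing or names no charset.
def charset_from_content_type(content_type, fallback = 'iso-8859-2')
  return fallback if content_type.nil?
  match = content_type.match(/charset\s*=\s*["']?([^\s;"']+)/i)
  match ? match[1].downcase : fallback
end

source_charset = charset_from_content_type('text/html; charset=ISO-8859-2')

# source_charset would then be the "from" argument of the conversion, e.g.
#   html = Iconv.conv('utf-8//IGNORE', source_charset, html)
```

A page can also declare its charset in a meta tag, which this sketch does not look at.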

regards