Unknown character print on irb or command prompt

pshah · August 5, 2010, 8:55am

Hi,

I read an HTML file using Nokogiri, and that works fine.

But when I print the text afterwards, it shows unknown characters like

“┬á” where I expect just hello,

so it looks like “hello┬á”.

It seems to be caused by the &nbsp; entity before the closing tag.

If anyone knows a solution, please help.

Thanks,
Priyank S.

pshah · August 5, 2010, 10:08am

Try using
p str
or
puts str.inspect
or
puts str.bytes.to_a.inspect

to get a better look at what character codes are in there.

pshah · August 5, 2010, 10:18am

Brian C. wrote:

Try using
p str
or
puts str.inspect
or
puts str.bytes.to_a.inspect

to get a better look at what character codes are in there.

Hi,

Thanks for the reply.

But that is not useful for me: if I use inspect, it gives “hello\302\240”.

I want a simple space.

Thanks,
Priyank S.

pshah · August 5, 2010, 3:38pm

Priyank S. wrote:

But that is not useful for me: if I use inspect, it gives “hello\302\240”.

That is useful.

It shows that the &nbsp; has been converted into the byte sequence
\302\240 (octal) or \xc2\xa0 (hex).

That happens to be the code for a non-breaking space in UTF-8, codepoint
160:

$ irb19

160.chr("UTF-8")
=> " "

160.chr("UTF-8").bytes.to_a
=> [194, 160]

160.chr("UTF-8").force_encoding("ASCII-8BIT")
=> "\xC2\xA0"

So the terminal you are trying to print it to is non-UTF-8. Perhaps a
Windows box? You didn’t say what your platform was.

In that case, you need to re-encode it to the appropriate character set.
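
As a rough sketch of both options (Ruby 1.9 or later), you could either swap the non-breaking space for a plain space, or re-encode the string for the terminal. The helper names plain_spaces and for_terminal are made up for illustration, and "CP850" is only an assumed example of a Windows console code page; check what your own terminal actually uses.

```ruby
NBSP = "\u00A0"  # non-breaking space, bytes C2 A0 in UTF-8

# Hypothetical helper: replace every non-breaking space with a plain space.
def plain_spaces(str)
  str.gsub(NBSP, " ")
end

# Hypothetical helper: re-encode a UTF-8 string for a non-UTF-8 terminal.
# "CP850" is an assumption; characters the target encoding cannot
# represent are replaced with "?" instead of raising an error.
def for_terminal(str, target = "CP850")
  str.encode(target, :invalid => :replace, :undef => :replace, :replace => "?")
end

text = "hello" + NBSP  # stands in for the text read via Nokogiri

puts plain_spaces(text).inspect
puts for_terminal(text).bytes.to_a.inspect
```

If all you want is a simple space, the first helper is probably enough; re-encoding matters when the text contains other non-ASCII characters that you want the terminal to display correctly.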