Unknown characters printed in irb or at the command prompt

Hi,

I read an HTML file using Nokogiri, and that works fine.

But when I print the result afterwards, it shows an unknown character like

“ ” in place of the &nbsp; after hello,

so it looks like “hello ”.

The problem is caused by the &nbsp; and the closing tag.

If anyone knows a solution, please help.

Thanks,
Priyank S.

Try using
p str
or
puts str.inspect
or
puts str.bytes.to_a.inspect

to get a better look at what character codes are in there.
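
As a rough sketch of what that looks like, assuming str holds "hello" followed by a non-breaking space (the sample string and the variable names below are made up for illustration, not taken from the original file):

```ruby
# Hypothetical sample: "hello" followed by a non-breaking space (U+00A0).
str = "hello\u00A0"

# The raw bytes show a two-byte UTF-8 sequence after the five ASCII letters.
byte_values = str.bytes.to_a
puts byte_values.inspect

# The same bytes in octal, which is how older Ruby versions escape them
# in inspect output.
octal_values = byte_values.map { |b| b.to_s(8) }
puts octal_values.inspect
```

Anything above 127 in the byte list is not plain ASCII and is worth looking up.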

Brian C. wrote:

Try using
p str
or
puts str.inspect
or
puts str.bytes.to_a.inspect

to get a better look at what character codes are in there.

Hi,

Thanks for the reply.

But that is not useful for me: if I use inspect, it converts the string to “hello\302\240”.

I want a simple space.

Thanks,
Priyank S.

Priyank S. wrote:

But that is not useful for me: if I use inspect, it converts the string to “hello\302\240”.

That is useful.

It shows that the &nbsp; has been converted into the byte sequence \302\240
(octal), or \xc2\xa0 (hex).

That happens to be the UTF-8 encoding of a non-breaking space, codepoint
160:

$ irb19

160.chr("UTF-8")
=> " "

160.chr("UTF-8").bytes.to_a
=> [194, 160]

160.chr("UTF-8").force_encoding("ASCII-8BIT")
=> "\xC2\xA0"

So the terminal you are trying to print it to is non-UTF-8. Perhaps a
Windows box? You didn’t say what your platform was.

In that case, you need to re-encode the string to the character set your
terminal uses.
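
A minimal sketch of both options, assuming the string is valid UTF-8. The sample string and variable names are made up, and Windows-1252 is only a guess at the target character set, since the platform was never stated:

```ruby
# Hypothetical sample: "hello" followed by a non-breaking space (U+00A0).
str = "hello\u00A0"

# Option 1: swap every non-breaking space for a plain ASCII space.
plain = str.gsub("\u00A0", " ")

# Option 2: re-encode for a non-UTF-8 terminal. Windows-1252 is an
# assumption here; substitute whatever code page the console really uses.
reencoded = str.encode("Windows-1252")

puts plain.inspect
puts reencoded.bytes.to_a.inspect
```

Option 1 is the one to use if all you want is a simple space. Option 2 keeps the non-breaking space but stores it as the single byte the target character set expects.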
