pshah
1
Hi,
I read an HTML file using Nokogiri, and that works fine.
But after reading it, when I print it, it shows an unknown character like “Â” in place of the &nbsp;,
so “hello&nbsp;” comes out looking like “helloÂ ”.
It causes problems because of the &nbsp; and the closing tag.
If anyone knows a solution, please help.
Thanks,
Priyank S.
2
Try using
p str
or
puts str.inspect
or
puts str.bytes.to_a.inspect
to get a better look at what character codes are in there.
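To illustrate what those three calls reveal, here is a minimal sketch. It assumes `str` holds "hello" followed by the non-breaking space (U+00A0) that the entity &nbsp; decodes to; the literal below is a hypothetical stand-in for the text Nokogiri extracted, not data from the original post.

```ruby
# Hypothetical stand-in for the parsed text: "hello" + U+00A0 (what &nbsp; decodes to).
# The \u escape makes this a UTF-8 string.
str = "hello\u00A0"

p str                        # prints str.inspect
puts str.inspect             # same output as the line above
puts str.bytes.to_a.inspect  # one integer per byte; U+00A0 is two bytes in UTF-8
```

The byte list is the least ambiguous of the three: how `inspect` renders the non-breaking space (literally, or as an escape such as `\302\240`) varies with the Ruby version and the encodings involved.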
3
Hi,
Thanks for the reply,
but it is not useful for me: if I use inspect, it shows “hello\302\240”.
I want a simple space.
Thanks,
Priyank S.
4
Priyank S. wrote:
but it is not useful for me: if I use inspect, it shows “hello\302\240”.
That is useful.
It shows that the &nbsp; has been converted into the byte sequence \302\240 (octal),
or \xc2\xa0 (hex).
That happens to be the UTF-8 encoding of a non-breaking space, codepoint 160 (U+00A0):
$ irb19
160.chr("UTF-8")
=> " "
160.chr("UTF-8").bytes.to_a
=> [194, 160]
160.chr("UTF-8").force_encoding("ASCII-8BIT")
=> "\xC2\xA0"
So the terminal you are trying to print it to is non-UTF-8. Perhaps a
Windows box? You didn’t say what your platform was.
In that case, you need to re-encode it to the appropriate character set.
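A minimal sketch of two possible fixes, assuming the text is a UTF-8 string and, purely as a hypothetical example, that the terminal uses the Windows-1252 character set (the thread never states the platform, so substitute the real one). The first option re-encodes as suggested; the second simply swaps the non-breaking space for an ordinary space, which is what the original question asked for.

```ruby
# Hypothetical stand-in for the text Nokogiri returned (UTF-8, ends in U+00A0).
str = "hello\u00A0"

# Option 1: re-encode for a non-UTF-8 terminal.
# Windows-1252 is an assumed example; U+00A0 maps to the single byte 0xA0 there.
win = str.encode("Windows-1252")

# Option 2: replace every non-breaking space with a plain ASCII space.
plain = str.gsub("\u00A0", " ")

puts plain
```

Option 1 keeps the non-breaking space but stores it as one byte that a Windows-1252 terminal can display; option 2 discards the distinction but prints the same on any terminal.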