Unicode string conversion


#1

I’m reading a binary file in my program. It contains strings in the
Windows Unicode format, which the specification says is stored as
little-endian. I’m loading it and trying to convert it using Iconv, but
I’m getting an invalid character exception on every string. For now I’m
just stripping the \000 characters from it, which works, but I know it’s
not an ideal solution and it only works in some cases.
So, how can I get the string into a format Ruby can understand? By the
way, I’ll load these strings in GTK (with the Ruby bindings). Does anyone
know if it can show Unicode strings?


#2

On May 5, 2007, at 9:01 AM, Alexandre R. wrote:

it can show Unicode strings?


Stripping the BOM (byte order mark)?
That should be fine. Unicode works just as well with no BOM, actually
better with no BOM.
The first thing you should check for, though, is whether a BOM is
present, and if so, read it.
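
A minimal sketch of that check, assuming the string has already been read as raw bytes and that Ruby 1.9.3 or later is available (for `byteslice`). The helper name `utf16_bom` is hypothetical, not part of Ruby or Iconv:

```ruby
# Hypothetical helper: reports which UTF-16 BOM, if any, starts the data.
# Returns :little_endian, :big_endian, or nil when no BOM is present.
def utf16_bom(data)
  first_two = data.byteslice(0, 2).bytes
  case first_two
  when [0xFF, 0xFE] then :little_endian   # U+FEFF stored low byte first
  when [0xFE, 0xFF] then :big_endian      # U+FEFF stored high byte first
  end
end

with_bom    = "\xFF\xFEH\x00i\x00".b   # BOM followed by "Hi" in UTF-16LE
without_bom = "H\x00i\x00".b           # same text, no BOM

p utf16_bom(with_bom)
p utf16_bom(without_bom)
```

If the helper returns nil, the byte order has to come from somewhere else, such as the file format's specification.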


#3

John J. wrote:

Stripping the BOM (byte order mark)?
That should be fine. Unicode works just as well with no BOM, actually
better with no BOM.
The first thing you should check for, though, is whether a BOM is
present, and if so, read it.

There is no BOM. The specification clearly states that it “uses UTF-16,
little endian, and the Byte-Order Marker (BOM) character is not present”.

What confuses me is why Iconv couldn’t convert it. Does Iconv expect
a BOM even when I specify UTF16LE, which makes the byte order
explicit?
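
For comparison, here is a sketch of the same conversion on a current Ruby, where Iconv is no longer part of the standard library. It swaps in `String#force_encoding` and `String#encode` instead of Iconv, and assumes the bytes really are BOM-less UTF-16LE as the specification says. The helper name `utf16le_to_utf8` is hypothetical:

```ruby
# Hypothetical helper: decodes BOM-less UTF-16LE bytes read from a binary file.
def utf16le_to_utf8(raw)
  # Copy the bytes and label them as UTF-16LE; no bytes are changed here.
  text = raw.dup.force_encoding(Encoding::UTF_16LE)
  # Transcode to UTF-8; this raises if the bytes are not valid UTF-16LE.
  text.encode(Encoding::UTF_8)
end

raw = "H\x00\xE9\x00".b   # "H" and U+00E9 as UTF-16LE, no BOM
puts utf16le_to_utf8(raw)
```

Naming the byte order explicitly means no BOM is needed. Stripping the \000 bytes instead only gives correct text while every character is in the ASCII range, because only then is the remaining byte the character itself.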


#4

Hi,

At Sun, 6 May 2007 23:05:37 +0900,
Alexandre R. wrote in [ruby-talk:250503]:

What confuses me is why Iconv couldn’t convert it. Does Iconv expect
a BOM even when I specify UTF16LE, which makes the byte order
explicit?

A BOM is a “ZERO WIDTH NO-BREAK SPACE” (U+FEFF) at the beginning of
a text. Almost any iconv(3) implementation should be able to deal with it.
Can you show a minimal piece of data that reproduces the error?
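
One way to produce such a sample is to dump the raw bytes of a failing string as hex and record what the conversion reports. This is only a sketch: both helper names are hypothetical, and it uses `String#encode` rather than Iconv, so the error classes will differ from the ones Iconv raises:

```ruby
# Hypothetical helper: renders bytes as space-separated hex pairs for posting.
def hex_dump(raw)
  raw.unpack('H*').first.scan(/../).join(' ')
end

# Hypothetical helper: attempts the UTF-16LE to UTF-8 conversion and
# returns either the converted text or a description of the error.
def try_utf16le(raw)
  raw.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
rescue EncodingError => e
  "#{e.class}: #{e.message}"
end

sample = "H\x00i".b   # deliberately truncated: the last code unit lacks a byte
puts hex_dump(sample)
puts try_utf16le(sample)
```

The truncated sample is an assumption chosen for illustration. An odd byte count is only one possible cause of a conversion error, and the dump of the real data is what shows which case applies.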