Forum: Ruby Unicode string conversion

Alexandre R. (Guest)
on 2007-05-05 04:01
I'm reading a binary file in my program. It contains strings in the
Windows Unicode format, which the specification says is stored as
little-endian. I'm loading them and trying to convert them using Iconv,
but I get an invalid character exception on every string. For now I'm
just stripping the \000 characters and it works, but I know it's not
an ideal solution and it only works in some cases.
So, how can I get the strings into a format Ruby can understand? By the
way, I'll load these strings in GTK (with the Ruby bindings); does anyone
know if it can show Unicode strings?
John J. (Guest)
on 2007-05-05 08:51
(Received via mailing list)
On May 5, 2007, at 9:01 AM, Alexandre R. wrote:

> it can show Unicode strings?
Are you stripping the BOM (byte order mark)?
That should be fine. Unicode works just as well with no BOM, actually
better.
The first thing you should check for, though, is whether a BOM is
present, and read it if it is.
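
A minimal sketch of the BOM check suggested above, assuming the raw bytes are already in a Ruby string (Ruby 1.9 or later). The helper name `utf16_bom` is hypothetical and not part of any library; it only looks at the first two bytes.

```ruby
# Hypothetical helper: report which UTF-16 BOM, if any, starts the data.
# Returns :little_endian, :big_endian, or nil when no UTF-16 BOM is found.
def utf16_bom(bytes)
  head = bytes.byteslice(0, 2).to_s.b   # first two bytes, as binary
  case head
  when "\xFF\xFE".b then :little_endian # U+FEFF encoded as UTF-16LE
  when "\xFE\xFF".b then :big_endian    # U+FEFF encoded as UTF-16BE
  end
end

p utf16_bom("\xFF\xFEh\x00".b)  # BOM present, little-endian
p utf16_bom("h\x00i\x00".b)     # no BOM
```

If the result is nil, the byte order has to come from somewhere else, such as the file format's specification.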
Alexandre R. (Guest)
on 2007-05-06 18:05
John J. wrote:
> On May 5, 2007, at 9:01 AM, Alexandre R. wrote:
>
>> it can show Unicode strings?
> Are you stripping the BOM (byte order mark)?
> That should be fine. Unicode works just as well with no BOM, actually
> better.
> The first thing you should check for, though, is whether a BOM is
> present, and read it if it is.

There is no BOM. The specification clearly states that it "uses UTF-16,
little endian, and the Byte-Order Marker (BOM) character is not present".

What confuses me is why Iconv couldn't convert it. Does Iconv expect
a BOM even when I specify UTF16LE, which makes the byte order
explicit?
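
For readers on a current Ruby: Iconv was removed from the standard library in Ruby 2.0, and the built-in `String#force_encoding` and `String#encode` cover this case. The following is a minimal sketch, assuming the bytes are BOM-less UTF-16LE as the specification quoted above says. The helper name `utf16le_to_utf8` is hypothetical.

```ruby
# Hypothetical helper: decode BOM-less UTF-16LE bytes into a UTF-8 string.
# Raises Encoding::InvalidByteSequenceError if the bytes are not valid
# UTF-16LE (for example, an odd number of bytes).
def utf16le_to_utf8(bytes)
  # Tag a copy of the bytes as UTF-16LE, then transcode to UTF-8.
  bytes.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
end

raw = "h\x00i\x00".b        # "hi" in UTF-16LE, no BOM
puts utf16le_to_utf8(raw)   # prints "hi"
```

With the old Iconv library the target encoding comes first, as in `Iconv.conv(to, from, str)`, so swapped arguments are one thing worth ruling out when every string fails to convert.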
Nobuyoshi N. (Guest)
on 2007-05-09 03:53
(Received via mailing list)
Hi,

At Sun, 6 May 2007 23:05:37 +0900,
Alexandre R. wrote in [ruby-talk:250503]:
> What confuses me is why Iconv couldn't convert it. Does Iconv expect
> a BOM even when I specify UTF16LE, which makes the byte order
> explicit?

The BOM is a "ZERO WIDTH NO-BREAK SPACE" (U+FEFF) at the beginning of
a text.  Almost every iconv(3) implementation should be able to deal
with it.
Can you show minimal data that reproduces the error?
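
One way to produce such minimal data is to post a short hex dump of the bytes being passed to the converter. This is only a sketch; the helper name `hex_dump` and the sample bytes are assumptions for illustration, not data from the original file.

```ruby
# Hypothetical helper: show the first few bytes of a string as hex pairs.
def hex_dump(bytes, limit = 32)
  bytes.byteslice(0, limit).to_s.unpack("C*").map { |b| format("%02x", b) }.join(" ")
end

sample = "h\x00i\x00".b      # "hi" as UTF-16LE, no BOM (illustrative only)
puts hex_dump(sample)        # prints "68 00 69 00"
puts sample.bytesize.even?   # prints "true"
```

An odd byte count, or stray bytes before the first character such as a length prefix, would both show up in a dump like this.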