MutableString encoding issue

Stumbled on this when testing yaml.

"\204"
=> "?"

While

irb(main):005:0> "\204"
=> "\204"

I believe a Ruby string can hold arbitrary byte values, but because we
store the content as a string, we lose every value that cannot be
represented in the default encoding. Tomas, what do you think?

This was a known issue a while back, it’s the reason the Zlib library
didn’t work well with binary files. I’m fairly certain there was work
being done on making String be backed by a byte array… and in fact I
thought this was already done.

On Mon, Jul 14, 2008 at 2:20 AM, Oleg T. [email protected]

MutableString can have one of three internal representations, depending
on how it was last used. One of these is a byte array. This particular
problem may be in the scanner or parser and not in the actual string
class, as we don’t otherwise have a problem storing the character:

$s = "\204"
=> "?"
$s[0]
=> 63
$s[0] = 132
=> 132
$s
=> "\204"
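
For readers on a current MRI, the byte-level behavior in the session above can be reproduced with the sketch below. It assumes Ruby 2.0 or later, where String#[] returns a one-character string rather than an integer, so getbyte/setbyte stand in for the $s[0] calls. It illustrates MRI only and says nothing about how IronRuby's MutableString is implemented.

```ruby
# Octal escape \204 is the single byte 0x84 (decimal 132).
s = "\204".b                 # .b returns a copy tagged as binary (ASCII-8BIT)

byte_before = s.getbyte(0)   # read the raw byte; no transcoding is involved
s.setbyte(0, 63)             # overwrite it with "?" (byte 63) ...
s.setbyte(0, 132)            # ... and write the original byte back

puts byte_before             # raw value read before the overwrite
puts s.bytes.inspect         # raw byte values after the round trip
puts s.dump                  # escaped, printable representation
```

Because the string is treated as a sequence of bytes throughout, the value 132 survives the round trip regardless of the default encoding.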

From: Michael L.
Sent: Monday, July 14, 2008 6:21 AM
To: [email protected]
Cc: IronRuby Team
Subject: Re: [Ironruby-core] MutableString encoding issue

This was a known issue a while back, it’s the reason the Zlib library
didn’t work well with binary files. I’m fairly certain there was work
being done on making String be backed by a byte array… and in fact I
thought this was already done.

This problem is probably in StringContent.ToByteArray(). It uses
Encoding.GetBytes(string), which obeys .NET encoding semantics and by
default replaces any nonconvertible characters with ‘?’.
MutableStringOps.Dump() then uses it to create the string
representation.
We could stop StringContent.ToByteArray() from replacing nonconvertible
characters by using an encoder fallback (EncoderFallback).
BinaryContent.ToString()/ToStringBuilder() have the same issue.


Oleg
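
The replacement behavior described above is not specific to .NET. As a rough analogy only, and not IronRuby or .NET code, MRI's String#encode offers the same three choices: silent replacement with ‘?’, a strict conversion that raises, and a custom fallback. The sample text and the escape format below are made up for illustration.

```ruby
text = "caf\u00E9"  # "é" (U+00E9) has no US-ASCII representation

# Replacement mode: the unconvertible character silently becomes "?".
lossy = text.encode("US-ASCII", undef: :replace, replace: "?")

# Strict mode: the same conversion raises instead of losing data.
strict_error =
  begin
    text.encode("US-ASCII")
    nil
  rescue Encoding::UndefinedConversionError => e
    e.class
  end

# Custom fallback: emit an escape so the original value stays recoverable.
escaped = text.encode("US-ASCII", fallback: ->(ch) { format("\\u%04X", ch.ord) })

puts lossy
puts strict_error
puts escaped
```

Once the replacement mode has run, nothing in the result distinguishes a replaced character from a genuine ‘?’, which is why a dump built on top of it reports the wrong content.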

From: Curt H.
Sent: Monday, July 14, 2008 6:27 AM
To: Michael L.; [email protected]
Cc: IronRuby Team
Subject: RE: [Ironruby-core] MutableString encoding issue

MutableString can have one of three internal representations, depending
on how it was last used. One of these is a byte array. This particular
problem may be in the scanner or parser and not in the actual string
class, as we don’t otherwise have a problem storing the character:

$s = "\204"
=> "?"
$s[0]
=> 63
$s[0] = 132
=> 132
$s
=> "\204"


I think it’s a bug :). Could you file it? If it’s something that’s
blocking you I can look at it asap.

Tomas

From: Oleg T.
Sent: Sunday, July 13, 2008 11:20 PM
To: IronRuby Team
Cc: [email protected]
Subject: MutableString encoding issue

Stumbled on this when testing yaml.

"\204"
=> "?"

While

irb(main):005:0> "\204"
=> "\204"

I believe a Ruby string can hold arbitrary byte values, but because we
store the content as a string, we lose every value that cannot be
represented in the default encoding. Tomas, what do you think?

Sure. Unfortunately this one blocks our yaml impl from passing MRI’s
test_yaml.rb.

Oleg

From: Tomas M.
Sent: Monday, July 14, 2008 9:14 AM
To: Oleg T.; IronRuby Team
Cc: [email protected]
Subject: RE: MutableString encoding issue

I think it’s a bug :). Could you file it? If it’s something that’s
blocking you I can look at it asap.

Tomas
