Comparing CLR strings and Ruby strings - a slightly surprising behaviour

Hi,

while writing specs for Magic, I noticed that:

instance_from(MenuItem, "Hello").text.to_s.should == "Hello"

to_s is required to get the assertion to pass. The following will return
false:

button = Button.new
button.text = "Hello"
puts button.text == "Hello"

So I guess that CLR strings cannot be compared to Ruby strings unless
to_s is applied.
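
For readers without an IronRuby session at hand, the behaviour can be modelled in plain Ruby. This is only a sketch: FakeClrString is a hypothetical stand-in, not the real System::String interop type, and it assumes that the CLR string is a distinct type with no content-based equality against Ruby strings.

```ruby
# Hypothetical stand-in for a CLR string: a distinct, immutable type that
# can be converted to a Ruby String but does not define == against one.
class FakeClrString
  def initialize(value)
    @value = value.dup.freeze
  end

  # Explicit conversion, playing the role of to_s on the CLR string.
  def to_s
    @value.dup
  end
end

text = FakeClrString.new("Hello")

puts text == "Hello"        # false: falls back to Object#==, which is identity
puts text.to_s == "Hello"   # true: String#== compares contents
```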

While not a big deal, it’s a bit surprising.

Is this likely to change?

cheers,

– Thibaut

On Tue, Mar 3, 2009 at 9:46 PM, Thibaut Barrère wrote:

> button.text = "Hello"
> puts button.text == "Hello"
>
> So I guess that CLR strings cannot be compared to Ruby strings unless to_s
> is applied.

I have stumbled on this too and was very surprised. This problem does
not exist in IronPython (does it? please correct me if I am wrong). The
behavior I was expecting is that on the Ruby side all strings are (or at
least behave exactly like) Ruby strings, even if they come from .NET, and
are automatically converted to CLR strings when passed into CLR methods.

cheers,
– henon

On Tue, Mar 3, 2009 at 11:11 PM, Meinrad R. wrote:

> The behavior I was expecting is that on the Ruby side all strings are (or
> at least behave exactly like) Ruby strings, even if they come from .NET,
> and are automatically converted to CLR strings when passed into CLR methods.

In addition to that, it would also be great to have automatic type
conversion on the .NET side too. I’d expect to be able to assign a
string pulled out from the interpreter to a C# string without the need
to call ToString(). For example,

string s = engine.Execute("'hi'") as string;

Currently s would be null because the dynamic cast to System.String
fails.

IronPython doesn’t have this problem because it uses CLR Strings
directly; Python strings map well to CLR Strings. Ruby has mutable
strings, and CLR Strings are immutable, so IronRuby strings are
a different type from CLR Strings.
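
The mutability in question can be seen in plain Ruby. The snippet below is a small illustration on standard Ruby, not IronRuby-specific code; the frozen string is used only as the nearest plain-Ruby analogue of an immutable CLR string.

```ruby
# Ruby strings can be changed in place; the object stays the same.
greeting = String.new("hello")   # String.new returns an unfrozen, mutable string
id_before = greeting.object_id

greeting << " world"             # appends in place
greeting.upcase!                 # bang methods also mutate the receiver

puts greeting                          # HELLO WORLD
puts greeting.object_id == id_before   # true

# A frozen string rejects any in-place change. Current Rubies raise
# FrozenError, which is a subclass of RuntimeError.
frozen = "hello".dup.freeze
begin
  frozen << " world"
rescue RuntimeError => e
  puts "mutation rejected: #{e.class}"
end
```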

Comparison of CLR Strings and Ruby Strings should be possible without
having to to_s a CLR string … so this is a bug and on our list of .NET
interop things to do. We won’t do an auto-conversion of CLR strings to
Ruby strings, since a CLR method may return a string that you want to
pass onto another CLR method. However, we do convert Ruby strings to CLR
strings today.

~js

This mismatch doesn’t exist in Python because Python’s string semantics
are largely compatible with .NET’s string semantics. As a result,
Python can actually use .NET’s strings as Python strings. Ruby strings,
unfortunately, are mutable, which means that IronRuby has to use a
different type to store a mutable string. In many cases, the binder
should automatically perform the conversion between a CLR string and a
mutable string. There are probably places (like this one) where
something just hasn’t been implemented yet. And there may be places
where we simply can’t do an automatic conversion.

The mutable string type in Ruby is (in my opinion) one of the most
unfortunate design decisions made in the language.

> Comparison of CLR Strings and Ruby Strings should be possible without having
> to to_s a CLR string … so this is a bug and on our list of .NET interop
> things to do. We won’t do an auto-conversion of CLR strings to Ruby strings,
> since a CLR method may return a string that you want to pass onto another
> CLR method. However, we do convert Ruby strings to CLR strings today.

Just removing the need for to_s when comparing sounds good to me.

thanks for the feedback,

– Thibaut

We don’t plan to make the highlighted code work. However, we are going to
make this work:

string s = engine.Execute<string>("'hi'");

and this also works:

string s = engine.Execute("'hi'").ToString();

although in this case you need to take care of null.

The difference is that, unlike Execute, Execute<T> invokes an explicit
dynamic conversion on the resulting type. You can achieve the same using
engine.ObjectOperations.ConvertTo<string>(engine.Execute("'hi'")).

Tomas

OK, Execute<string> is convenient enough. In my opinion the highlighted
C# code represents a potential interoperability pitfall and will need to be
documented well. On the Ruby side, I think there are no technical
limitations to achieving Ruby’s string behavior even if the underlying
object is an immutable CLR string. What do you think?
– henon

On Wed, Mar 4, 2009 at 12:24 AM, Tomas M. wrote:

> The difference is that, unlike Execute, Execute<T> invokes an explicit
> dynamic conversion on the resulting type. You can achieve the same using
> engine.ObjectOperations.ConvertTo<string>(engine.Execute("'hi'")).
>
> Tomas
>
> […]

(This is my understanding of the problem, so Tomas please correct me if
it’s wrong … I wasn’t involved in the initial decision).

The “mutableness” of a string is the defining distinction, so we don’t
make a CLR string act like a Ruby string, because it doesn’t allow
mutation. This is the same reason we can’t easily allow Ruby methods to
operate on CLR strings: mutating methods (chomp!, etc.) won’t work. So
rather than supporting only part of the Ruby methods on CLR strings, you
have to explicitly ask for one or the other.

In practice, I haven’t felt much pain from this when hosting IronRuby. The
hosting layer can make sure to convert any CLR string into a mutable
string, and any strings pulled out of IronRuby code can be turned into
CLR strings the way Tomas showed earlier in the thread.

~js

Apart from being mutable, Ruby strings also carry an encoding. A Ruby
string is basically a resizable byte array with an encoding.
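
The "byte array plus encoding" view can be observed directly in standard Ruby 1.9 or later. This is a small illustration in plain Ruby, not IronRuby internals:

```ruby
text = String.new("h\u00E9llo")   # "héllo"; \u escapes yield a UTF-8 string

puts text.encoding     # UTF-8
puts text.length       # 5 characters
puts text.bytesize     # 6 bytes, because "é" takes two bytes in UTF-8

# Reinterpreting the same bytes under another encoding changes the character
# count but not the underlying bytes.
binary_view = text.dup.force_encoding("ASCII-8BIT")
puts binary_view.length                # 6
puts binary_view.bytes == text.bytes   # true

# The buffer is resizable: appending grows the same object.
text << "!"
puts text.bytesize     # 7
```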

CLR strings are actually closer to Ruby symbols than to Ruby strings. So
we have two options on the Ruby side: either to implement the same set
of methods Symbol has or a subset of String methods that don’t mutate
the string. Since Ruby doesn’t provide many useful methods on Symbol it
might be better to choose the latter.

Tomas

Hi guys,

thanks for the discussion. I really understand how tricky this can be
from an infrastructure point of view. I raised this point because I’m
fairly sure the question will come up again and again from newcomers.
My feeling (although I understand what’s happening under the covers) is
that it breaks the POLS (principle of least surprise) a bit, and that
people using IronRuby for the first time could find it somewhat “flaky”
(aaah, it’s an interop thingy, ok).

Just my opinion - if it stays the same, a red bold entry in the FAQs
will be useful :)

In the code itself, I don’t believe this kind of comparison will be so
common, and a .to_s will not necessarily hurt. For people that write
tests/specs though, it clutters the code quite a bit.

So when running tests, I think my solution will be to patch
System::String to avoid cluttering the specs with .to_s calls.
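
A patch of that kind might look roughly like the sketch below. It is written against a hypothetical stand-in class, FakeClrString, so that it runs on any Ruby; under IronRuby the class being reopened would be System::String, and whether reopening it behaves this way there is an assumption, not something verified here.

```ruby
# Hypothetical stand-in so the sketch runs outside IronRuby.
class FakeClrString
  def initialize(value)
    @value = value.dup.freeze
  end

  def to_s
    @value.dup
  end
end

# Spec-only patch: compare by content when the other side is a string.
class FakeClrString
  def ==(other)
    case other
    when String, FakeClrString then to_s == other.to_s
    else false
    end
  end

  # String#== consults to_str on a non-String operand and then delegates to
  # that operand's ==, so defining it makes "Hello" == clr_text work as well.
  def to_str
    to_s
  end
end

clr_text = FakeClrString.new("Hello")
puts clr_text == "Hello"   # true
puts "Hello" == clr_text   # true
puts clr_text == "hello"   # false
```

Since an expectation such as text.should == "Hello" sends == to the value on the left, patching the CLR side alone would be enough for the spec shown at the top of the thread.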

I’ll share it here if I come up with something that works for me.

cheers and thanks for the discussion, appreciated,

– Thibaut

On Wed, Mar 4, 2009 at 9:03 AM, Tomas M. wrote:

> […]

I was thinking in the same direction ;). I know, the current design has
already been decided and it’d be painful to change it. But I am curious
whether you thought about the option of actually using CLR strings and
copying them whenever any mutating method is called. That would solve the
compatibility problem, but I guess it might have too great a performance
impact on existing library code?

The problem exists only where strings cross the language boundary (from
Ruby to C# or vice versa). There would be the option to strictly convert
strings into the expected format each time they cross the boundary. I
guess this might also be too costly.

Finally, a thought about the consequences of the current string
incompatibility: I can imagine situations where either CLR strings that
“leak” from .NET into Ruby library code or Ruby strings that “leak” into
.NET code might cause bugs which are hard to track down. I was once
fooled by the debugger … thought I got a System.String but it actually
was a mutable string, because they look exactly alike. It took some time
until I found out what was going on.

– henon

We can differentiate #inspect on CLR string to print something like
i"xxx" ("i" for immutable) or clr:"xxx" …
We can also implement some default conversions, like to_s, where it makes
sense.

We can’t use CLR strings and make copies since that doesn’t preserve
object identity and type (note that we also need to switch between
Unicode char[] and byte[]). We also need to attach some information to
the string (encoding, frozen and tainted flags), so it couldn’t be just
System.String anyway.
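
The object-identity point can be illustrated in plain Ruby. In this sketch the copy-on-mutate scheme is only emulated with +, which returns a new string; it is not how any IronRuby build works.

```ruby
# In-place mutation: every reference to the object observes the change.
original = String.new("abc")
alias_ref = original
alias_ref << "d"
puts original                     # abcd
puts original.equal?(alias_ref)   # true, still one object

# Copy-on-mutate, emulated here with + (which returns a new string):
# the second reference silently diverges from the first.
shared = "abc".freeze
copy_ref = shared
copy_ref += "d"                   # rebinds copy_ref to a fresh object
puts shared                       # abc
puts shared.equal?(copy_ref)      # false
```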

Tomas

I think System.String should support the non-mutating methods of Ruby
String class (second option Tomas mentioned). We enhance other CLR types
with the Ruby equivalent functionality (like
System.Collections.IEnumerable to Enumerable, all CLR types support Ruby
Object and Kernel methods like #class, etc).

http://www.ironruby.net/About/Roadmap already mentions supporting to_s
on all CLR types anyway. Why would we need to do anything special for
System.String?

Mutating operations will fail which could be surprising. We can address
this by throwing good error messages indicating that the user has to
explicitly convert it to a Ruby string.
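
A rough plain-Ruby model of that proposal is sketched below. FakeClrString is a hypothetical stand-in, the list of "mutating" methods is an assumption made for the sketch (bang methods plus a few well-known in-place operations), and the error type and message are illustrative only.

```ruby
# Hypothetical stand-in for an immutable CLR string (not the IronRuby type).
class FakeClrString
  # Assumption: String's bang methods plus some in-place operations.
  MUTATING = (String.instance_methods.select { |name| name.to_s =~ /\w!\z/ } +
              [:<<, :concat, :replace, :insert, :clear, :[]=, :prepend,
               :setbyte, :force_encoding]).freeze

  def initialize(value)
    @value = value.dup.freeze
  end

  def to_s
    @value.dup
  end

  # Delegate non-mutating String methods; reject mutating ones with a hint.
  def method_missing(name, *args, &block)
    if MUTATING.include?(name)
      raise TypeError, "can't call #{name} on an immutable CLR string; " \
                       "convert it to a Ruby string with to_s first"
    elsif @value.respond_to?(name)
      @value.public_send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    (!MUTATING.include?(name) && @value.respond_to?(name)) || super
  end
end

clr_text = FakeClrString.new("Hello\n")
puts clr_text.chomp.upcase    # HELLO
begin
  clr_text.chomp!
rescue TypeError => e
  puts e.message
end
```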

I like the idea of showing System.String as clr:"xxx".

Thanks,
Shri

> I like the idea of showing System.String as clr:"xxx".

+1 for something like that - this way the failed assertions or people
using stdout for debug would not be fooled.

– Thibaut

Agreed.

JD

From what I get from this conversation, the Ruby strings are strings
with a few extra features (mutability, carrying an encoding). Wouldn’t it
be logical for the mutable Ruby strings to extend System.String? Are
there methods that only work on .NET strings and not on Ruby strings?

Forgive me if this is a stupid question, I’m not very well educated in
how mutable strings work :)


Tinco

I’m going to use single quotes for formatting CLR strings via inspect.
“clr:” prefix is too long and it gets in your way when working mostly
with CLR strings.

"Some string"
=> "Some string"

"Some string".to_clr_string
=> 'Some string'

Sounds good?
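
As a rough illustration of that inspect format in plain Ruby: FakeClrString below is a hypothetical stand-in, and escaping backslashes and single quotes is an assumption made so the output can be read back as a single-quoted literal.

```ruby
# Hypothetical stand-in for a CLR string; only inspect matters here.
class FakeClrString
  def initialize(value)
    @value = value.dup.freeze
  end

  def to_s
    @value.dup
  end

  # Single-quoted output, escaping backslashes and single quotes.
  def inspect
    "'" + @value.gsub(/[\\']/) { |char| "\\" + char } + "'"
  end
end

p "Some string"                      # "Some string"
p FakeClrString.new("Some string")   # 'Some string'
p FakeClrString.new("it's")          # 'it\'s'
```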

Tomas

How about back ticks? `Some string`?

Since Ruby can have single-quoted string literals, it might not be that
obvious that 'Some string' is not a normal Ruby string.

Pete

On Thu, Mar 5, 2009 at 10:03 PM, Tomas M. wrote:

> I’m going to use single quotes for formatting CLR strings via inspect.
> "clr:" prefix is too long and it gets in your way when working mostly with
> CLR strings.
>
> "Some string"
> => "Some string"
>
> "Some string".to_clr_string
> => 'Some string'
>
> Sounds good?

The obvious advantage is that this representation is a valid Ruby
expression.
– henon