Ruby/Java strings solved the Ruby way

Hello all!

I’m posting this here because it seems like a topic better discussed in
the general Ruby community.

JRuby gives you fairly seamless access to Java libraries through its
Java integration layer. You can pull in just about any class,
instantiate objects, call methods, and so on. To make this a bit
easier, in many cases we automatically coerce particular Ruby types
to their equivalent Java types. For example, Fixnums become either boxed
or primitive integral values, Floats become either boxed or primitive
floating-point values, and Strings are decoded from byte[] into Java
Strings as UTF-8.

But there’s a problem with this…it adds a bit of overhead in the
numeric cases, and a lot of overhead in the String case.

Here’s a comparison between calling a method that takes an int and a
method that takes a String (best times out of five):

with string ‘hello’: 1.068688
with fixnum 1: 0.563014

And this is a short string. The coercion cost for strings is at least
O(n).

String coercion is what I’m writing about.

We’ll never be able to eliminate the coercion cost entirely. Ruby
Strings are byte[], and implementing our own String and related classes
on top of byte[] has been a great move for us. So there’s never
going to be a straight-through path from a Ruby String to a Java String.
But I think we can reduce the impact for JRuby users by doing things the
Ruby way.

Ruby already has a protocol for coercion, via methods like to_str,
to_ary and so on. This allows you to pass e.g. non-Strings to methods
that act on Strings, and frequently (usually) they’ll coerce and work
fine. Often, if you want to avoid a coercion hit, you’ll create the
String ahead of time. And that’s where we can learn from Ruby for Java
String handling.
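
As a rough sketch of the protocol I mean, in plain Ruby (the Label class
is made up purely for illustration):

```ruby
# A hypothetical wrapper type; the name Label is invented for this example.
class Label
  def initialize(text)
    @text = text
  end

  # Implementing to_str tells Ruby this object may stand in for a String.
  def to_str
    @text
  end
end

label = Label.new("world")

# String#+ calls to_str on its argument, so this coerces on every call.
greeting = "hello " + label

# Coercing once up front avoids repeating the conversion on each call.
text = label.to_str
greetings = 3.times.map { "hello " + text }
```

The second form is the "create the String ahead of time" case: the
conversion happens once, where the caller chooses.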

So I propose that instead of always decoding an incoming Ruby String
into a Java String when calling a Java method, we introduce a new type
(call it JString for now) that represents a Java string. When you
require the Java integration support, it would add a to_jstring method
to Ruby’s String (or to_String or hey, toString?). For calls from Ruby
to Java, we’d then follow Ruby coercion protocols and only accept either
a JString or an object that coerces to JString.
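
To make the idea concrete, here is a minimal plain-Ruby sketch. Nothing
in it exists in JRuby today: JString, to_jstring and call_java_method are
all placeholder names, and the UTF-16 bytes merely stand in for a real
java.lang.String so the sketch runs on any Ruby.

```ruby
# Stand-in for the proposed JString. In JRuby it would wrap a
# java.lang.String; here it wraps UTF-16 bytes (an assumption made so the
# example is runnable without the JVM).
class JString
  attr_reader :data

  def initialize(data)
    @data = data
  end

  # A JString is already a JString, so coercion is a no-op.
  def to_jstring
    self
  end
end

class String
  # The O(n) decode happens here, once, at the caller's request.
  def to_jstring
    JString.new(encode("UTF-16BE"))
  end
end

# Hypothetical call boundary: accept anything that coerces to JString.
def call_java_method(arg)
  unless arg.respond_to?(:to_jstring)
    raise TypeError, "can't convert #{arg.class} into JString"
  end
  jstr = arg.to_jstring
  jstr.data.bytesize / 2 # pretend String#length: count UTF-16 code units
end

call_java_method("hello")            # coerced at the call site
call_java_method("hello".to_jstring) # already a JString, no decode needed
```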

Likewise, coming from Java to Ruby, we wouldn’t automatically coerce;
we’d return a JString object that implements to_str. You can then
usually pass that to String APIs, or just coerce it immediately and go
on with your business. Since this latter change would break some apps
that expect Java strings to always be coerced, it would be saved for the
next major release of JRuby and thoroughly discussed.
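
Roughly what that could look like from the Ruby side, again as a
plain-Ruby sketch with a placeholder JString (the UTF-16 bytes are an
assumption standing in for a real Java string):

```ruby
# Placeholder for the proposed JString returned from a Java call.
class JString
  def initialize(utf16_bytes)
    @utf16_bytes = utf16_bytes
  end

  # Encoding back to a Ruby String happens only when someone asks for it.
  def to_str
    @utf16_bytes.encode("UTF-8")
  end
end

# Pretend this value came back from a Java method.
returned = JString.new("JRuby".encode("UTF-16BE"))

# Core String methods that use implicit conversion accept it directly.
message = "Hello, " + returned
found = "JRuby rocks".include?(returned)

# Or coerce immediately and carry on with an ordinary String.
name = returned.to_str
```

Code that checks is_a?(String) explicitly would not accept the JString,
which is part of why this direction is the breaking change.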

I think this model provides the best possible experience when calling
Java from Ruby but also allows JRuby users to take control of the
coercion process, either by defining their own to_jstring methods on
other types, or by pre-coercing strings they intend to use a lot.
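
Both kinds of control can be sketched in plain Ruby. Everything here is
hypothetical (JString, to_jstring, java_call, Account); the decode
counter exists only to make the saving from pre-coercion visible.

```ruby
# Minimal stand-in for the proposed JString that counts how often a
# Ruby String is actually decoded.
class JString
  class << self
    attr_accessor :decodes
  end
  self.decodes = 0

  def initialize(ruby_string)
    self.class.decodes += 1
    @data = ruby_string.encode("UTF-16BE")
  end

  def to_jstring
    self
  end
end

class String
  def to_jstring
    JString.new(self)
  end
end

# A user-defined type opting in to the protocol.
Account = Struct.new(:name) do
  def to_jstring
    name.to_jstring
  end
end

# Hypothetical Java-bound call that coerces its argument.
def java_call(arg)
  arg.to_jstring
end

key = "some frequently used key"
3.times { java_call(key) }
after_plain = JString.decodes                 # one decode per call

pre = key.to_jstring                          # decode once, up front
3.times { java_call(pre) }
after_precoerced = JString.decodes            # only the up-front decode added

java_call(Account.new("charlie"))             # custom type coerces itself
```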

Thoughts?

  • Charlie

On Sun, Jul 20, 2008 at 1:36 PM, Charles Oliver N.
[email protected] wrote:
[snip]

> that to String APIs, or just coerce it immediately and go on with your
> business. Since this latter change would break some apps that expect Java
> strings to always be coerced, it would be saved for the next major release
> of JRuby and thoroughly discussed.

This sounds like an excellent compromise. I vote for to_jstring
because it looks most Ruby-esque.

Jim

Jim M. wrote:

> > Likewise, coming from Java to Ruby, we wouldn’t automatically coerce; we’d
> > return a JString object that implements to_str. You can then usually pass
> > that to String APIs, or just coerce it immediately and go on with your
> > business. Since this latter change would break some apps that expect Java
> > strings to always be coerced, it would be saved for the next major release
> > of JRuby and thoroughly discussed.
>
> This sounds like an excellent compromise. I vote for to_jstring
> because it looks most Ruby-esque.

Also up for debate is whether boxed primitives from Java should behave
the same way, with a JInteger, JFloat, and so on that can coerce to
Fixnum or Float. But boxed primitives are considerably cheaper to coerce
than Strings, so it may not be worth it.

  • Charlie