Forum: JRuby jsr223 - utf8 problem

Paweł Wielgus (Guest)
on 2008-10-02 16:18
(Received via mailing list)
Hi,
I have a strange problem with UTF-8 characters.
The code I have been using:

----------------------------------------------------------------------------------
ScriptEngineManager m = new ScriptEngineManager();
ScriptEngine rubyEngine = m.getEngineByName("jruby");
rubyEngine.setContext(new SimpleScriptContext());
rubyEngine.getContext().setAttribute("root", root, ScriptContext.ENGINE_SCOPE);
String script = "¡¯Æ¶¿æ¼";
return rubyEngine.eval(script);
----------------------------------------------------------------------------------

On most of our computers this returns the string:
¡¯Æ¶¿æ¼
but on two machines it returns:
???????

But if I change the last lines to:
----------------------------------------------------------------------------------
String script = "¡¯Æ¶¿æ¼";
rubyEngine.put("s", script);
return rubyEngine.eval("eval $s");
----------------------------------------------------------------------------------
it works well on all of our computers.

Does anybody have any idea why? And what's wrong with the first one?

Tested on:
jruby 1.1.3 and 1.1.4
jruby-engine 1.1.3 and 1.1.5
java 1.6.0_06_b02

Best greetings,
Paweł Wielgus.
Yoko Harada (Guest)
on 2008-10-02 23:11
(Received via mailing list)
Hi,

2008/10/2 Paweł Wielgus <poulwiel@gmail.com>:
> return rubyEngine.eval(script);
> ----------------------------------------------------------------------------------

Does this code really work? I got the following error raised by JRuby:
    :1: <unknown>:1: Invalid char `\204' ('„') in expression (SyntaxError)

Does "root" affect the output? Or are there typos in the above code?

>
> on most of our computers returns string:
> ¡¯Æ¶¿æ¼
>
> but on two machines it returns:
> ???????

This happens when the platform default encoding and the encoding of the
output strings are not the same. The JRuby engine checks the
sun.jnu.encoding and file.encoding system properties to guess an
appropriate encoding for output. If neither property is found, the
JRuby engine uses UTF-8. I wonder whether the two failing machines use
a different system property to specify the encoding.
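
The lookup described above can be sketched roughly as follows. This is only an illustration of that description, not the actual jruby-engine source; the class and method names (`EncodingGuess`, `guess`) are hypothetical, and the order (sun.jnu.encoding first) is an assumption taken from the order in which the properties are named.

```java
import java.util.Properties;

// Hypothetical sketch of the lookup described above; not the real jruby-engine code.
final class EncodingGuess {
    // Returns the first property that is set, checking sun.jnu.encoding
    // before file.encoding, and falls back to UTF-8 when neither is present.
    static String guess(Properties props) {
        String enc = props.getProperty("sun.jnu.encoding");
        if (enc == null) {
            enc = props.getProperty("file.encoding");
        }
        return enc != null ? enc : "UTF-8";
    }

    public static void main(String[] args) {
        // Shows what this sketch would pick on the current JVM.
        System.out.println("guessed output encoding: " + guess(System.getProperties()));
    }
}
```

Running this on a working and a failing machine shows which value such a lookup would pick on each.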

>
> But if i change the last line in to:
> ----------------------------------------------------------------------------------
> String script = "¡¯Æ¶¿æ¼";
> rubyEngine.put("s", script);
> return rubyEngine.eval("eval $s");
> ----------------------------------------------------------------------------------
> it works well on all of our computers.

I think the last line should be
   return rubyEngine.eval("eval \"$s\"");
or "eval(\"$s\")". With that, I got the output ¡¯Æ¶¿æ¼. Was that a typo, too?

> Does anybody has any idea why? And whats wrong with the first one?
>
> Tested on:
> jruby 1.1.3 and 1.1.4
> jruby-engine 1.1.3 and 1.1.5
> java 1.6.0_06_b02
>
> Best greetings,
> Paweł Wielgus.

-Yoko
Paweł Wielgus (Guest)
on 2008-10-03 10:06
(Received via mailing list)
Hi Yoko,

> Is this code really work? I got a following error that JRuby raised:
>    :1: <unknown>:1: Invalid char `\204' ('„') in expression (SyntaxError)

Sorry, my bad, I was copy-pasting.

But thanks to your help we found the problem. On the machines where the
script was working correctly we have Eclipse 3.4 with
file.encoding=UTF-8 and sun.jnu.encoding=CP1250, but on those two "bad"
machines we have Eclipse 3.3 with file.encoding=Cp1250 and
sun.jnu.encoding=Cp1250. So the Eclipse version is making the difference.
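
For comparing machines, a small diagnostic along these lines prints the two properties mentioned above together with the JVM default charset. It is a minimal sketch; the class name `EncodingReport` is made up for illustration.

```java
import java.nio.charset.Charset;

// Diagnostic sketch: prints the encoding-related settings compared above.
final class EncodingReport {
    // Builds a one-line summary of the encoding settings of the running JVM.
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
             + ", sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding")
             + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Starting it once normally and once with `java -Dfile.encoding=UTF-8 EncodingReport` shows how the forced setting changes what the JVM reports.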

Now I have a question: how should this be done properly?
Forcing file.encoding=UTF-8 helps on those two machines,
but is it the proper way?
Can I read more about it somewhere?

> Does "root" affect something on the output? Or some typo(s) in the above code?
No, it's just the root of my business objects.

Best greetings,
Paweł Wielgus.
Trejkaz Xx (trejkaz)
on 2008-10-13 05:07
Paweł Wielgus wrote:
> Now i have a question how it should be done properly?
> Forcing file.encoding=UTF-8 helps on those two machines,
> but is it the proper way?
> Can i read more about it some where?

I have noticed the same thing.

At first I thought it was simply a bug in the script engine side,
because it looks for sun.jnu.encoding and then file.encoding, not the
other way around.  Making it check file.encoding first fixes this
provided that the value you set is actually able to encode the text
(hence UTF-8 being an obvious choice.)

So there is certainly a bug in the script engine, but fixing it isn't the
entire solution.  If you have file.encoding set to Cp1252 you still get
the symptom you describe, because JRuby continues to use it for the output.
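
The question marks themselves are easy to reproduce in plain Java, independent of JRuby. In the sketch below, ISO-8859-1 stands in for Cp1252 (an assumption made only because ISO-8859-1 is guaranteed to exist on every JVM), and the class name `QuestionMarks` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: text pushed through a charset that cannot represent it comes back as '?'.
final class QuestionMarks {
    // Encodes the text with the given charset and decodes it again with the
    // same charset, mimicking a trip through an output encoding.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // U+0142 is the Polish letter l-with-stroke; the escape keeps this
        // source file independent of its own encoding.
        String text = "Pawe\u0142";
        // ISO-8859-1 has no mapping for U+0142, so getBytes substitutes '?'.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1)); // prints "Pawe?"
        // UTF-8 can encode every character, so the text survives unchanged.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // prints "true"
    }
}
```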

Which is unfortunate, because:
  - JRuby is internally storing strings in UTF-8 anyway
  - JSR223 only lets you pass a Writer

So the naive approach of writing the byte[] directly to the OutputStream
would have worked fine, but whatever JRuby is currently doing does not.
:-(
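
For reference, this is roughly what handing JSR223 a Writer with an explicit encoding looks like. It is only a sketch of the standard javax.script API and the class name `Utf8WriterContext` is made up; whether the JRuby engine then writes through it without re-encoding via file.encoding is exactly the open question here, and the sketch does not answer it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.script.ScriptContext;
import javax.script.SimpleScriptContext;

// Sketch: a ScriptContext whose Writer always encodes to UTF-8.
final class Utf8WriterContext {
    // Builds a context whose output Writer ignores file.encoding and uses UTF-8.
    static ScriptContext utf8Context(OutputStream sink) {
        ScriptContext ctx = new SimpleScriptContext();
        ctx.setWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8));
        return ctx;
    }

    // Writes the text through the context's Writer and returns the byte count.
    static int encodedSize(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            Writer out = utf8Context(sink).getWriter();
            out.write(text);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        // Four ASCII letters take 1 byte each, U+0142 takes 2 bytes in UTF-8.
        System.out.println(encodedSize("Pawe\u0142")); // prints 6
    }
}
```

The context would then be handed to the engine with `rubyEngine.setContext(...)`, as in the original snippet.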

What definitely does not work:
  - Passing KCODE in as a global from the outside.
  - Changing the script engine to set $KCODE to KCode.UTF8 while
    constructing the runtime.

I have not been able to find where JRuby gets the default encoding from
(I can't find any references to file.encoding or defaultCharset), which
makes it tricky to figure out. Is there some other way of overriding
this?

TX