Forum: IronRuby Unicode Source Files

Ted Milker (tmilker)
on 2008-10-26 17:38
(Received via mailing list)
Is the DLR going to be fixed so that it properly supports Unicode
source files, or is this an issue with IronRuby? If you create a new
Code File with Visual Studio 2008, call it test.rb, and then execute
it with:

ScriptRuntime runtime = IronRuby.Ruby.CreateRuntime();
runtime.ExecuteFile( "test.rb" );

it blows up on the Unicode byte-order mark (BOM) with:

Unhandled Exception: Microsoft.Scripting.SyntaxErrorException: Invalid character 'ï' in expression
   at Microsoft.Scripting.ErrorSink.Add(SourceUnit source, String message, SourceSpan span, Int32 errorCode, Severity severity) in C:\Users\ted\Desktop\IronRuby\src\Microsoft.Scripting\ErrorSink.cs:line 34
   at Microsoft.Scripting.ErrorCounter.Add(SourceUnit source, String message, SourceSpan span, Int32 errorCode, Severity severity) in C:\Users\ted\Desktop\IronRuby\src\Microsoft.Scripting\ErrorSink.cs:line 92
   at IronRuby.Compiler.Tokenizer.Report(String message, Int32 errorCode, SourceSpan location, Severity severity) in C:\Users\ted\Desktop\IronRuby\src\ironruby\Compiler\Parser\Tokenizer.cs:line 430
   at IronRuby.Compiler.Tokenizer.ReportError(ErrorInfo info, Object[] args) in C:\Users\ted\Desktop\IronRuby\src\ironruby\Compiler\Parser\Tokenizer.cs:line 442
   at IronRuby.Compiler.Tokenizer.Tokenize(Boolean whitespaceSeen, Boolean cmdState) in C:\Users\ted\Desktop\IronRuby\src\ironruby\Compiler\Parser\Tokenizer.cs:line 966
   at IronRuby.Compiler.Tokenizer.Tokenize() in C:\Users\ted\Desktop\IronRuby\src\ironruby\Compiler\Parser\Tokenizer.cs:line 739
   at IronRuby.Compiler.Tokenizer.GetNextToken() in C:\Users\ted\Desktop\IronRuby\src\ironruby\Compiler\Parser\Tokenizer.cs:line 711
   at IronRuby.Compiler.Parser.GetNextToken() in C:\Users\ted\Desktop\IronRuby\src\ironruby\Compiler\Parser\Parser.cs:line 99
   at IronRuby.Compiler.ShiftReduceParser`2.Parse() in C:\Users\ted\Desktop\IronRuby\src\ironruby\Compiler\Parser\GPPG.cs:line 310
   at IronRuby.Compiler.Parser.Parse(SourceUnit sourceUnit, RubyCompilerOptions options, ErrorSink errorSink) in C:\Users\ted\Desktop\IronRuby\src\ironruby\Compiler\Parser\Parser.cs:line 158
   at IronRuby.Runtime.RubyContext.ParseSourceCode(SourceUnit sourceUnit, RubyCompilerOptions options, ErrorSink errorSink) in C:\Users\ted\Desktop\IronRuby\src\ironruby\Runtime\RubyContext.cs:line 203
   at IronRuby.Runtime.RubyContext.CompileSourceCode(SourceUnit sourceUnit, CompilerOptions options, ErrorSink errorSink) in C:\Users\ted\Desktop\IronRuby\src\ironruby\Runtime\RubyContext.cs:line 179
   at Microsoft.Scripting.SourceUnit.Compile(CompilerOptions options, ErrorSink errorSink) in C:\Users\ted\Desktop\IronRuby\src\Microsoft.Scripting\SourceUnit.cs:line 215
   at Microsoft.Scripting.SourceUnit.Execute(Scope scope, ErrorSink errorSink) in C:\Users\ted\Desktop\IronRuby\src\Microsoft.Scripting\SourceUnit.cs:line 225
   at Microsoft.Scripting.Hosting.ScriptSource.Execute(ScriptScope scope) in C:\Users\ted\Desktop\IronRuby\src\Microsoft.Scripting\Hosting\ScriptSource.cs:line 129
   at Microsoft.Scripting.Hosting.ScriptEngine.ExecuteFile(String path, ScriptScope scope) in C:\Users\ted\Desktop\IronRuby\src\Microsoft.Scripting\Hosting\ScriptEngine.cs:line 159
   at Microsoft.Scripting.Hosting.ScriptEngine.ExecuteFile(String path) in C:\Users\ted\Desktop\IronRuby\src\Microsoft.Scripting\Hosting\ScriptEngine.cs:line 148
   at Microsoft.Scripting.Hosting.ScriptRuntime.ExecuteFile(String path) in C:\Users\ted\Desktop\IronRuby\src\Microsoft.Scripting\Hosting\ScriptRuntime.cs:line 257
   at HostingDLRConsole.Program.Main(String[] args) in C:\Users\ted\Documents\Visual Studio 2008\Projects\Books\IronRuby in Action\HostingDLRConsole\HostingDLRConsole\Program.cs:line 14
Press any key to continue . . .

I know I can work around this with the Advanced Save Options, but the
DLR spec talks about Unicode support, so I assume
ScriptRuntime.ExecuteFile() should also support Unicode source files.
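The 'ï' in the exception is a useful clue: it is the byte 0xEF shown as a single-byte Windows-1252/Latin-1 character, and 0xEF is the first byte of the UTF-8 BOM (EF BB BF). A minimal C# sketch that checks the first bytes of a file for the common BOMs could confirm what Visual Studio actually wrote before the file reaches the DLR. `BomSniffer`, `Detect` and `DetectFile` are hypothetical names, and only the UTF-8 and UTF-16 marks are recognized.

```csharp
using System.IO;

// Hypothetical helper: classifies the byte-order mark at the start of a file.
public static class BomSniffer
{
    // Returns "UTF-8", "UTF-16 LE", "UTF-16 BE" or "none".
    // UTF-32 LE also starts with FF FE (followed by 00 00), so this sketch
    // would report it as UTF-16 LE.
    public static string Detect(byte[] bytes)
    {
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
            return "UTF-8";      // read as Windows-1252 these bytes appear as "ï»¿"
        if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
            return "UTF-16 LE";  // the encoding Windows tools often label just "Unicode"
        if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
            return "UTF-16 BE";
        return "none";
    }

    // Convenience overload that reads the whole file (fine for small scripts).
    public static string DetectFile(string path)
    {
        return Detect(File.ReadAllBytes(path));
    }
}
```

Running `BomSniffer.DetectFile("test.rb")` on the file from the original post should report "UTF-8" if the 'ï' reading above is right.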
Curt Hagenlocher (Guest)
on 2008-10-26 18:05
(Received via mailing list)
We reject the BOM for compatibility with Ruby 1.8.6, though, as you
can see, we don't have the error message quite right:

PS F:\> C:\ruby\bin\ruby.exe x.rb
x.rb:1: Invalid char `\377' in expression
x.rb:1: Invalid char `\376' in expression

:)
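The escapes in Ruby's messages are octal, so the two bytes it rejects decode as shown below. FF FE is the UTF-16 little-endian BOM, which suggests this test file was saved as UTF-16 rather than UTF-8 (where Ted's 'ï' points to EF); Ruby 1.8.6 rejects either form.

```latex
\begin{aligned}
377_8 &= 3 \cdot 8^2 + 7 \cdot 8 + 7 = 255 = \mathrm{FF}_{16} \\
376_8 &= 3 \cdot 8^2 + 7 \cdot 8 + 6 = 254 = \mathrm{FE}_{16}
\end{aligned}
```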

I believe you'll need to save as UTF-8 and then manually strip the BOM
in order to use Unicode source files -- hopefully Tomas will tell me if
I'm wrong.
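The manual step described above can be scripted. This is a minimal C# sketch, assuming the files were saved as UTF-8 with a BOM ("UTF-8 with signature"): it drops the leading EF BB BF bytes in place and leaves the rest of the file byte-for-byte identical. `BomStripper.StripUtf8Bom` is a hypothetical helper; UTF-16 files are deliberately left alone, since they would need re-encoding to UTF-8 rather than trimming.

```csharp
using System;
using System.IO;

// Hypothetical helper: removes a leading UTF-8 BOM (EF BB BF) from a file in place.
public static class BomStripper
{
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Returns true if a BOM was found and removed, false if the file was left alone.
    public static bool StripUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        bool hasBom = bytes.Length >= 3
            && bytes[0] == Utf8Bom[0] && bytes[1] == Utf8Bom[1] && bytes[2] == Utf8Bom[2];
        if (!hasBom)
            return false;

        // Copy everything after the BOM; the UTF-8 payload itself is unchanged.
        byte[] rest = new byte[bytes.Length - Utf8Bom.Length];
        Array.Copy(bytes, Utf8Bom.Length, rest, 0, rest.Length);
        File.WriteAllBytes(path, rest);
        return true;
    }
}
```

Because the helper returns false when there is no BOM, it is safe to run repeatedly over a whole directory of .rb files.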

Source encoding for Ruby is extremely tricky and, from what I can
tell, hasn't even been finalized yet for 1.9.x. We will eventually
support whatever the Ruby standard turns out to be.
Ted Milker (tmilker)
on 2008-10-26 19:47
(Received via mailing list)
Why so strict? I understand the need to maintain compatibility, but
this effectively rules out Visual Studio as an editor for .rb files
unless you add some kind of clunky build step. I guess I will just use
an extension method to work around the behavior for the time being.

From what I have read about Ruby and UTF-8, it seems less extremely
tricky than just extremely broken. I still can't even get pure Ruby
code on Windows to work properly with UTF-8, for example when using
the Shoes toolkit.
Ted Milker (tmilker)
on 2008-10-26 20:08
(Received via mailing list)
Here is the extension method I am using if anyone else is interested:

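using System.IO;
using Microsoft.Scripting.Hosting;

// Extension methods must be declared in a static class; the class name is arbitrary.
public static class ScriptRuntimeExtensions
{
    public static object ExecuteUnicodeFile( this ScriptRuntime rt, string filename )
    {
        string rbCode;

        // OpenText will strip the BOM and keep the Unicode intact: the StreamReader
        // it returns detects UTF-8/UTF-16/UTF-32 BOMs and reads BOM-less files as UTF-8
        using( var rdr = File.OpenText( filename ) )
        {
            rbCode = rdr.ReadToEnd();
        }

        // Execute the decoded source text instead of the raw file bytes
        return IronRuby.Ruby.GetEngine( rt ).Execute( rbCode );
    }
}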

It works great for Japanese strings in Ruby code running on IronRuby
with WPF.
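The IronRuby half of the extension method needs the DLR assemblies, but the decoding half can be checked on its own. This sketch, assuming only the text-reading behavior is of interest, reads a file exactly the way the extension method does; the hypothetical `SourceReader.WriteTemp` helper produces test files with a UTF-8 or UTF-16 BOM so you can confirm that `File.OpenText` drops the mark and keeps Japanese text intact in both cases.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper class isolating the reading step of ExecuteUnicodeFile.
public static class SourceReader
{
    // File.OpenText returns a StreamReader that sniffs the BOM, picks the matching
    // decoder (UTF-8, UTF-16 or UTF-32) and does not include the BOM in the text.
    // Files without a BOM are decoded as UTF-8.
    public static string ReadSource(string path)
    {
        using (var rdr = File.OpenText(path))
        {
            return rdr.ReadToEnd();
        }
    }

    // Writes `text` to a temp file with `encoding`; File.WriteAllText emits the
    // encoding's preamble (its BOM) first when the encoding has one.
    public static string WriteTemp(string text, Encoding encoding)
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, text, encoding);
        return path;
    }
}
```

Note that a file saved in a legacy code page such as 1252 has no BOM, so this reader would decode it as UTF-8 and mangle any bytes above 127.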
Curt Hagenlocher (Guest)
on 2008-10-26 21:39
(Received via mailing list)
If you save in "Western European (Windows) - Codepage 1252" from within
Visual Studio, you'll get the right result, as long as you're not
using any characters with a code point greater than 127. If you are,
you're probably better off expressing those code points as explicit
UTF-8 byte sequences anyway because, as you've noticed, Ruby's Unicode
support is currently a bit weird.
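Writing non-ASCII code points as explicit UTF-8 bytes can be automated. The following C# sketch (the `RubyEscaper.ToRubyLiteral` name is hypothetical) encodes a string as UTF-8 and emits a double-quoted Ruby string literal in which every byte outside printable ASCII becomes a `\xNN` escape, so the .rb file itself stays pure ASCII and can be saved as codepage 1252. It assumes the Ruby side is happy to treat the string as raw UTF-8 bytes, which fits Ruby 1.8's byte-oriented strings.

```csharp
using System.Text;

// Hypothetical helper: turns arbitrary text into a pure-ASCII Ruby string literal.
public static class RubyEscaper
{
    public static string ToRubyLiteral(string text)
    {
        var sb = new StringBuilder("\"");
        // Encoding.UTF8.GetBytes never emits a BOM, only the encoded characters.
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            if (b == (byte)'\\' || b == (byte)'"' || b == (byte)'#')
            {
                // Escape characters that are special inside Ruby double quotes
                // ('#' too, so that "#{...}" is never treated as interpolation).
                sb.Append('\\').Append((char)b);
            }
            else if (b >= 0x20 && b < 0x7F)
            {
                sb.Append((char)b);                          // printable ASCII as-is
            }
            else
            {
                sb.Append("\\x").Append(b.ToString("X2"));   // everything else as \xNN
            }
        }
        return sb.Append('"').ToString();
    }
}
```

For example, `RubyEscaper.ToRubyLiteral("日本")` returns `"\xE6\x97\xA5\xE6\x9C\xAC"`. Escaping `#` unconditionally is simpler than checking which character follows it, and `\#` still yields a plain `#` in Ruby.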