Forum: IronRuby $KCODE and encodings

Shri B. (Guest)
on 2009-02-14 02:48
(Received via mailing list)
I was searching for string encoding issues in Ruby. Here is a summary
of what I learnt, in case it's useful to anyone else, or if anyone has any
corrections to this.

Ruby 1.8 support for encoding:

*         A comment like "# -*- coding: utf-8 -*-" at the start of the
file is supposed to determine how to parse a .rb file, but I haven't
really figured out how to make this work. Non-ansi characters cause an
error while loading the file.

*         ruby.exe -K<kcode> sets $KCODE (which can also be set
programmatically)

*         $KCODE affects the following:

    *         Determines the encoding to use to parse .rb files. Normally,
    identifiers have to be ANSI, but the limitation is removed if $KCODE is
    set to "UTF8".

    *         Affects whether inspect escapes non-ascii chars, or if it
    leaves them as is.

    *         Affects how regexps without an explicit encoding interpret the
    input string.

Ruby 1.9 support for encodings:

*         Identifiers can be non-ANSI by default.

Ruby 2.0 support for encodings:

*         Each string and symbol knows its own encoding, and
String#force_encoding can change the encoding of an existing string.

*         IO#encoding controls the encoding used when reading from or
writing to disk.
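
A minimal sketch of the per-string and per-IO encoding behaviour listed above, assuming Ruby 1.9 or later. The constant SAMPLE, the file name latin1.txt and the helper read_as_utf8 are made up for illustration. As far as I know there is no method literally named IO#encoding; the sketch uses IO#external_encoding and the "r:external:internal" open mode instead.

```ruby
require "tmpdir"

# Every string carries an encoding tag alongside its bytes.
SAMPLE = "caf\u00E9" # a \u escape always yields a UTF-8 string

# force_encoding only relabels the existing bytes; nothing is converted.
relabelled = SAMPLE.dup.force_encoding(Encoding::BINARY)

# encode transcodes: the bytes change so the characters stay the same.
latin1 = SAMPLE.encode(Encoding::ISO_8859_1)

puts SAMPLE.bytesize      # bytes in the UTF-8 form
puts relabelled.bytesize  # same bytes, different label
puts latin1.bytesize      # one byte per character in ISO-8859-1

# Hypothetical helper: read a file stored in `external` and hand back
# the IO's external encoding together with the text transcoded to UTF-8.
def read_as_utf8(path, external)
  File.open(path, "r:#{external}:utf-8") do |f|
    [f.external_encoding, f.read]
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "latin1.txt")
  File.binwrite(path, latin1)
  external, text = read_as_utf8(path, "iso-8859-1")
  puts external        # encoding the bytes on disk are assumed to be in
  puts text == SAMPLE  # transcoded text compares equal to the original
end
```

The distinction that matters here is force_encoding (same bytes, new label) versus encode (new bytes, same characters).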
Matthew Wilson (Guest)
on 2009-02-14 03:11
(Received via mailing list)
On Fri, Feb 13, 2009 at 5:01 PM, Shri B.
<removed_email_address@domain.invalid> wrote:
>
> Ruby 1.8 support for encoding:
>
> ·         A comment like "# -*- coding: utf-8 -*-" at the start of the
> file is supposed to determine how to parse a .rb file, but I haven't really
> figured out how to make this work. Non-ansi characters cause an error while
> loading the file.
>

Did the utf-8 file(s) you tried have a BOM or not?

-Matthew
Shri B. (Guest)
on 2009-02-14 04:09
(Received via mailing list)
Attachment: utf8_with_signature.rb (0 Bytes)
Attachment: utf8.rb (0 Bytes)
If I use Notepad2's menu to set the encoding to "UTF8 with signature",
and run either "ruby utf8_with_signature.rb" or "ruby -Ku
utf8_with_signature.rb", the file fails to parse. The file is attached.

If I save the file with encoding set just as "UTF8", the file is 3 bytes
smaller. "ruby utf8.rb" fails, but "ruby -Ku utf8.rb" works. With "-Ku",
things work even if I do not have "# -*- coding: utf-8 -*-" in the file.

The repro files are attached.
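
The 3-byte size difference matches the UTF-8 signature (byte order mark), which is the byte sequence EF BB BF. A small sketch, assuming Ruby 2.0 or later for String#b, of how one might detect and strip it before handing the source to a parser that does not recognize it. The names UTF8_BOM, bom? and strip_bom are hypothetical helpers, not part of Ruby.

```ruby
# The UTF-8 signature as raw bytes.
UTF8_BOM = "\xEF\xBB\xBF".b

# True if the given bytes start with the UTF-8 signature.
def bom?(bytes)
  bytes.b.start_with?(UTF8_BOM)
end

# Returns a UTF-8 tagged copy of the bytes with any leading signature removed.
def strip_bom(bytes)
  data = bytes.b
  data = data.byteslice(UTF8_BOM.bytesize..-1) if data.start_with?(UTF8_BOM)
  data.force_encoding(Encoding::UTF_8)
end

plain  = "puts 'hi'".b
signed = UTF8_BOM + plain

puts signed.bytesize - plain.bytesize  # size of the signature in bytes
puts bom?(signed)
puts bom?(plain)
puts strip_bom(signed) == "puts 'hi'"
```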

Tomas M. (Guest)
on 2009-02-14 04:48
(Received via mailing list)
AFAIK Ruby 1.8 doesn't support magic comments that specify encodings at
all; 1.9 does. Ruby 1.8 also doesn't recognize a BOM.
Version 1.9 already has full encoding support, not just 2.0.
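
A quick way to check the magic-comment behaviour on Ruby 1.9 or later is to load a small probe file and look at __ENCODING__, which reports the encoding the parser used for that file. This is only a sketch: the helper source_encoding_for, the file name probe.rb and the global $seen_source_encoding are invented for the example.

```ruby
require "tmpdir"

# Hypothetical helper: writes `first_line` as the first line of a temporary
# .rb file, loads it, and returns the source encoding the parser picked.
def source_encoding_for(first_line)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "probe.rb")
    # The probe records __ENCODING__ in a global so the caller can read it.
    File.binwrite(path, first_line + "\n$seen_source_encoding = __ENCODING__\n")
    load path
  end
  $seen_source_encoding
end

puts source_encoding_for("# -*- coding: iso-8859-1 -*-")
puts source_encoding_for("# -*- coding: utf-8 -*-")
```

The comment has to be on the first line of the file (or the second, after a shebang line) for the parser to honour it.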

Tomas
