Can't require UTF-8 files

o01eg · April 30, 2010, 7:26pm

When I require a file with UTF-8 encoding I get an error:

irb(main):001:0> require '/tmp/share/mudserver/game.rb'
SyntaxError: /tmp/share/mudserver/game.rb:2: invalid multibyte char (US-ASCII)
/tmp/share/mudserver/game.rb:2: invalid multibyte char (US-ASCII)
/tmp/share/mudserver/game.rb:2: syntax error, unexpected $end, expecting keyword_end

When I simply assign a Unicode string to a variable I don’t get any error.
In the C API I have the same problem with rb_require and rb_eval_string.
I think I have to set the encoding for required files, but I can’t find
out how.
P.S. I tried using $KCODE, but it no longer works:

irb(main):006:0> $KCODE = 'u'
(irb):6: warning: variable $KCODE is no longer effective; ignored

I tried to require ‘jcode’, as recommended on the Internet, but it doesn’t
exist. I also tried adding a u prefix to the string, but that causes an
error even when evaluated directly:

irb(main):005:0> intro = u"привет"
NoMethodError: undefined method `u' for main:Object

o01eg · April 30, 2010, 8:16pm

On 4/30/10, O01eg Oleg wrote:

In C API I have such problem with rb_require and rb_eval_string.
irb(main):005:0> intro = u"привет"
NoMethodError: undefined method `u' for main:Object

Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:

# encoding: utf-8
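
To see what the magic comment changes, here is a small sketch. The helper `source_encoding_of` and the global `$demo_source_encoding` are made up for this illustration. It writes a throwaway file whose first line is the given comment, loads it, and reports `__ENCODING__`, which is the source encoding Ruby picked for that file. Keep in mind that this thread is about Ruby 1.9, where a file without the comment defaults to US-ASCII; Ruby 2.0 and later default to UTF-8, so the original error should not reproduce there.

```ruby
require "tempfile"

# Hypothetical helper: writes a tiny source file whose first line is
# +magic_comment+, loads it, and returns the source encoding Ruby chose.
def source_encoding_of(magic_comment)
  Tempfile.create(["encoding_demo", ".rb"]) do |file|
    # The second line stores the file's own source encoding in a global
    # so the caller can read it back after `load`.
    file.write("#{magic_comment}\n$demo_source_encoding = __ENCODING__\n")
    file.flush
    load file.path
  end
  $demo_source_encoding
end

puts source_encoding_of("# encoding: utf-8")
puts source_encoding_of("# encoding: iso-8859-1")
```

The comment only affects the file it appears in, which is why every file that contains non-ASCII literals needs its own.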

o01eg · April 30, 2010, 8:19pm

Caleb C. wrote:

Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:

# encoding: utf-8

Thanks, it works.

o01eg · February 14, 2011, 9:30am

Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:

# encoding: utf-8

Is there a way to avoid adding this magic encoding line in each file?

That’s really metadata and does not belong in the source code.

o01eg · February 14, 2011, 11:07pm

On Feb 14, 2011, at 3:30 AM, Fernando P. wrote:

Is there a way to avoid adding this magic encoding line in each file?

That’s really a metadata and does not belong to the source code.

If the encoding declaration isn’t in the file itself then where exactly
would you store it? If it isn’t in the file then it has to be in some
OS or filesystem specific meta-data store or in yet another file. All
of which increases the likelihood that the file and its meta-data will
get out of synch or won’t stay together when the file is copied or
transferred somewhere else.

Placing the encoding information in the file itself seems like the most
practical solution. The encoding declaration could of course be
incorrect, but that is always a possibility no matter where you store
the info.

Gary W.

o01eg · February 14, 2011, 9:48am

On Mon, Feb 14, 2011 at 2:30 AM, Fernando P. wrote:


Run with -Ku flag.

https://gist.github.com/JoshCheek/825626

main.rb

#!/usr/bin/env ruby -Ku

require File.dirname(__FILE__) + "/other"

other.rb

puts "1 ≤ 3"

o01eg · February 15, 2011, 12:09am

On Feb 14, 2011, at 12:48 AM, Josh C. wrote:

Run with -Ku flag.

This is not a good solution for library code.

o01eg · February 15, 2011, 3:05am

On 02/15/11 10:08, Eric H. wrote:

On Feb 14, 2011, at 12:48 AM, Josh C. wrote:

On Mon, Feb 14, 2011 at 2:30 AM, Fernando P. wrote:

# encoding: utf-8

Is there a way to avoid adding this magic encoding line in each file?
That’s really a metadata and does not belong to the source code.
Run with -Ku flag.
This is not a good solution for library code.

Right. Is there a good reason why Ruby can’t just detect a UTF-8 BOM?
It’s still “metadata” but a lot of tools deal with it.

o01eg · February 15, 2011, 6:17am

On Tue, Feb 15, 2011 at 3:05 AM, Clifford H. wrote:

Right. Is there a good reason why Ruby can’t just detect a UTF-8 BOM?

The use of a byte order mark is optional. Bit hard to detect what
isn’t there, is it?

Here’s a (short) discussion on auto-detecting Unicode:
http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx
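
A rough sketch of why the BOM is only a partial signal. The helper names `utf8_bom?` and `valid_utf8?` are made up for this illustration; they work on raw bytes and are not how the Ruby parser itself decides anything. A BOM check can only say "marked" or "not marked", and a validity check can only say the bytes are well-formed UTF-8, not that UTF-8 was intended.

```ruby
# The three bytes of U+FEFF encoded as UTF-8, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# True if the raw bytes begin with a UTF-8 byte order mark.
def utf8_bom?(bytes)
  bytes.b.start_with?(UTF8_BOM)
end

# True if the raw bytes are well-formed UTF-8. Plain ASCII passes too,
# so this cannot tell a UTF-8 file from many other encodings.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

with_bom    = "\xEF\xBB\xBFputs 1".b
without_bom = "puts 1".b

p utf8_bom?(with_bom)
p utf8_bom?(without_bom)
p valid_utf8?(without_bom)
```

A file without a BOM that still passes `valid_utf8?` is the ambiguous case: the detector has nothing to go on except a guess.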

–
Phillip G.

o01eg · February 15, 2011, 9:48am

2011/2/14 Fernando P.:

Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:

# encoding: utf-8

Is there a way to avoid adding this magic encoding line in each file?

That’s really a metadata and does not belong to the source code.

If it’s metadata, why are you using "require 'file'" instead of
"File.read('file.rb')"?

o01eg · February 15, 2011, 4:43pm

On 2/14/2011 8:05 PM, Clifford H. wrote:

Is there a good reason why Ruby can’t just detect a UTF-8 BOM?
It’s still “metadata” but a lot of tools deal with it.

Using a BOM would break shebang processing, because the kernel only
recognizes "#!" when those are the very first two bytes of the file. It’s
not a problem for Windows users of Ruby since the shebang line is ignored
there, but it would break things on all Unix-like platforms (including
Cygwin) where a script can be run directly as a program.
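
The byte-level reason can be sketched in a few lines. `starts_with_shebang?` is a made-up helper that imitates the check a Unix-like kernel makes before honoring a shebang line; it is an illustration only and does not call into the OS.

```ruby
# The three bytes a UTF-8 BOM puts at the very start of a file.
BOM_BYTES = "\xEF\xBB\xBF".b

# Imitates the kernel's test: the file must begin with the two bytes "#!".
def starts_with_shebang?(bytes)
  bytes.b.start_with?('#!'.b)
end

script = "#!/usr/bin/env ruby\nputs 'hello'\n".b

p starts_with_shebang?(script)
p starts_with_shebang?(BOM_BYTES + script)
```

With the BOM in front, the first two bytes are 0xEF 0xBB rather than "#!", so the file is no longer recognized as a script with an interpreter line.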

My personal preference would be for a single multi-byte encoding to be
selected for all Ruby files. This would make it easier to configure an
editor or source visualizer to handle a file appropriately without the
need to replicate Ruby’s encoding detection. One downside though is
that existing scripts encoded differently may be broken for this
hypothetical Ruby’s consumption.

Using the magic comment to mark the encoding is probably the least
disruptive solution overall.

-Jeremy

o01eg · February 15, 2011, 7:06pm

On Feb 15, 2011, at 4:42 PM, Jeremy B. wrote:

Using the magic comment to mark the encoding is probably the least
disruptive solution overall.

-Jeremy

I usually recommend not using UTF-8 in source at all and pushing all
UTF-8 strings into localization files (either using a heavyweight
solution like i18n or just a plain YAML file, if you don’t want a
dependency).
This also circumvents the problem of headers and is good practice.
For scripts of smaller scope, I usually skip that rule ;).[2]

Ruby still assumes source code to be US-ASCII by default, which I think
is a good choice for compatibility reasons.[1]

Regards,
Florian

[1] Which is also the assumption that Ruby 1.8 had, but not as explicitly.
[2] A neat trick is the following:

require "yaml"
puts YAML.load(DATA).inspect

__END__

:test: Some Unicode Data.
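
For the plain YAML file variant mentioned above, a minimal sketch might look like this. The helper `load_strings`, the `greeting` key, and the temporary file are all assumptions made for the example. The Ruby source stays pure ASCII (the sample text is written with \u escapes only so the demo can create its own data file), and the encoding is stated once, where the file is read.

```ruby
require "yaml"
require "tempfile"

# Hypothetical helper: reads a YAML file as UTF-8 and returns its contents.
def load_strings(path)
  YAML.safe_load(File.read(path, encoding: "UTF-8"))
end

Tempfile.create(["strings", ".yml"]) do |file|
  # Stand-in for a hand-edited localization file.
  file.write("greeting: \u043F\u0440\u0438\u0432\u0435\u0442\n")
  file.flush

  strings = load_strings(file.path)
  puts strings["greeting"]
  puts strings["greeting"].encoding
end
```

In a real project the YAML file would be edited by hand in a UTF-8 editor, and no source file would need an encoding header for these strings.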

o01eg · February 17, 2011, 1:48pm

On Feb 17, 2011, at 9:31 AM, Fernando P. wrote:

I usually recommend not using UTF-8 in source at all and
push all UTF-8 strings into localization files (Either using a
heavyweight solution like i18n or just a plain YAML file, if you
don’t want a dependency).

This makes the views (in RoR) unreadable, also we somehow lose
autocompletion by the text-editor of html in the yaml file.

I think at least the “unreadable” part is debatable. Autocompletion
might be handy, but the features of your editor should not factor into
the organization of your code.

Also, ERB templates are read with #read, which takes the
external-encoding setting into account, and are then evaluated using
#eval, which takes the encoding of the string into account. Other
templating libraries like haml have a setting for the default template
encoding. So templates are not really the problem, as you can already
use UTF-8 pretty freely without marking it.
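
A minimal sketch of that read-then-evaluate path, using ERB from the standard library on its own rather than through Rails. The helper `render_template`, the `name` local, and the temporary template file are assumptions made for the example. The point is that the encoding travels with the String returned by File.read, so the template needs no magic comment.

```ruby
require "erb"
require "tempfile"

# Hypothetical helper: reads a template with an explicit external encoding
# and renders it with the given local variables.
def render_template(path, locals)
  source = File.read(path, encoding: "UTF-8")
  ERB.new(source).result_with_hash(locals)
end

Tempfile.create(["view", ".erb"]) do |file|
  # Stand-in for a view containing non-ASCII text.
  file.write("<p>\u043F\u0440\u0438\u0432\u0435\u0442, <%= name %></p>\n")
  file.flush

  html = render_template(file.path, name: "Oleg")
  puts html
  puts html.encoding
end
```

Here the encoding is passed explicitly to File.read; relying on the process-wide default external encoding instead would behave the same way only if that default happens to be UTF-8.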

Regards,
Florian

o01eg · February 17, 2011, 9:31am

I usually recommend not using UTF-8 in source at all and
push all UTF-8 strings into localization files (Either using a
heavyweight solution like i18n or just a plain YAML file, if you
don’t want a dependency).

This makes the views (in RoR) unreadable, also we somehow lose
autocompletion by the text-editor of html in the yaml file.