Encoding problem

Hi Guys,

My boss thought it would be cool to use “ë” in an SQL table name; many of
you will want to shoot her now :)

But now I found something weird: I can’t even print “ë”.

It says:

tabaco.rb:16:in `puts': character U+00EB can't be encoded in US-ASCII (Encoding::InvalidByteSequenceError)
    from tabaco.rb:16

or, when I print the string somewhere else :S after it comes back from a
method:

System::Text::DecoderFallbackException at /patient/0
Unable to translate bytes [EB] at index 3 from specified code page to Unicode.

Or, when I don’t mess with it:

Encoding::InvalidByteSequenceError at /patient/0
invalid byte sequence EB on UTF-8

It is the same problem showing up in three places.

Is this a fundamental issue, or should it be solvable?

If you could point me in the right direction, I could try to fix it.

Thanks,

Albert-Jan

Hi,

I don’t really know the solution to your question, but this might help:

ë is Unicode U+00EB, which is 0xC3AB in UTF-8 (so we are dealing with
Unicode rather than UTF-8, which I assume is because IronRuby uses the
immutable .NET strings internally with Unicode encoding).

The errors are expected if your default encoding is US-ASCII because it
does not contain ë (and uses single bytes, so the 0x00EB would be broken
into two bytes and your script would choke on the second 0xEB): you
will need to set your encoding to something compatible, like UTF-8.
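To make the distinction concrete, here is a minimal Ruby sketch, assuming MRI 1.9 or later (not checked on IronRuby). It shows the code point U+00EB, its two-byte UTF-8 form, and the failure when US-ASCII is requested. The helper names `describe_char` and `ascii_unencodable?` are made up for illustration, and MRI reports the failure as Encoding::UndefinedConversionError rather than the error class IronRuby printed above.

```ruby
# encoding: UTF-8

# Hypothetical helper: reports the code point and UTF-8 bytes of a one-character string.
def describe_char(ch)
  {
    codepoint: format("U+%04X", ch.ord),
    utf8_bytes: ch.encode("UTF-8").bytes.map { |b| format("%02X", b) }.join(" ")
  }
end

# Hypothetical helper: true when the string has no US-ASCII representation.
def ascii_unencodable?(str)
  str.encode("US-ASCII")
  false
rescue Encoding::UndefinedConversionError
  true
end

e_diaeresis = "\u00EB" # the character ë, written as an escape so the file encoding does not matter
p describe_char(e_diaeresis)
p ascii_unencodable?(e_diaeresis)
```

Writing the character as `\u00EB` keeps the sketch independent of how the editor saved the file.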

I don’t quite know how to do that properly in IronRuby, but in CRuby 1.9
you could use “magic comments” in your Ruby file, and in 1.8 something
like $KCODE = 'u' could work. You might also be able to drop back into
.NET and set the encoding there, but I’m not sure how that affects
IronRuby assemblies.

I would start with $KCODE = 'u'. Let me know how that works for you.

Zaki

On Thu, Jan 13, 2011 at 6:33 PM, Albert-Jan Pieter Nijburg <

Hey Zaki,

WARNING: YAML.add_builtin_type is not implemented
unknown:0: warning: variable $KCODE is no longer effective
tabaco.rb:11:in `puts': character U+00EB can't be encoded in US-ASCII (Encoding::InvalidByteSequenceError)
    from tabaco.rb:11

Too bad… thanks though. I’ll have a look in the source if I can find
something.

Annoying Europeans :stuck_out_tongue:

Albert-Jan


Hi,

warning: variable $KCODE is no longer effective

This means that you are in 1.9 mode :slight_smile: In that case there are two things
you could try:

  1. Set the encoding at the top of the file with a magic comment:

     # encoding: UTF-8

  2. If 1) fails in IronRuby, force an encoding on the string(s) in
     question with the method:

     .force_encoding("UTF-8")
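One caveat about the second option, sketched here under the assumption of MRI 1.9+ semantics: force_encoding only relabels the bytes that are already there, while encode actually converts them. The names `LATIN1_E`, `relabel` and `convert` are illustrative only.

```ruby
# The byte 0xEB is "ë" in ISO-8859-1 (Latin-1).
LATIN1_E = [0xEB].pack("C").force_encoding("ISO-8859-1")

# Relabel only: the byte stays EB, which is not valid UTF-8 on its own.
def relabel(str)
  str.dup.force_encoding("UTF-8")
end

# Convert: the byte EB becomes the UTF-8 sequence C3 AB.
def convert(str)
  str.encode("UTF-8")
end

p relabel(LATIN1_E).valid_encoding?
p convert(LATIN1_E).bytes.map { |b| b.to_s(16) }
```

So force_encoding helps only when the bytes are already UTF-8 and merely carry the wrong label.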

Zaki

On Thu, Jan 13, 2011 at 11:20 PM, Albert-Jan Pieter Nijburg <

I just found this:

#Encoding:UTF-8
puts "patiënt"

which outputs: pati´┐¢nt

It doesn’t crash anymore :)
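One plausible reading of that garbled output, offered as an assumption rather than a diagnosis: the file was saved in a single-byte encoding, so the ë was the lone byte EB. Decoded as UTF-8, that byte becomes the replacement character U+FFFD, and its UTF-8 bytes EF BF BD look like ´┐¢ on a console using code page 850. The sketch below reproduces this on MRI 2.1+ (for String#scrub). The code page is a guess and the helper names are hypothetical.

```ruby
# Bytes of "patiënt" as a Latin-1 editor would save them (ë = EB).
def latin1_bytes
  [0x70, 0x61, 0x74, 0x69, 0xEB, 0x6E, 0x74].pack("C*")
end

# Decode as UTF-8, substituting U+FFFD for the invalid byte.
def misread_as_utf8(bytes)
  bytes.dup.force_encoding("UTF-8").scrub("\uFFFD")
end

# Reinterpret the resulting UTF-8 bytes the way a code page 850 console would.
def as_seen_on_cp850(str)
  str.dup.force_encoding("IBM850").encode("UTF-8")
end

puts as_seen_on_cp850(misread_as_utf8(latin1_bytes))
```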


Hey,

I found out that if I put nothing at the top and do this:

puts "\x89"

it prints “ë”.

If I put #Encoding: UTF-8 at the top, this happens :)

ëmscorlib:0:in `Throw': Unable to translate bytes [89] at index -1 from specified code page to Unicode. (System::Text::DecoderFallbackException)
    from mscorlib:0:in `Fallback'
    from mscorlib:0:in `InternalFallback'
    from mscorlib:0:in `GetCharCount'
    from mscorlib:0:in `GetCharCount'
    from mscorlib:0:in `GetChars'
    from tabaco.rb:2:in `puts'
    from tabaco.rb:2

It does print it but then it dies.
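A possible explanation, assuming the console was on code page 850 (the thread does not say): in that code page the byte 89 is ë, so writing the raw byte looks right on screen. The same byte on its own is not valid UTF-8, which would fit the decoder failure once the file is labelled UTF-8. A small sketch for MRI, with hypothetical helper names:

```ruby
# The single byte 0x89, with no meaningful encoding label yet.
def raw_byte
  [0x89].pack("C")
end

# Interpret the byte as code page 850 and convert it to UTF-8.
def decode_cp850(bytes)
  bytes.dup.force_encoding("IBM850").encode("UTF-8")
end

# Check whether the same byte is acceptable when labelled as UTF-8.
def valid_as_utf8?(bytes)
  bytes.dup.force_encoding("UTF-8").valid_encoding?
end

p decode_cp850(raw_byte) == "\u00EB"
p valid_as_utf8?(raw_byte)
```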

#<Encoding: UTF-8>
puts "\x89".force_encoding("UTF-8") does the same.

#<Encoding: UTF-8>
puts "ë".force_encoding("UTF-8") does the same as before.
Also without the #

So I thought I had it with puts "\x89", and I tried this:

class PatGeg < ActiveRecord::Base

  set_table_name "Pati\x89ntGegevens"

end

PatGeg.first.Achternaam

and here’s what I got

c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/connection_adapters/abstract_adapter.rb:200:in `log': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-sqlserver-adapter-3.0.0/lib/active_record/connection_adapters/sqlserver/database_statements.rb:217:in `raw_select'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-sqlserver-adapter-3.0.0/lib/active_record/connection_adapters/sqlserver/database_statements.rb:178:in `select'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/connection_adapters/abstract/database_statements.rb:7:in `select_all'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/connection_adapters/abstract/query_cache.rb:56:in `select_all'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/base.rb:467:in `find_by_sql'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation.rb:64:in `to_a'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/finder_methods.rb:333:in `find_first'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/finder_methods.rb:122:in `first'
    from c:6:in `__send__'
    from c:6:in `first'
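The "incompatible character encodings" error can be reproduced in isolation. This is a sketch of the general Ruby rule on MRI 2.0+, not of what ActiveRecord's `log` method does internally: two strings can be joined only if at least one of them is pure 7-bit ASCII or both share an encoding. The helper `try_concat` is hypothetical.

```ruby
# Try to join two strings; return the exception class if their encodings cannot mix.
def try_concat(left, right)
  left + right
rescue Encoding::CompatibilityError => e
  e.class
end

utf8_part   = "Pati\u00EBnt"     # UTF-8, contains a non-ASCII character
binary_part = [0x89].pack("C")   # ASCII-8BIT, contains a high byte
ascii_part  = "SELECT".b         # ASCII-8BIT, but 7-bit only

p try_concat(utf8_part, binary_part)
p try_concat(utf8_part, ascii_part)
```

The second call succeeds because a binary string holding only 7-bit characters is compatible with UTF-8.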

I’ve tried calling force_encoding("UTF-8") on the table name, which
results in something very similar:

c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-sqlserver-adapter-3.0.0/lib/active_record/connection_adapters/sqlserver/quoting.rb:31:in `=~': invalid byte sequence 89 on UTF-8 (Encoding::InvalidByteSequenceError)
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-sqlserver-adapter-3.0.0/lib/active_record/connection_adapters/sqlserver/quoting.rb:31:in `quote_table_name'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/base.rb:597:in `quoted_table_name'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/query_methods.rb:234:in `build_select'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/query_methods.rb:159:in `build_arel'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/query_methods.rb:110:in `arel'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation.rb:64:in `to_a'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/finder_methods.rb:333:in `find_first'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/finder_methods.rb:122:in `first'
    from c:6:in `__send__'
    from c:6:in `first'
    from tabaco.rb:34
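A sketch of an alternative to the \x89 escape, assuming the goal is a table-name string that is valid UTF-8 however the editor saved the file: the \u escape names the code point directly, so the result is always a well-formed UTF-8 string. Whether the SQL Server adapter then accepts the name is not something this thread confirms. The method names are illustrative.

```ruby
# Build the table name from a code point escape instead of a raw byte.
def table_name
  "Pati\u00EBntGegevens"
end

# A regex match such as the `=~` in the trace above needs a validly encoded string.
def safe_to_match?(str)
  str.valid_encoding?
end

p table_name.encoding
p safe_to_match?(table_name)
p safe_to_match?("Pati\x89ntGegevens".dup.force_encoding("UTF-8"))
```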

I have a feeling that IronRuby and .NET are not in sync on encodings.

Albert-Jan



Ironruby-core mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ironruby-core

What IronRuby version do you use?

On my machine (github/master):

a.rb (saved as UTF-8 encoded file):

# encoding: UTF-8

a = "ë"
b = "\u{eb}"

puts a.encoding, b.encoding, a, b, a.inspect, b.inspect

C:\Temp>rbx a.rb
UTF-8
UTF-8
ë
ë
"\u{eb}"
"\u{eb}"

Which is also what MRI 1.9.2 does.

Tomas


I have a clone from the github repos and it’s at the tip.

It appears that SciTE does not save its files as UTF-8 by default; I
assumed it did. Saving the file as UTF-8 solves the problem :)

It even works without the #encoding comment.
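For anyone hitting the same thing, a quick way to check how an editor actually saved a file is to look at the raw bytes. This is a minimal MRI sketch. The helper name `utf8_file?` and the demo file names are made up, and a file that passes this check is only known to be valid UTF-8, not known to have been intended as UTF-8.

```ruby
require "tmpdir"

# True when the file's raw bytes form a valid UTF-8 sequence.
def utf8_file?(path)
  File.binread(path).force_encoding("UTF-8").valid_encoding?
end

# Demo: the same one-line script saved once as Latin-1 and once as UTF-8.
def demo
  Dir.mktmpdir do |dir|
    source      = "puts \"pati\u00EBnt\"\n"
    latin1_path = File.join(dir, "latin1.rb")
    utf8_path   = File.join(dir, "utf8.rb")
    File.binwrite(latin1_path, source.encode("ISO-8859-1"))
    File.binwrite(utf8_path, source)
    [utf8_file?(latin1_path), utf8_file?(utf8_path)]
  end
end

p demo
```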

Thanks


Many editors insert a byte order mark (BOM) at the beginning of the file
when saving it as UTF-8. This is a byte sequence (EF BB BF for UTF-8)
that lets readers identify the Unicode encoding, and it is usually not
visible in editors.
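
As a rough illustration of what a reader does with that marker, the sketch below detects and strips a UTF-8 BOM from raw bytes. The helper name strip_utf8_bom is made up for this example; it is not an IronRuby or MRI API.

```ruby
# The UTF-8 BOM as a binary (ASCII-8BIT) string.
UTF8_BOM = [0xEF, 0xBB, 0xBF].pack("C*")

# Hypothetical helper: drop a leading BOM and label the rest as UTF-8.
def strip_utf8_bom(raw)
  bytes = raw.dup.force_encoding(Encoding::BINARY)
  bytes = bytes.byteslice(3..-1) if bytes.start_with?(UTF8_BOM)
  bytes.force_encoding(Encoding::UTF_8)
end

# Simulated file contents: BOM followed by the UTF-8 bytes of "ë".
file_bytes = UTF8_BOM + [0xC3, 0xAB].pack("C*")
text = strip_utf8_bom(file_bytes)

puts text.bytesize   # bytes left once the BOM is gone
```

Comparing as binary strings avoids encoding-compatibility errors between the marker and whatever encoding the input was labeled with.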

Tomas

From: [email protected]
[mailto:[email protected]] On Behalf Of Albert-Jan
Pieter Nijburg
Sent: Friday, January 14, 2011 12:49 AM
To: [email protected]
Subject: Re: [Ironruby-core] Encoding problem

I have a clone of the GitHub repo and it’s at the tip.

It appears that SciTE does not save its files as UTF-8 by default; I
assumed it did. Saving the file as UTF-8 solves the problem :smiling_face:

It even works without the # encoding comment.
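
A quick way to check how an editor actually saved a file is to read the raw bytes and ask whether they form valid UTF-8. This is a minimal sketch for MRI Ruby; the helper names are hypothetical, and the idea that the file was saved in a single-byte encoding such as ISO-8859-1 is an assumption based on the lone EB byte in the earlier error messages.

```ruby
require "tempfile"

# Hypothetical helper: true if the file's bytes are valid UTF-8.
def utf8_file?(path)
  File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
end

# Hypothetical helper: write raw bytes to a temp file for the demo.
def write_sample(bytes)
  file = Tempfile.new("encoding_sample")
  file.binmode
  file.write(bytes.pack("C*"))
  file.close
  file
end

utf8_sample   = write_sample([0x61, 0xC3, 0xAB])  # "aë" saved as UTF-8
latin1_sample = write_sample([0x61, 0xEB])        # "aë" saved as ISO-8859-1 (assumed)

utf8_ok   = utf8_file?(utf8_sample.path)
latin1_ok = utf8_file?(latin1_sample.path)

puts utf8_ok
puts latin1_ok
```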

Thanks

From: [email protected]
[mailto:[email protected]] On Behalf Of Tomas M.
Sent: Thursday, January 13, 2011 19:13
To: [email protected]
Subject: Re: [Ironruby-core] Encoding problem

What IronRuby version do you use?

On my machine (github/master):

a.rb (saved as UTF-8 encoded file):

# encoding: UTF-8

a = "ë"
b = "\u{eb}"

puts a.encoding, b.encoding, a, b, a.inspect, b.inspect

C:\Temp>rbx a.rb
UTF-8
UTF-8
ë
ë
"\u{eb}"
"\u{eb}"

Which is also what MRI 1.9.2 does.

Tomas

From: [email protected]
[mailto:[email protected]] On Behalf Of Albert-Jan
Pieter Nijburg
Sent: Thursday, January 13, 2011 7:49 AM
To: [email protected]
Subject: Re: [Ironruby-core] Encoding problem

Hey,

I found out that if I put nothing at the top and I do this:

puts "\x89"

it prints “ë”.

If I put # encoding: UTF-8 at the top, this happens: :smiling_face:

ëmscorlib:0:in `Throw': Unable to translate bytes [89] at index -1 from specified code page to Unicode. (System::Text::DecoderFallbackException)
    from mscorlib:0:in `Fallback'
    from mscorlib:0:in `InternalFallback'
    from mscorlib:0:in `GetCharCount'
    from mscorlib:0:in `GetCharCount'
    from mscorlib:0:in `GetChars'
    from tabaco.rb:2:in `puts'
    from tabaco.rb:2

It does print it but then it dies.

#<Encoding: UTF-8>
puts "\x89".force_encoding("UTF-8") does the same

#<Encoding: UTF-8>
puts "ë".force_encoding("UTF-8") does the same as before.
Also without the #
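
One possible reading of this behaviour, sketched for MRI Ruby: the byte 0x89 is “ë” in the DOS code pages 437 and 850, but on its own it is not valid UTF-8, so relabeling it with force_encoding cannot help. That the console here uses code page 437 is an assumption, not something the output confirms.

```ruby
raw = [0x89].pack("C")                        # the single byte behind "\x89"

# Relabeling as UTF-8 changes no bytes; 0x89 alone is a stray
# continuation byte, so the string is invalid.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
valid   = as_utf8.valid_encoding?

# Assumption: the byte came from code page 437. Labeling it as such and
# transcoding produces real UTF-8 bytes.
as_cp437 = raw.dup.force_encoding("IBM437")
text     = as_cp437.encode(Encoding::UTF_8)

puts valid
puts text.bytes.map { |byte| format("0x%02X", byte) }.join(" ")
```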

So I thought I had it with the puts "\x89" and I tried this:

class PatGeg < ActiveRecord::Base
  set_table_name "Pati\x89ntGegevens"
end

PatGeg.first.Achternaam

and here’s what I got:

c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/connection_adapters/abstract_adapter.rb:200:in `log': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-sqlserver-adapter-3.0.0/lib/active_record/connection_adapters/sqlserver/database_statements.rb:217:in `raw_select'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-sqlserver-adapter-3.0.0/lib/active_record/connection_adapters/sqlserver/database_statements.rb:178:in `select'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/connection_adapters/abstract/database_statements.rb:7:in `select_all'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/connection_adapters/abstract/query_cache.rb:56:in `select_all'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/base.rb:467:in `find_by_sql'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation.rb:64:in `to_a'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/finder_methods.rb:333:in `find_first'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/finder_methods.rb:122:in `first'
    from c:6:in `send'
    from c:6:in `first'
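
The CompatibilityError above is what MRI Ruby raises when two strings that both contain non-ASCII bytes, but carry different encodings, are joined. The sketch below reproduces that in isolation; utf8_part is a made-up stand-in for whatever UTF-8 text ActiveRecord was combining with the table name, and writing the name with a \u escape is just one possible way around it.

```ruby
# A table name carrying the raw byte 0x89, labeled as binary (ASCII-8BIT).
binary_name = "Pati\x89ntGegevens".dup.force_encoding(Encoding::BINARY)

# Hypothetical UTF-8 text with a non-ASCII character in it.
utf8_part = "caf\u00E9: "

# Joining the two fails: Ruby cannot pick a common encoding.
begin
  utf8_part + binary_name
  clash = nil
rescue Encoding::CompatibilityError => error
  clash = error
end

# Writing the name with a Unicode escape keeps everything in UTF-8.
unicode_name = "Pati\u00EBntGegevens"
joined = utf8_part + unicode_name

puts clash.class
puts joined.encoding
```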

I’ve tried calling force_encoding("UTF-8") on this string, which
results in something very similar:

c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-sqlserver-adapter-3.0.0/lib/active_record/connection_adapters/sqlserver/quoting.rb:31:in `=~': invalid byte sequence 89 on UTF-8 (Encoding::InvalidByteSequenceError)
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-sqlserver-adapter-3.0.0/lib/active_record/connection_adapters/sqlserver/quoting.rb:31:in `quote_table_name'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/base.rb:597:in `quoted_table_name'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/query_methods.rb:234:in `build_select'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/query_methods.rb:159:in `build_arel'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/query_methods.rb:110:in `arel'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation.rb:64:in `to_a'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/finder_methods.rb:333:in `find_first'
    from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/finder_methods.rb:122:in `first'
    from c:6:in `send'
    from c:6:in `first'
    from tabaco.rb:34

I have a feeling that IronRuby and .NET are not in sync on encodings.

Albert-Jan

From: [email protected]
[mailto:[email protected]] On Behalf Of Dezso Zoltan
Sent: Thursday, January 13, 2011 16:04
To: [email protected]
Subject: Re: [Ironruby-core] Encoding problem

Hi,

warning: variable $KCODE is no longer effective

This means that you are in 1.9 mode :slight_smile: In that case there are two things
you could try:

  1. Set the encoding at the top of the file with a magic comment:

     # encoding: UTF-8

  2. Force an encoding on the string(s) in question (if option 1 fails
     in IronRuby) with the method:

     .force_encoding("UTF-8")
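
One caveat on option 2, sketched for MRI Ruby: force_encoding only changes the label on the existing bytes, so it helps when the bytes already are UTF-8. If they are in another encoding (ISO-8859-1 is assumed here purely for illustration), encode is the call that actually converts them.

```ruby
# Bytes that already are UTF-8 but carry the wrong (binary) label:
# force_encoding fixes the label and leaves the bytes alone.
utf8_bytes = [0xC3, 0xAB].pack("C*")
relabeled  = utf8_bytes.dup.force_encoding(Encoding::UTF_8)

# Bytes in a different encoding (assumed ISO-8859-1): label them
# correctly first, then transcode with encode.
latin1     = [0xEB].pack("C").force_encoding(Encoding::ISO_8859_1)
transcoded = latin1.encode(Encoding::UTF_8)

puts relabeled.valid_encoding?
puts transcoded.bytes.map { |byte| format("0x%02X", byte) }.join(" ")
```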

Zaki

On Thu, Jan 13, 2011 at 11:20 PM, Albert-Jan Pieter Nijburg
<[email protected]> wrote:
Hey Zaki,

WARNING: YAML.add_builtin_type is not implemented
unknown:0: warning: variable $KCODE is no longer effective
tabaco.rb:11:in `puts': character U+00EB can't be encoded in US-ASCII (Encoding::InvalidByteSequenceError)
    from tabaco.rb:11

Too bad… thanks though. I’ll have a look in the source if I can find
something.

Annoying Europeans :stuck_out_tongue:

Albert-Jan
