Bug in how Ruby 2.1 and 2.2 handle Encoding::ConverterNotFoundError


#1

Given:

"\x80".force_encoding("ASCII-8BIT").encode(Encoding::Emacs_Mule)

raises an Encoding::ConverterNotFoundError on all rubies.

On all rubies except MRI 2.1 and 2.2, encoding with the :invalid
option has no effect. But on 2.1 and 2.2, it replaces the "\x80"
(128.chr) with the replacement string ("?", 63.chr).

Results for each Ruby version with UTF-8 encoding:
["1.9.2", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, 128]
["1.9.3", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, 128]
["2.0.0", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, 128]
["2.1.5", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, 63]
["2.2.0", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, 63]
["2.0.0", "jruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, 128]
["2.1.0", "rbx", #<Encoding:UTF-8>, #<Encoding:UTF-8>, 128]

["1.9.3", "jruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, 128]
["2.0.0", "jruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, 128]

Here’s my test code

LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
rvm ruby-1.9.2-p330,ruby-1.9.3-p551,ruby-2.0.0-p598,ruby-2.1.5,ruby-2.2.0,jruby-1.7.18,rbx-2.2.2 \
do ruby -e 'p [RUBY_VERSION, RUBY_ENGINE, Encoding.default_external, __ENCODING__] +
  "\x80".force_encoding("ASCII-8BIT").force_encoding("Emacs-Mule").encode(:invalid => :replace).bytes.to_a'

for version in 1.9 2.0; do
  export JRUBY_OPTS="-Xcompat.version=${version}"
  LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
  rvm jruby-1.7.18 do \
  ruby -e 'p [RUBY_VERSION, RUBY_ENGINE, Encoding.default_external, __ENCODING__] +
    "\x80".force_encoding("ASCII-8BIT").force_encoding("Emacs-Mule").encode(:invalid => :replace).bytes.to_a'
done
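
The two expressions in this post are not quite the same operation, and a small script makes the difference easier to see. The expression at the top asks for a real conversion from ASCII-8BIT to Emacs-Mule, while the one-liner relabels the byte as Emacs-Mule and then encodes without changing the encoding. The following is only a sketch: `probe` is a hypothetical helper name, and the target encoding is written out explicitly on the assumption that Encoding.default_internal is nil, as it was in the runs above.

```ruby
# Hypothetical helper: run the block and return either the resulting bytes
# or the class of the encoding error that was raised.
def probe
  yield.bytes.to_a
rescue EncodingError => e
  e.class
end

# A single 0x80 byte tagged as ASCII-8BIT (pack("C") returns a binary string).
raw = [0x80].pack("C")

# Expression from the top of the post: a real conversion, which needs a
# converter from ASCII-8BIT to Emacs-Mule.
conversion = probe { raw.encode(Encoding::Emacs_Mule) }

# Shape of the test one-liner: relabel the byte as Emacs-Mule, then encode
# to the same encoding with :invalid => :replace. The one-liner omits the
# target and relies on Encoding.default_internal being nil instead.
relabelled = raw.dup.force_encoding("Emacs-Mule")
same_encoding = probe { relabelled.encode(relabelled.encoding, :invalid => :replace) }

p [RUBY_VERSION, RUBY_ENGINE, conversion, same_encoding]
```

The first result is an error class on any interpreter; the second is the byte list that the table above reports as 128 or 63 depending on the Ruby version.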

For the particularly curious, this is relevant to a PR I have for
rspec-support
https://github.com/rspec/rspec-support/pull/151#discussion_r22573031


#2

String#encode converts to Encoding.default_internal by default, not
default_external.
And since default_internal is nil by default, it doesn’t change the encoding.
In this case, until 2.0, it did nothing except make a copy.

But because many people made the same mistake as you, expecting invalid
chars to be removed/replaced, the behavior changed in 2.1 to
replace such chars if
:invalid => :replace is given.
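
To make the 2.1 change concrete, here is a minimal sketch that uses a UTF-8 string rather than Emacs-Mule, on the assumption that the rule described above depends only on the source and target encodings being the same. `replace_invalid` is a hypothetical helper name. The comments describe Ruby 2.1 and later; per the explanation above, 2.0 and earlier would return an unchanged copy.

```ruby
# Hypothetical helper: encode a string to its own encoding while asking for
# invalid bytes to be replaced. No converter is involved because the source
# and target encodings are identical.
def replace_invalid(str, replacement = nil)
  opts = { :invalid => :replace }
  opts[:replace] = replacement if replacement
  str.encode(str.encoding, **opts)
end

# 0x80 is a continuation byte and cannot start a UTF-8 character.
broken = "abc\x80def".dup.force_encoding(Encoding::UTF_8)

p broken.valid_encoding?        # false
p replace_invalid(broken)       # 2.1+: the bad byte becomes U+FFFD
p replace_invalid(broken, "?")  # 2.1+: the bad byte becomes "?"

# String#scrub, also added in 2.1, gives the same result directly.
p broken.scrub("?") if broken.respond_to?(:scrub)
```

For Unicode encodings the default replacement is U+FFFD; for other encodings it is "?", which matches the 63 seen in the Emacs-Mule test.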


#3

Nobuyoshi N. wrote in post #1166362:

Thank you! That is very helpful!

> But as many people made same mistake like yours, expecting invalid
> chars were removed/replaced,

Actually, I’ve just been reading code and testing and observing what
Ruby does.

I was asked why the behavior changed and I didn’t know, since it seemed
reasonable for Ruby to ignore the :invalid directive when no converter
is found.

Thank you so much!

-Benjamin

p.s. Now I just need to figure out how all the Rubies reconcile the
:undef, :invalid, and :replace options with :fallback.

I don’t really know C, but it appears to me from the code that :invalid
and :undef are handled before :fallback (ecflags?):

https://github.com/ruby/ruby/blob/34fbf57aaafee9390a0f7427eb90efac099e33ec/transcode.c#L2677-L2733

https://github.com/ruby/ruby/blob/34fbf57aaafee9390a0f7427eb90efac099e33ec/transcode.c#L2838-L2858

https://github.com/ruby/ruby/blob/34fbf57aaafee9390a0f7427eb90efac099e33ec/transcode.c#L2282-L2290
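
The ordering question can also be checked empirically on each Ruby without reading the C. The sketch below uses only documented String#encode options (:undef, :replace is left at its default, :fallback); the sample character and the helper name `try_encode` are assumptions made for illustration. It prints what the running interpreter does when :fallback is given alone and when it is combined with :undef => :replace, so the precedence can be observed rather than assumed.

```ruby
# Hypothetical helper: attempt the conversion and return the result,
# or the error class if the conversion raises.
def try_encode(str, target, opts = {})
  str.encode(target, **opts)
rescue EncodingError => e
  e.class
end

e_acute  = "\u00e9"             # valid UTF-8, but has no US-ASCII equivalent
fallback = { "\u00e9" => "e" }  # :fallback accepts a Hash, Proc, or Method

# No options: the undefined conversion raises.
p try_encode(e_acute, "US-ASCII")

# :fallback alone: the hash supplies the substitute.
p try_encode(e_acute, "US-ASCII", :fallback => fallback)

# :undef => :replace alone: the default replacement string is used.
p try_encode(e_acute, "US-ASCII", :undef => :replace)

# Both together: whichever substitute shows up tells you which option
# the interpreter applied first.
p try_encode(e_acute, "US-ASCII", :undef => :replace, :fallback => fallback)
```

If the last line prints the default replacement rather than the fallback value, that would agree with the reading that :undef is applied before :fallback is consulted.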