Internal string storage and Encoding::Converter#convpath

ptyler · May 22, 2010, 12:03am

Hi, everyone:

In #rubyspec we were discussing whether the specifications are correct
for Encoding::Converter’s convpath method. Since MRI uses UTF-8
internally, the #convpath method shows that it converts to UTF-8 for an
intermediate step:

Encoding::Converter.new('ascii', 'Big5').convpath
=> [[Encoding::US_ASCII, Encoding::UTF_8], [Encoding::UTF_8, Encoding::Big5]]
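
To make that result easier to read, here is a minimal sketch that prints a conversion path as a chain of encoding names. The helper name describe_path is hypothetical and not part of Ruby; only Encoding::Converter.new, #convpath, and Encoding#name are real calls. It assumes no newline or other decorator options are passed, so every element of the path is a [source, destination] pair.

```ruby
# Hypothetical helper (not part of Ruby): renders a convpath as a readable chain.
def describe_path(from, to)
  path = Encoding::Converter.new(from, to).convpath
  # Without decorator options, each step is a [source, destination] pair of
  # Encoding objects. Start from the first source, then follow each destination.
  names = [path.first.first.name] + path.map { |_src, dst| dst.name }
  names.join(" -> ")
end

puts describe_path("ascii", "Big5")
```

If the path goes through an intermediate encoding, that encoding appears in the middle of the printed chain.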

Is the fact that MRI uses UTF-8 for its intermediate steps between
incompatible encodings an implementation detail, or is it desired Ruby
behavior?

Thanks very much,
– Patrick T.

ptyler · May 22, 2010, 3:39am

2010/5/22 Patrick T.:

> In #rubyspec we were discussing whether the specifications are correct
> for Encoding::Converter's convpath method. Since MRI uses UTF-8
> internally, the #convpath method shows that it converts to UTF-8 for an
> intermediate step:

UTF-8 is not required:

% ruby -e 'p Encoding::Converter.new("euc-jp", "shift_jis").convpath'
[[#<Encoding:EUC-JP>, #<Encoding:Shift_JIS>]]
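
A rough way to compare the two cases side by side is to list only the intermediate encodings of each path. This is a sketch, not a statement about what any implementation must return: intermediate_encodings is a hypothetical helper, and the result depends on which transcoders the running Ruby ships. It uses Encoding::Converter.search_convpath, which returns the path without building a converter.

```ruby
# Hypothetical helper (not part of Ruby): returns the encodings a conversion
# passes through, excluding the original source and the final destination.
def intermediate_encodings(from, to)
  path = Encoding::Converter.search_convpath(from, to)
  # Every step except the last one ends at an intermediate encoding.
  path[0...-1].map { |(_src, dst)| dst }
end

direct  = intermediate_encodings("EUC-JP", "Shift_JIS")
pivoted = intermediate_encodings("US-ASCII", "Big5")

p direct   # empty when a direct transcoder exists
p pivoted  # lists whatever pivot encoding(s) this Ruby chose
```

A single-step path yields an empty list, so a non-empty result shows that the implementation had to route the conversion through another encoding.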

ptyler · May 24, 2010, 7:00pm

For encodings that can be converted directly (like EUC-JP to Shift_JIS),
I understand that no intermediate UTF-8 step is required. However, what
about encodings that do require an intermediate step? Is the choice of
UTF-8 as the intermediate representation an implementation detail?

Thanks,
– Patrick T.