[BUG:trunk] [m17n] TestCSVFeatures fails because of r20905

This is Yugui.

Since r20905, the csv tests fail. The encoding of paths has changed
to UTF8-MAC, and as a result String#encode inside CSV#inspect appears
to fail.
How should this be handled?

% make test-all TESTS=csv
./miniruby -I…/…/lib -I.ext/common -I./- -r…/…/ext/purelib.rb
…/…/runruby.rb --extout=.ext -- "…/…/test/runner.rb" csv
nil
Loaded suite …/…/test/runner
Started
…E…
Finished in 0.768623 seconds.

  1. Error:
    test_inspect_is_smart_about_io_types(TestCSVFeatures):
    Encoding::ConverterNotFoundError: code converter not found (UTF8-MAC to
    ASCII-8BIT)
    /Users/yugui/src/ruby/mri/test/csv/test_features.rb:233:in `block in test_inspect_is_smart_about_io_types'
    /Users/yugui/src/ruby/mri/test/csv/test_features.rb:233:in `test_inspect_is_smart_about_io_types'

134 tests, 1886 assertions, 0 failures, 1 errors, 0 skips
make: *** [test-all] Error 1

[Redirected to ruby-core so that James can also read this.]

Hello James,

This is an error report from Yugui about a csv test
failing on a Mac.

The failure comes from line 498 of lib/csv.rb,
in the method CSV#inspect. That line reads:

  str.map { |s| s.encode("ASCII-8BIT") }.join

The reason for the failure is that currently, filenames on a Mac
are labeled as being in an “encoding” of UTF8-MAC. The label
UTF8-MAC is used to mark the assumption that this string is in a
character normalization form particular to the Mac (mostly NFD,
but not for Korean, and not for CJKV compatibility ideographs,
as far as I understand).

There is in general no knowledge about character normalization with
respect to strings labeled UTF-8 (and even for UTF8-MAC, there is
no guarantee about character normalization at all). In my personal
view, the value of UTF8-MAC is questionable at least at the current
point in time where we do not handle character normalization in
any particular way. But for the current bug, that’s actually a side
issue. We might be able to fix this by introducing a (dummy) conversion
from UTF8-MAC to UTF-8, but that won’t actually fix the real problem.

The real problem is that the line above ignores the fact that
conversion to ASCII-8BIT works only for US-ASCII characters, not
for the other characters that may appear, e.g., in a filename.
The easiest fix for this, which is probably what was intended,
is to change the above line to

  str.map { |s| s.force_encoding("ASCII-8BIT") }.join
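
A small illustration of the difference, as a sketch on a current Ruby rather than the 2008 trunk; the file name is made up. With a plain UTF-8 string, encode to ASCII-8BIT typically raises Encoding::UndefinedConversionError rather than the ConverterNotFoundError seen with the UTF8-MAC label, but both are EncodingError subclasses, which is what the example rescues.

```ruby
name = "r\u00e9sum\u00e9.csv" # hypothetical non-ASCII file name (UTF-8)

# encode must map every character into the target encoding; "é" has no
# place in ASCII-8BIT, so this raises (an EncodingError subclass).
encode_error =
  begin
    name.encode("ASCII-8BIT")
    nil
  rescue EncodingError => e
    e
  end
puts "encode: #{encode_error ? encode_error.class : 'ok'}"

# force_encoding keeps the bytes and only changes the label; dup first,
# because force_encoding modifies its receiver.
raw = name.dup.force_encoding("ASCII-8BIT")
puts raw.bytes == name.bytes # prints true
```

On Ruby 2.0 and later, name.b returns the same relabelled copy in one call.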

A slightly more “user-friendly” fix is to change this to
something like:

  begin
    str.join
  rescue
    str.map { |s| s.force_encoding("ASCII-8BIT") }.join
  end

This will only do a force_encoding if the encodings can’t
be joined as is.
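
The begin/rescue idea can be tried in isolation. The sketch below uses a hypothetical helper (join_for_inspect is our name, not anything in lib/csv.rb). Unlike the snippet above, it rescues only Encoding::CompatibilityError, the error Array#join raises for incompatible encodings, and it dups each piece so that force_encoding does not relabel the caller's strings.

```ruby
# Hypothetical helper, not the code that went into csv.rb.
def join_for_inspect(parts)
  parts.join # fast path: works whenever the encodings are compatible
rescue Encoding::CompatibilityError
  # Fall back to raw bytes; dup so the caller's strings keep their labels.
  parts.map { |s| s.dup.force_encoding("ASCII-8BIT") }.join
end

utf8   = "caf\u00e9"                      # UTF-8, non-ASCII
latin1 = "caf\u00e9".encode("ISO-8859-1") # same text, ISO-8859-1 bytes

joined = join_for_inspect([utf8, "/", latin1])
puts joined.encoding
puts joined.bytesize # 5 + 1 + 4 bytes, prints 10
```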

[The code above hasn’t been tested; I don’t have access to a Mac.]

Hope this helps. Regards, Martin.

At 12:27 08/12/25, Yugui (Yuki S.) wrote:



Yugui [email protected]
http://yugui.jp
I Dump myself

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:[email protected]

On Dec 25, 2008, at 12:02 AM, Martin D. wrote:

[Redirected to ruby-core so that James can also read this.]

Thanks for the breakdown.

It’s Christmas day here, so I’m pretty busy. I’ll address this ASAP
though.

James Edward G. II

This is Naruse.

Yugui (Yuki S.) wrote:

This is Yugui.

Since r20905, the csv tests fail. The encoding of paths has changed
to UTF8-MAC, and as a result String#encode inside CSV#inspect appears
to fail.
How should this be handled?

Intuitively, I feel that String#encode("ASCII-8BIT") should have
the same effect as String#force_encoding("ASCII-8BIT").

A patch will take a little while, since it involves Encoding::Converter.

In article [email protected],
“NARUSE, Yui” [email protected] writes:

Intuitively, I feel that String#encode("ASCII-8BIT") should have
the same effect as String#force_encoding("ASCII-8BIT").

That does not strike me as very intuitive. encode is supposed to
convert the byte sequence so that the characters are preserved, and
that would not be the case.

Looking at CSV#inspect, I sense that the intent is to turn the strings
into an ASCII-compatible encoding; am I wrong? Something like a
safeguard for when UTF-16 comes in.

With UTF-16 in mind, if we use force_encoding, then even when the
content consists of characters in the ASCII range, every other byte
will be a \0, which is not nice, is it?

I don't remember exactly how the discussion about UTF-16 ended up,
but if the conclusion was that UTF-16 need not be handled, how about
simply removing the .encode("ASCII-8BIT")?

Alternatively, if UTF-16 is to be handled, we could convert to the
ASCII-compatible encoding that corresponds to UTF-16, i.e. something
like

  e = Encoding::Converter.asciicompat_encoding(s.encoding)
  e ? s.encode(e) : s.force_encoding("ASCII-8BIT")

How about that?
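
To make the UTF-16 concern concrete: the sketch below wraps the two proposed lines in a hypothetical helper, to_ascii_compatible, which dups before force_encoding instead of mutating s. It shows the NUL bytes that force_encoding leaves in UTF-16 data, and what converting via asciicompat_encoding produces instead.

```ruby
# Hypothetical wrapper around the two-line proposal above.
def to_ascii_compatible(s)
  # nil when s is already ASCII-compatible (e.g. UTF-8, ISO-8859-1)
  e = Encoding::Converter.asciicompat_encoding(s.encoding)
  e ? s.encode(e) : s.dup.force_encoding("ASCII-8BIT")
end

utf16 = "abc".encode("UTF-16LE")

# Relabelling the UTF-16 bytes: a NUL after every ASCII letter.
p utf16.dup.force_encoding("ASCII-8BIT").bytes # => [97, 0, 98, 0, 99, 0]

# Converting to the matching ASCII-compatible encoding instead.
p to_ascii_compatible(utf16) # => "abc"
```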

At 03:11 08/12/26, NARUSE, Yui wrote:

Intuitively, I feel that String#encode("ASCII-8BIT") should have
the same effect as String#force_encoding("ASCII-8BIT").

I can see how one might have all sorts of intuitions like that, and I
sort of understand it, but I think it is fundamentally wrong here.
String#encode is strictly about converting characters properly, with
special handling when something cannot be converted.

I think it is wiser, also from an educational standpoint, not to
carelessly mix up code conversion with changing how a byte sequence is
interpreted. Naruse-san and I understand this area well, so there is
no worry for us, but programmers who are not well versed in
internationalization and character encodings risk becoming more and
more confused if we create special cases like this.
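
The distinction Martin draws, code conversion versus reinterpreting the same bytes, fits in a few lines. ISO-8859-1 is used here only as an assumed example target because it makes the byte values easy to see.

```ruby
s = "\u00e9" # "é" in UTF-8, bytes C3 A9

converted  = s.encode("ISO-8859-1")             # code conversion: same character
relabelled = s.dup.force_encoding("ISO-8859-1") # reinterpretation: same bytes

p converted.bytes                # => [233]
p relabelled.bytes               # => [195, 169]
p relabelled.length              # => 2, now read as two Latin-1 characters
p converted.encode("UTF-8") == s # => true, the character survived
```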

Best regards, Martin.

(See [ruby-core:20862].)

A patch will take a little while, since it involves Encoding::Converter.


NARUSE, Yui [email protected]


Yukihiro Matsumoto here.

I think the following should somehow be passed on to James
Gray.

In message “Re: [ruby-dev:37603] Re: [BUG:trunk] [m17n] TestCSVFeatures
fails because of r20905”
on Fri, 26 Dec 2008 13:13:29 +0900, Tanaka A. [email protected]
writes:

|Looking at CSV#inspect, I sense that the intent is to turn the strings
|into an ASCII-compatible encoding; am I wrong? Something like a
|safeguard for when UTF-16 comes in.
|
|With UTF-16 in mind, if we use force_encoding, then even when the
|content consists of characters in the ASCII range, every other byte
|will be a \0, which is not nice, is it?
|
|I don't remember exactly how the discussion about UTF-16 ended up,
|but if the conclusion was that UTF-16 need not be handled, how about
|simply removing the .encode("ASCII-8BIT")?
|
|Alternatively, if UTF-16 is to be handled, we could convert to the
|ASCII-compatible encoding that corresponds to UTF-16, i.e. something
|like
|
|  e = Encoding::Converter.asciicompat_encoding(s.encoding)
|  e ? s.encode(e) : s.force_encoding("ASCII-8BIT")
|

|How about that?
|[田中 哲][たなか あきら][Tanaka A.]

On Dec 25, 2008, at 12:02 AM, Martin D. wrote:

rescue
  str.map { |s| s.force_encoding("ASCII-8BIT") }.join
end

This will only do a force_encoding if the encodings can’t
be joined as is.

On Dec 26, 2008, at 4:22 AM, Martin D. wrote:

What Akira proposes is to use

  e = Encoding::Converter.asciicompat_encoding(s.encoding)
  e ? s.encode(e) : s.force_encoding("ASCII-8BIT")

i.e. to convert to an ASCII-compatible encoding from
the current encoding if necessary and possible, otherwise
to force the data to be interpreted as ASCII-8BIT.

I’ve combined these two suggestions for now. If we come up with a
best practice for inspect() messages though, I would love to hear it.
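
The thread does not show the combined code, so the following is only a sketch under our own assumptions (inspect_join is a hypothetical name, not the r21074 change). It differs from a literal combination in one deliberate way: every piece is forced to ASCII-8BIT after any conversion. Otherwise a non-ASCII UTF-8 piece produced from UTF-16 and a non-ASCII binary piece would fail to join a second time.

```ruby
# Hypothetical helper; the actual r21074 change is not shown in this thread.
def inspect_join(parts)
  parts.join # common case: compatible encodings join as-is
rescue Encoding::CompatibilityError
  parts.map do |s|
    e = Encoding::Converter.asciicompat_encoding(s.encoding)
    # Convert ASCII-incompatible data (e.g. UTF-16) to its ASCII-compatible
    # counterpart; otherwise work on an unchanged copy.
    t = e ? s.encode(e) : s.dup
    # Give every piece the same label so the second join cannot fail.
    t.force_encoding("ASCII-8BIT")
  end.join
end

parts = ["caf\u00e9", " ", "caf\u00e9".encode("UTF-16LE")]
out = inspect_join(parts)
p out.bytesize          # => 11
p out.bytes.include?(0) # => false, no NUL bytes from the UTF-16 piece
```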

James Edward G. II

Ticket #927 has been updated. (by James G.)

Status changed from Open to Closed
% Done changed from 0 to 100

Applied in changeset r21074.

http://redmine.ruby-lang.org/issues/show/927