Shift_JIS variants and UTF-16 support

Hello, this is Nakamura (U).

I have two questions about transcode and Encoding.

(1)
Which of the many Shift_JIS variants is the "Shift_JIS" that transcode
currently supports?
I added Windows-31J at the end of last year, so I would like to support
it in transcode as well (or rather, I would like someone to).

(2)
I would like Encoding and transcode to support UTF-16 (or, again, would
like someone to add it). What needs to be done?
enc/unicode.c already exists, and I believe Oniguruma already supports
UTF-16, so I imagine the Encoding side can be done without much
trouble.

Regards.

Hello Nakamura-san.

Thank you for your questions.

At 10:25 08/01/07, U.Nakamura wrote:

> Hello, this is Nakamura (U).
>
> I have two questions about transcode and Encoding.
>
> (1) Which of the many Shift_JIS variants is the "Shift_JIS" that
> transcode currently supports?

The conversion table is taken from nkf:
http://nkf.sourceforge.jp/ucm/SJIS-nkf.ucm
Of course, I may have introduced bugs on my side.
If you notice anything wrong, please let me know.

> I added Windows-31J at the end of last year, so I would like to
> support it in transcode as well (or rather, I would like someone to).

I think so too. I have already been working on the CP932 data
(based on http://nkf.sourceforge.jp/ucm/cp932.ucm).

The problem is that, while working on it, I noticed that the mailing
list discussion and the table data do not seem to agree. In the
mailing list discussion, I thought the prevailing opinion was that
Shift_JIS should be the pure encoding defined by JIS, but SJIS-nkf.ucm
contains at least some vendor-defined characters. I was about to ask
about that when your mail arrived at just the right time, so I would
very much like to hear your opinion.

> (2) I would like Encoding and transcode to support UTF-16 (or, again,
> would like someone to add it). What needs to be done?
> enc/unicode.c already exists, and I believe Oniguruma already
> supports UTF-16, so I imagine the Encoding side can be done without
> much trouble.

There was a discussion about this on ruby-core recently, so please
refer to it. It may be best to read back from 14729.
It is fine to bring that topic over to this mailing list.

transcode will implement it eventually, so String#encoding will
probably return something like 'UTF-16'. However, Matz's opinion is
that, even though Oniguruma supports it, there is no plan to support
it fully as a Ruby Encoding. In other words, I think something like
"中村".transcode('UTF-16').length
would return 4 rather than 2, meaning that UTF-16 cannot be used as an
internal Encoding.

Best regards,    Martin.

> Regards.

U.Nakamura [email protected]

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:[email protected]

This is Yukihiro Matsumoto.

In message “Re: [ruby-dev:32953] Re: Shift_JIS variants and UTF-16
support”
on Mon, 7 Jan 2008 18:20:23 +0900, Martin D.
[email protected] writes:

|transcode will implement it eventually, so String#encoding will
|probably return something like 'UTF-16'. However, Matz's opinion is
|that, even though Oniguruma supports it, there is no plan to support
|it fully as a Ruby Encoding.

You probably concluded "no plan to support it fully" from the phrase
"second class citizen" in my mail. What I meant by "second class
citizen" is that there will be no literals for it. Supporting, as a
script encoding, an encoding that is not ASCII-compatible at the byte
level would be a lot of work at the moment, so I intend to leave that
unsupported for the time being.

So I have no plans to allow a declaration such as

coding: UTF-16BE

or selecting the encoding by BOM. That said, if someone writes a
patch, that decision could be overturned at once.

|In other words, I think something like
|"中村".transcode('UTF-16').length would return 4 rather than 2,
|meaning that UTF-16 cannot be used as an internal Encoding.

However, not fully supporting UTF-16 does not mean that the UTF-16
encodings are off limits. Oniguruma supports the UTF-16 encodings, so
I see little reason to forbid them. So I think

"中村".transcode('UTF-16LE').length

can be made to return 2. I have not decided whether "UTF-16" with no
endianness specified should be usable, but it would probably be
defined as an alias chosen by looking at the platform's endianness.

                            Yukihiro Matsumoto /:|)
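
The difference between the two return values discussed above (2 versus 4) is the difference between counting characters and counting bytes. A minimal sketch, assuming a present-day Ruby in which the operation this thread calls `transcode` is available as `String#encode`:

```ruby
# Assumption: a Ruby that provides String#encode and the UTF-16LE
# encoding (the thread's "transcode" is treated here as String#encode).
name  = "\u4E2D\u6751"           # "中村", two characters
utf16 = name.encode("UTF-16LE")

chars = utf16.length             # counted in characters
bytes = utf16.bytesize           # counted in bytes (2 per BMP character)
puts "#{chars} characters, #{bytes} bytes"
```

When the encoding is fully supported, `length` counts characters, while the byte count remains available through `bytesize`.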

This is Naruse.

Yukihiro M. wrote:

> I have not decided whether "UTF-16" with no endianness specified
> should be usable, but it would probably be defined as an alias
> chosen by looking at the platform's endianness.

RFC 2781, section 4.3 "Interpreting text labelled as UTF-16", says:
determine the byte order from the BOM → if there is no BOM, interpret
the text as big-endian (SHOULD).
http://www.ietf.org/rfc/rfc2781.txt
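
The rule summarized above can be sketched roughly as follows. `decode_labelled_utf16` is a hypothetical helper written for illustration, not something Ruby provides, and it assumes the whole input is already in memory as a byte string:

```ruby
# Hypothetical helper, not part of Ruby: pick a byte order for data
# labelled "UTF-16" following the reading of RFC 2781 section 4.3 above.
BOM_BE = "\xFE\xFF".b
BOM_LE = "\xFF\xFE".b

def decode_labelled_utf16(bytes)
  data = bytes.b                       # work on a binary copy
  enc, skip =
    case data.byteslice(0, 2)
    when BOM_BE then [Encoding::UTF_16BE, 2]
    when BOM_LE then [Encoding::UTF_16LE, 2]
    else             [Encoding::UTF_16BE, 0]   # no BOM: big-endian (SHOULD)
    end
  body = data.byteslice(skip, data.bytesize - skip)
  body.force_encoding(enc).encode("UTF-8")
end

puts decode_labelled_utf16("\xFF\xFE\x2D\x4E".b)  # little-endian with BOM
puts decode_labelled_utf16("\x4E\x2D".b)          # no BOM
```

The BOM is consumed here rather than kept as a character, which is one possible choice and not something the RFC excerpt above prescribes.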

This is Naruse.

Martin D. wrote:

> I thought the prevailing opinion was that Shift_JIS should be the
> pure encoding defined by JIS, but SJIS-nkf.ucm contains at least
> some vendor-defined characters. I was about to ask about that when
> your mail arrived at just the right time, so I would very much like
> to hear your opinion.

I had a feeling that it might be better to match Perl/Encode's
Shift_JIS, but I did not see any particular problem, so I thought
SJIS-nkf would be fine too.

Let me take this opportunity to explain SJIS-nkf. It is a fairly loose
mapping that includes a large number of mappings for vendor-dependent
characters. nkf's position on this is "it is not an encoding that
calls for strictness anyway, and extra mappings are convenient, so why
not?", but it would be reasonable for Ruby/transcode to adopt a
different policy.

Among the mapping differences from Perl/Encode's Shift_JIS, the ones
that are not simple additions are listed below. For these, I have
leaned toward the mappings published by JIS rather than Perl/Encode.

Comparing ucm\shiftjis.ucm and ucm\SJIS-nkf.ucm…
143c146,147
< \x81\xCA |0
---
> \x81\xCA |1
> \x81\x50 |1
265c270,271
< \x81\x5C |0
---
> \x81\x5C |0
> \x81\x5C |1
7079,7080c7544,7549
< \x81\x50 |0
< \x81\x8F |0
---
> \x81\x50 |1
> \x81\x8F |1
File comparison finished.

While I am at it: EUC-nkf is nkf's default EUC-JP. It is basically
eucJP-ascii, but it is an irregular encoding that uses CP51932 for the
range that would otherwise become 3 bytes. On input, it also accepts
3-byte sequences.
http://nkf.sourceforge.jp/ucm/eucJP-nkf.ucm

Perhaps in Ruby it would be better for "EUC-JP" to refer to
eucJP-ascii.
http://nkf.sourceforge.jp/ucm/eucJP-ascii.ucm
http://home.m05.itscom.net/numa/cde/ucs-conv/ucs-conv.html

> eucJP-ascii: a code conversion rule between eucJP-open and Unicode
> defined by the TOG/JVC CDE/Motif technical study WG; "JIS X 0221
> style conversion (when used together with ASCII)".

These subtleties exist because the Shift_JIS and EUC-JP conversion maps
published by JIS are hopelessly unusable due to the yen sign problem,
so they simply cannot be used as they are. With that in mind, deciding
which conversion tables to assign to "Shift_JIS" and "EUC-JP" is a
real headache.

> > I added Windows-31J at the end of last year, so I would like to
> > support it in transcode as well (or rather, I would like someone
> > to).
>
> I think so too. I have already been working on the CP932 data
> (based on http://nkf.sourceforge.jp/ucm/cp932.ucm).

The CP932 table should either match Perl/Encode or have been built
from an investigation of the conversions Windows actually performs, so
there should be no real problem with it.
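
To make the point about variants concrete, here is a small sketch of how the same byte pair can map to different Unicode characters depending on which table is called "Shift_JIS". It assumes a present-day Ruby that ships both a Shift_JIS and a Windows-31J (CP932) converter; it does not reflect the tables under discussion in this thread.

```ruby
# Assumption: a present-day Ruby with both Shift_JIS and Windows-31J
# converters. The same two bytes are decoded with each table in turn.
wave = "\x81\x60".b

as_sjis  = wave.dup.force_encoding("Shift_JIS").encode("UTF-8")
as_cp932 = wave.dup.force_encoding("Windows-31J").encode("UTF-8")

printf("Shift_JIS:   U+%04X\n", as_sjis.ord)
printf("Windows-31J: U+%04X\n", as_cp932.ord)
```

Differences of this kind are why the choice of table behind each encoding name matters for round-tripping data.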

In article [email protected],
Yukihiro M. [email protected] writes:

> = As an open(read) mode
>   Determine the byte order from the BOM
>   → A routine that reads the first 2 bytes, for UTF-16 only, has to
>     be added
>   Frankly, a hassle

On the related subject of deciding by looking at the content, there
are cases where I want the magic comment to be detected and used.
Concretely, this became necessary in rdoc. For now it is written in an
ad hoc way, but turning it into a library is conceivable.

And it seems plausible that such a library would look at the BOM as
well.
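
A library of the kind described above might look roughly like this. `guess_source_encoding` and the `BOMS` table are hypothetical names, and the regular expression is only an approximation of how a magic comment is recognized; rdoc's actual code is not shown in this thread.

```ruby
# Hypothetical helper for illustration only. It checks for a BOM first,
# then for a "coding: NAME" comment on one of the first two lines.
BOMS = {
  "\xEF\xBB\xBF".b => "UTF-8",
  "\xFE\xFF".b     => "UTF-16BE",
  "\xFF\xFE".b     => "UTF-16LE",
}

def guess_source_encoding(source)
  data = source.b                               # inspect raw bytes
  BOMS.each do |bom, name|
    return name if data.start_with?(bom)
  end
  data.each_line.first(2).each do |line|
    m = line.match(/\A\s*#.*coding[:=]\s*([\w.-]+)/)
    return m[1] if m
  end
  nil                                           # nothing recognizable found
end

puts guess_source_encoding("# -*- coding: euc-jp -*-\nputs 1\n")
```

Returning `nil` leaves the fallback (for example, a default encoding) to the caller.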

This is Yukihiro Matsumoto.

In message “Re: [ruby-dev:32959] Re: Shift_JIS variants and UTF-16
support”
on Mon, 7 Jan 2008 19:18:44 +0900, “NARUSE, Yui”
[email protected] writes:

|Yukihiro M. wrote:
|> I have not decided whether "UTF-16" with no endianness specified
|> should be usable, but it would probably be defined as an alias
|> chosen by looking at the platform's endianness.
|
|RFC 2781, section 4.3 "Interpreting text labelled as UTF-16", says:
|determine the byte order from the BOM → if there is no BOM, interpret
|the text as big-endian (SHOULD).
|http://www.ietf.org/rfc/rfc2781.txt

So that is how it is these days. In my younger days, I believe we were
told that when no endianness is specified, native endianness is
assumed (a vague memory).

Well, that aside,

|RFC 2781, section 4.3 "Interpreting text labelled as UTF-16", says:
|determine the byte order from the BOM → if there is no BOM, interpret
|the text as big-endian (SHOULD).

suppose we follow that rule. Thinking about what it would mean to
specify it as an encoding name, as an extension of the current Ruby
M17N design:

= As an open(read) mode
  Determine the byte order from the BOM
  → A routine that reads the first 2 bytes, for UTF-16 only, has to
    be added
  Frankly, a hassle

= As an open(write) mode
  It is hard to imagine every single string carrying a BOM
  → Does that mean write raises an error?
  Special-casing is a hassle

= As an argument to String#encode
  Individual strings will not have a BOM.
  → Does that mean encode raises an error?
  Special-casing is a hassle

So this does not look very practical. If we want to respect this rule,
then

  do not allow an endianness-less "UTF-16" encoding in the first place

seems to be the only practical answer.

On the other hand, with a rule that adopts the native endianness when
the endianness is omitted, the problems above do not arise, but it
does have the drawback of needlessly encouraging native-endian UTF-16.

                            Yukihiro Matsumoto /:|)
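
For reference, the "native endianness" alternative mentioned above could be sketched as follows. `native_utf16` is a hypothetical helper, not an existing Ruby method; it only shows how a script can find out which UTF-16 flavour matches the machine it runs on.

```ruby
# Hypothetical sketch: choose the UTF-16 flavour that matches the byte
# order of the machine running the script.
def native_utf16
  # "S" packs a 16-bit integer in native order, "v" in little-endian order.
  little = [1].pack("S") == [1].pack("v")
  little ? Encoding::UTF_16LE : Encoding::UTF_16BE
end

puts native_utf16.name
```

The result differs between platforms, which is exactly the portability concern raised in the message above.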

Hello, this is Nakamura (U).

In message “[ruby-dev:32960] Re: Shift_JIS variants and UTF-16 support”
on Jan.07,2008 19:30:41, [email protected] wrote:
| |RFC 2781, section 4.3 "Interpreting text labelled as UTF-16", says:
| |determine the byte order from the BOM → if there is no BOM,
| |interpret the text as big-endian (SHOULD).
|
| suppose we follow that rule. Thinking about what it would mean to
| specify it as an encoding name, as an extension of the current Ruby
| M17N design:
|
| = As an open(read) mode
|   Determine the byte order from the BOM
|   → A routine that reads the first 2 bytes, for UTF-16 only, has to
|     be added
|   Frankly, a hassle
|
| = As an open(write) mode
|   It is hard to imagine every single string carrying a BOM
|   → Does that mean write raises an error?
|   Special-casing is a hassle
|
| = As an argument to String#encode
|   Individual strings will not have a BOM.
|   → Does that mean encode raises an error?
|   Special-casing is a hassle
|
| So this does not look very practical. If we want to respect this
| rule, then
|
|   do not allow an endianness-less "UTF-16" encoding in the first
|   place
|
| seems to be the only practical answer.

If we read "special-casing is a hassle" as "someone who does not find
it a hassle can do the work", how about something like the following?

= Premise
  No "UTF-16" Encoding is provided.

= As an open(read) mode
  If "utf-16" is specified, the BOM is read and the Encoding becomes
  either "UTF-16BE" or "UTF-16LE".
  If there is no BOM, "UTF-16BE".
  If "utf-16be" is specified and there is no BOM or the BOM is BE,
  "UTF-16BE"; if the BOM is LE... an exception?
  If "utf-16le" is specified and there is no BOM or the BOM is LE,
  "UTF-16LE"; if the BOM is BE... an exception?

= As an open(write) mode
  "utf-16" is not accepted (InvalidArgument)

= As an argument to String#encode
  There is no "UTF-16" Encoding, so there is no problem.

Regards.
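
The read-side rules proposed above can be written down as a small decision function. This is only a sketch: `resolve_read_encoding` is a hypothetical name, and raising `ArgumentError` on a mismatched BOM is just one way to answer the "an exception?" question that the proposal leaves open.

```ruby
# Hypothetical helper sketching the proposal above; nothing here is an
# existing Ruby API. `label` is the encoding named in the open mode and
# `head` holds the first bytes of the file.
def resolve_read_encoding(label, head)
  bom = case head.b.byteslice(0, 2)
        when "\xFE\xFF".b then "UTF-16BE"
        when "\xFF\xFE".b then "UTF-16LE"
        end
  case label.downcase
  when "utf-16"
    bom || "UTF-16BE"                  # no BOM: fall back to big-endian
  when "utf-16be", "utf-16le"
    wanted = label.upcase
    # The proposal leaves this case open; raising is an assumption.
    raise ArgumentError, "BOM says #{bom}, mode says #{wanted}" if bom && bom != wanted
    wanted
  else
    raise ArgumentError, "not a UTF-16 label: #{label}"
  end
end

puts resolve_read_encoding("utf-16", "\xFF\xFEa\x00".b)
```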

Hello Naruse-san.

At 19:18 08/01/07, NARUSE, Yui wrote:

> ... is what it says.
> http://www.ietf.org/rfc/rfc2781.txt

I think this is quite appropriate when reading from outside (from a
file), but it is not a rule for internal processing, and I do not
think it is suited to internal processing.

Best regards,    Martin.

NARUSE, Yui [email protected]
DBDB A476 FDBD 9450 02CD 0EFC BCE3 C388 472E C1EA

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:[email protected]

This is Naruse.

U.Nakamura wrote:

> If there is no BOM, "UTF-16BE".
> If "utf-16be" is specified and there is no BOM or the BOM is BE,
> "UTF-16BE"; if the BOM is LE... an exception?
> If "utf-16le" is specified and there is no BOM or the BOM is LE,
> "UTF-16LE"; if the BOM is BE... an exception?

If a BOM that differs from the specified byte order appears, it would
be an invalid mbstring sequence, since there is no character U+FFFE.

> = As an open(write) mode
>   "utf-16" is not accepted (InvalidArgument)

I feel that we want some way to output with a BOM.

> = As an argument to String#encode
>   There is no "UTF-16" Encoding, so there is no problem.

That alone seems fine, but it gets tricky once you consider strings
that are expected to get a BOM when they are output.

Hello, this is Nakamura (U).

In message “[ruby-dev:32966] Re: Shift_JIS variants and UTF-16 support”
on Jan.07,2008 21:00:47, [email protected] wrote:
| > = As an open(write) mode
| >   "utf-16" is not accepted (InvalidArgument)
|
| I feel that we want some way to output with a BOM.

I had assumed that "w:utf-16be" and "w:utf-16le" would naturally write
a BOM at the start of the file.

"r+", "w+", "a", and so on would presumably do nothing, expecting a
BOM to be there already. The same goes for calling set_encoding after
open.
... Once you start thinking about such things, the matter quickly gets
complicated. I wonder whether this is really all right.

| > = As an argument to String#encode
| >   There is no "UTF-16" Encoding, so there is no problem.
|
| That alone seems fine, but it gets tricky once you consider strings
| that are expected to get a BOM when they are output.

I think a String object need not care about the BOM unless such a
string operation is performed explicitly (worrying about "output" is
IO's job). Am I missing something?

Regards.

This is Naruse.

Yukihiro M. wrote:

> |RFC 2781, section 4.3 "Interpreting text labelled as UTF-16", says:
> |determine the byte order from the BOM → if there is no BOM,
> |interpret the text as big-endian (SHOULD).
> |http://www.ietf.org/rfc/rfc2781.txt
>
> So that is how it is these days. In my younger days, I believe we
> were told that when no endianness is specified, native endianness is
> assumed (a vague memory).

Well, it is a transformation format, after all. Besides, network byte
order is big-endian.

>   → A routine that reads the first 2 bytes, for UTF-16 only, has to
>     be added
>   Frankly, a hassle

Treating the no-BOM case as big-endian is a SHOULD, so I think it
could be disobeyed (although it should be followed unless there is a
very good reason). "Look at the BOM", however, comes with a MUST, so I
think it is mandatory if "UTF-16" is to be supported. Incidentally,
Perl infers the byte order from the BOM when there is one, and raised
an error when there was none.

Here is a fragmentary example patch; it would probably look something
like this. Or would it be better to check right after open?

Index: io.c
===================================================================
--- io.c (revision 14917)
+++ io.c (working copy)
@@ -2246,6 +2246,20 @@ rb_io_getc(VALUE io)
     if (io_fillbuf(fptr) < 0) {
         return Qnil;
     }
+
+    if (enc != rb_utf16_encoding()) {
+    } else if (fptr->rbuf_off >= 2) {
+        if (fptr->rbuf[0] == 0xFE && fptr->rbuf[1] == 0xFF) {
+            fptr->enc = enc = rb_utf16be_encoding();
+        } else if (fptr->rbuf[0] == 0xFF && fptr->rbuf[1] == 0xFE) {
+            fptr->enc = enc = rb_utf16le_encoding();
+        } else {
+            rb_raise(rb_eIOError, "%X %X is not valid BOM for UTF-16", fptr->rbuf[0], fptr->rbuf[1]);
+        }
+    } else {
+        rb_raise(rb_eIOError, "too short input for UTF-16");
+    }
+
     r = rb_enc_precise_mbclen(fptr->rbuf+fptr->rbuf_off, fptr->rbuf+fptr->rbuf_off+fptr->rbuf_len, enc);
     if ((n = MBCLEN_CHARFOUND(r)) != 0 && n <= fptr->rbuf_len) {
         str = rb_str_new(fptr->rbuf+fptr->rbuf_off, n);

> = As an open(write) mode
>   It is hard to imagine every single string carrying a BOM
>   → Does that mean write raises an error?
>   Special-casing is a hassle

For open(write), I have a feeling it would be enough to print 0xFEFF
once after opening, before the first write. Perl produced big-endian
output with a BOM.

> = As an argument to String#encode
>   Individual strings will not have a BOM.
>   → Does that mean encode raises an error?
>   Special-casing is a hassle

I do not think a BOM needs to be kept while the data is held
internally.

> ... it does have that drawback.

As I see it, the value of "UTF-16" lies in having the byte order
inferred from the BOM when reading. If that is not going to be done, I
feel that not supporting it at all would cause less confusion.

This is Naruse.

Martin D. wrote:

> I think this is quite appropriate when reading from outside (from a
> file), but it is not a rule for internal processing, and I do not
> think it is suited to internal processing.

Yes, this is about input and output with the outside world.

This is Naruse.

U.Nakamura wrote:

> In message “[ruby-dev:32966] Re: Shift_JIS variants and UTF-16 support”
> on Jan.07,2008 21:00:47, [email protected] wrote:
> | > = As an open(write) mode
> | >   "utf-16" is not accepted (InvalidArgument)
> |
> | I feel that we want some way to output with a BOM.
>
> I had assumed that "w:utf-16be" and "w:utf-16le" would naturally
> write a BOM at the start of the file.

RFC 2781, section 3.3, says "Systems labelling UTF-16BE text MUST NOT
prepend a BOM to the text.", so I think "w:utf-16be" and "w:utf-16le"
would not output a BOM.

> "r+", "w+", "a", and so on would presumably do nothing, expecting a
> BOM to be there already. The same goes for calling set_encoding
> after open.
> ... Once you start thinking about such things, the matter quickly
> gets complicated. I wonder whether this is really all right.

What occurred to me was: what do we do after truncate(0)?

> | > = As an argument to String#encode
> | >   There is no "UTF-16" Encoding, so there is no problem.
> |
> | That alone seems fine, but it gets tricky once you consider
> | strings that are expected to get a BOM when they are output.
>
> I think a String object need not care about the BOM unless such a
> string operation is performed explicitly (worrying about "output" is
> IO's job). Am I missing something?

As explained above, writing a UTF-16BE string through "w:utf-16be"
produces no BOM, so I was wondering what to do about that; but you are
right, that is IO's job. So I suppose writing a UTF-16BE string
through "w:utf-16" would give UTF-16BE with a BOM, and writing a
UTF-16LE string through "w:utf-16" would give UTF-16LE with a BOM.
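
The "no BOM for an explicitly labelled byte order" behaviour cited from RFC 2781 section 3.3 can be observed at the string level. A minimal sketch, assuming a present-day Ruby in which `String#encode` accepts "UTF-16BE" and "UTF-16LE":

```ruby
# Assumption: a present-day Ruby with UTF-16BE/UTF-16LE converters.
# Converting to an explicitly labelled byte order adds no BOM.
be = "a".encode("UTF-16BE").bytes
le = "a".encode("UTF-16LE").bytes

p be
p le
```

Only the two code-unit bytes of "a" appear in each result; a BOM would show up as an extra leading 0xFE 0xFF or 0xFF 0xFE pair.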

This is Naruse.

U.Nakamura wrote:

> | As explained above, writing a UTF-16BE string through "w:utf-16be"
> | produces no BOM, so I was wondering what to do about that; but you
> | are right, that is IO's job. So I suppose writing a UTF-16BE string
> | through "w:utf-16" would give UTF-16BE with a BOM, and writing a
> | UTF-16LE string through "w:utf-16" would give UTF-16LE with a BOM.
>
> Following RFC 2781, that does seem to be the result.
> But then, what should one do to convert a "UTF-16LE" string to
> UTF-16BE and output it with a BOM?
> Is the only option to give up on IO's automatic conversion, convert
> separately, and then output?

Perhaps by providing IO-only encoding names such as "w:utf-16le-bom"
and "w:utf-16be-bom". nkf uses this approach when the endianness has
to be stated explicitly together with a BOM. Java seems to have
something similar.

> Considering how hard it is to handle the BOM with "r+", "w+", or
> after truncate(0), as discussed earlier, it may be better not to
> build BOM-prefixed output in at all, and instead have users open the
> file with ":utf-16be" or ":utf-16le" and explicitly write ZERO WIDTH
> NON-BREAKING SPACE at the start of the file as the BOM.

I suppose the way to go is to leave it unimplemented for now, have
users take that approach for the time being, and eventually align with
other implementations.

Hello, this is Nakamura (U).

In message “[ruby-dev:32969] Re: Shift_JIS variants and UTF-16 support”
on Jan.07,2008 21:59:42, [email protected] wrote:
| > | > = As an open(write) mode
| > | >   "utf-16" is not accepted (InvalidArgument)
| > |
| > | I feel that we want some way to output with a BOM.
| >
| > I had assumed that "w:utf-16be" and "w:utf-16le" would naturally
| > write a BOM at the start of the file.
|
| RFC 2781, section 3.3, says "Systems labelling UTF-16BE text MUST
| NOT prepend a BOM to the text.", so I think "w:utf-16be" and
| "w:utf-16le" would not output a BOM.

Hmm, I see. Reading RFC 2781, what I took to be "natural" was in fact
the exact opposite...

| > | > = As an argument to String#encode
| > | >   There is no "UTF-16" Encoding, so there is no problem.
| > |
| > | That alone seems fine, but it gets tricky once you consider
| > | strings that are expected to get a BOM when they are output.
| >
| > I think a String object need not care about the BOM unless such a
| > string operation is performed explicitly (worrying about "output"
| > is IO's job). Am I missing something?
|
| As explained above, writing a UTF-16BE string through "w:utf-16be"
| produces no BOM, so I was wondering what to do about that; but you
| are right, that is IO's job. So I suppose writing a UTF-16BE string
| through "w:utf-16" would give UTF-16BE with a BOM, and writing a
| UTF-16LE string through "w:utf-16" would give UTF-16LE with a BOM.

Following RFC 2781, that does seem to be the result.
But then, what should one do to convert a "UTF-16LE" string to
UTF-16BE and output it with a BOM?
Is the only option to give up on IO's automatic conversion, convert
separately, and then output?

Considering how hard it is to handle the BOM with "r+", "w+", or after
truncate(0), as discussed earlier, it may be better not to build
BOM-prefixed output in at all, and instead have users open the file
with ":utf-16be" or ":utf-16le" and explicitly write ZERO WIDTH
NON-BREAKING SPACE at the start of the file as the BOM.

This really is a difficult problem.

Regards.

This is Yukihiro Matsumoto.

In message “Re: [ruby-dev:32965] Re: Shift_JIS variants and UTF-16
support”
on Mon, 7 Jan 2008 20:53:34 +0900, “NARUSE, Yui”
[email protected] writes:

|As I see it, the value of "UTF-16" lies in having the byte order
|inferred from the BOM when reading. If that is not going to be done,
|I feel that not supporting it at all would cause less confusion.

Let's go in that direction.

This is Yukihiro Matsumoto.

In message “Re: [ruby-dev:32971] Re: Shift_JIS variants and UTF-16
support”
on Mon, 7 Jan 2008 22:18:42 +0900, “U.Nakamura”
[email protected] writes:

|Considering how hard it is to handle the BOM with "r+", "w+", or
|after truncate(0), as discussed earlier, it may be better not to
|build BOM-prefixed output in at all, and instead have users open the
|file with ":utf-16be" or ":utf-16le" and explicitly write ZERO WIDTH
|NON-BREAKING SPACE at the start of the file as the BOM.
|
|This really is a difficult problem.

Doing this automatically looks likely to cause more trouble than
benefit, so shall we leave it unsupported for the time being? It is
not as though it cannot be used at all.

I cannot bring myself to cheer on, or actively support, standards such
as BOM and UTF-16 that I personally consider to have been failures.

Hello, this is Nakamura (U).

In message “[ruby-dev:32977] Re: Shift_JIS variants and UTF-16 support”
on Jan.07,2008 23:15:39, [email protected] wrote:
| Doing this automatically looks likely to cause more trouble than
| benefit, so shall we leave it unsupported for the time being? It is
| not as though it cannot be used at all.
|
| I cannot bring myself to cheer on, or actively support, standards
| such as BOM and UTF-16 that I personally consider to have been
| failures.

Well, putting this together with [ruby-dev:32978]:

= Encoding
  Provide "UTF-16BE" and "UTF-16LE"; do not provide "UTF-16", whose
  byte order is ambiguous.

= open (when reading)
  Support ":utf-16be" and ":utf-16le".
  The BOM gets no special treatment (it is read as ZERO WIDTH
  NON-BREAKING SPACE).
  If there is a BOM of the opposite endianness from the one specified,
  it is simply treated as a character with an invalid sequence (would
  that raise an exception?)

= open (when writing)
  Support ":utf-16be" and ":utf-16le".
  No BOM is written. Anyone who really wants one must explicitly write
  ZERO WIDTH NON-BREAKING SPACE at the start of the file.

= As an argument to String#encode
  Naturally, only "UTF-16BE" and "UTF-16LE" are accepted; "UTF-16" is
  not.

... Is that what it comes down to?

Given the above, no special additional handling is needed in Ruby's
current IO and String implementations; it looks as though it is enough
simply to provide enc/utf16.c and add the "UTF-16BE" and "UTF-16LE"
encodings.
Any support beyond that will be considered if circumstances change in
the future.
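
As an illustration of the summary above, this is roughly what "write the BOM yourself" looks like from the user's side. It is a sketch that assumes a present-day Ruby where an explicit byte order can be given in the open mode; the file name is a placeholder.

```ruby
require "tmpdir"

# Sketch of the summary above: open with an explicit byte order and
# write U+FEFF yourself when a BOM is wanted.
bom_bytes  = nil
first_char = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.txt")     # hypothetical file name
  File.open(path, "wb:UTF-16LE") do |f|
    f.write("\uFEFF")                     # explicit BOM (ZERO WIDTH NO-BREAK SPACE)
    f.write("abc")
  end
  bom_bytes = File.binread(path, 2).bytes
  # Reading back with the same explicit label keeps U+FEFF as a character.
  text = File.open(path, "rb:UTF-16LE") { |f| f.read }
  first_char = text.encode("UTF-8")[0]
end

p bom_bytes
```

On the read side the BOM is not consumed, so a caller that does not want it has to drop the first character itself.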

Regards.

This is Yukihiro Matsumoto.

In message “Re: [ruby-dev:32981] Re: Shift_JIS variants and UTF-16
support”
on Mon, 7 Jan 2008 23:36:35 +0900, “U.Nakamura”
[email protected] writes:

|Well, putting this together with [ruby-dev:32978]:

(snip)

|Given the above, no special additional handling is needed in Ruby's
|current IO and String implementations; it looks as though it is
|enough simply to provide enc/utf16.c and add the "UTF-16BE" and
|"UTF-16LE" encodings.

UTF-16 support will also be needed in enc/trans, though.

In any case, thank you for summarizing. I think this is the safest
approach we can think of at this point.

|Any support beyond that will be considered if circumstances change
|in the future.

Agreed.