String::gsub: detection of invalid byte sequences

My name is Takegawa.

With ruby 1.9, calling gsub on a certain UTF-8 String raises the following
error.

$ cat sample/testdata.txt | ruby sample/gsub.rb
sample/gsub.rb:5:in `gsub': invalid byte sequence in UTF-8 (ArgumentError)
        from sample/gsub.rb:5:in
$ ruby --version
ruby 1.9.1p243 (2009-07-16 revision 24175) [x86_64-linux]

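The last character of the test data is 'a', and simply removing that last
character makes the error go away. I am attaching the script and the test data.

I could not find any invalid character in the data, and although I took a
quick look at the ruby source code, I could not work out the cause, so I am
reporting it here.

If there is a workaround, could you please let me know?

Thank you in advance.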

This is Kazuhiro Nishiyama.

At Fri, 20 Nov 2009 20:44:01 +0900,
TAKEGAWA Hiroshi wrote:

If there is a workaround, could you please let me know?

I am not sure what the difference is, but it seems to reproduce when the data
comes through a pipe and not when it comes through a redirect.

% ruby-1.9.1 -v sample/gsub.rb < sample/testdata.txt
ruby 1.9.1p339 (2009-11-17 revision 25816) [x86_64-linux]
% cat sample/testdata.txt | ruby-1.9.1 -v sample/gsub.rb
ruby 1.9.1p339 (2009-11-17 revision 25816) [x86_64-linux]
sample/gsub.rb:5:in `gsub': invalid byte sequence in UTF-8 (ArgumentError)
        from sample/gsub.rb:5:in
%

This is Naruse.

Kazuhiro NISHIYAMA wrote:

At Fri, 20 Nov 2009 20:44:01 +0900,
TAKEGAWA Hiroshi wrote:

If there is a workaround, could you please let me know?

I am not sure what the difference is, but it seems to reproduce when the data
comes through a pipe and not when it comes through a redirect.

It also reproduced when the contents of testdata.txt were
"a"*1023+"あ"+"a"*1022.
My guess is that, since the buffer size is 1024 bytes, things go wrong when a
fragment of a multibyte character lands at the end of the first buffer.
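
For illustration only, the boundary condition guessed at above can be checked without touching IO at all. This is a rough sketch, not the code from the attached sample: BUFFER_SIZE is just the 1024-byte figure from the guess above, and the variable names are made up. It only shows that cutting this test data at byte 1024 leaves a fragment of "あ" at the end of the first chunk; it does not reproduce the bug itself.

```ruby
# -*- coding: utf-8 -*-
# Test data with the shape described in the thread:
# 1023 ASCII bytes, one 3-byte UTF-8 character, then 1022 ASCII bytes.
data = "a" * 1023 + "あ" + "a" * 1022

# Assumption: the 1024-byte buffer size guessed at in the thread.
BUFFER_SIZE = 1024

# Cut the raw bytes the way a fixed-size read buffer would.
first_chunk = data.byteslice(0, BUFFER_SIZE)
rest        = data.byteslice(BUFFER_SIZE, data.bytesize - BUFFER_SIZE)

p data.bytesize                 # total size in bytes
p data.valid_encoding?          # the data as a whole
p first_chunk.valid_encoding?   # chunk ending with only the first byte of "あ"
p rest.valid_encoding?          # chunk starting with the remaining two bytes
```

The data as a whole is valid UTF-8, while each chunk on its own is not, which fits the observation that no invalid character could be found in the file.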

This is Naruse.

TAKEGAWA Hiroshi wrote:

makes the error go away. I am attaching the script and the test data.

I could not find any invalid character in the data, and although I took a
quick look at the ruby source code, I could not work out the cause, so I am
reporting it here.

If there is a workaround, could you please let me know?

First, to state the conclusion: this is a bug on the Ruby side, so I will
fix it.

Next, as a workaround for the time being, you can insert code that appends an
empty string, as shown below.
(String#concat and String#<< do not work around it.)
Sorry for the inconvenience, but please make do with this for now.

#! /usr/local/bin/ruby
# -*- coding: utf-8 -*-

$stdin.set_encoding('UTF-8')
str = $stdin.read
p str.valid_encoding? #=> false
str += ""
p str.valid_encoding? #=> true
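
If the same workaround is needed in several places, it could be wrapped in a small helper. The following is only a sketch: read_utf8 is a hypothetical name that does not appear in the thread, and StringIO stands in for $stdin so the example is self-contained. On a ruby that already contains the fix, the `+= ""` line should be harmless but unnecessary.

```ruby
# -*- coding: utf-8 -*-
require 'stringio'

# Hypothetical helper (not from the thread): read all of io as UTF-8 and
# apply the workaround of building a new String with `+ ""`.
def read_utf8(io)
  io.set_encoding('UTF-8')
  str = io.read
  str += ""  # workaround from the thread; creates a new String object
  raise ArgumentError, "input is not valid UTF-8" unless str.valid_encoding?
  str
end

# StringIO is used here in place of $stdin, purely for illustration.
text = read_utf8(StringIO.new("a" * 1023 + "あ" + "a" * 1022))
p text.gsub("a", "b").length
```

Raising when valid_encoding? is false is a design choice of this sketch; it makes genuinely broken input fail at read time rather than later inside gsub.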

Dear Nishiyama-san and Naruse-san,

TAKEGAWA Hiroshi wrote:

My name is Takegawa.

With ruby 1.9, calling gsub on a certain UTF-8 String raises the following
error.

Thank you very much for investigating this and for telling me about the
workaround. Thanks to your quick replies, my work was barely held up, which
was a great help.

I will report again if anything else comes up.
Also, I realized afterwards that this should really have been posted to
ruby-list. I will be more careful from now on.

That is all.

Dear Naruse-san,

Thank you very much for your reply.

It is quite hard for me to be sure that something really is a bug in Ruby ...
This time as well, I posted while worrying that it might just be a trivial
oversight on my part.

That is all.

NARUSE, Yui wrote:

This is Naruse.

TAKEGAWA Hiroshi wrote:

TAKEGAWA Hiroshi wrote:

With ruby 1.9, calling gsub on a certain UTF-8 String raises the following
error.

Thank you very much for investigating this and for telling me about the
workaround. Thanks to your quick replies, my work was barely held up, which
was a great help.

Since then, I have fixed this in r25880.
http://redmine.ruby-lang.org/repositories/revision/ruby-19?rev=25880

I will report again if anything else comes up.
Also, I realized afterwards that this should really have been posted to
ruby-list. I will be more careful from now on.

To state the conclusion first, I think ruby-dev was the right place for this.
Briefly, the two lists are used as follows:

  • ruby-list is the mailing list for people who "use" Ruby
  • ruby-dev is the mailing list for people who "make" Ruby

In other words, a report of a bug in Ruby itself, like this one, is something
that the people who "make" Ruby should see, so ruby-dev is the appropriate
place.