[patch] performance improvement patch for benchmark/bm_so_count_words.rb

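This is Masayoshi Takahashi.

When I ran the benchmarks on 1.9, the bm_so_count_words result was
terrible (about 30 times slower than on 1.8), so I wrote a patch.
Basically, for single-byte strings it ignores the encoding and processes
them the way the 1.8 code did. Even with this, it still seems to take
about twice as long as 1.8 (Mac OS X 10.5.4).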
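
The idea of skipping per-character work for single-byte data can be sketched as below. This is an illustrative model only, not the patch itself: `count_words_bytes`, `count_words_utf8` and `utf8_char_len` are hypothetical names, and the UTF-8 length lookup is deliberately simplified (it trusts the lead byte and assumes valid input).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count whitespace-separated words, one byte per step.
 * Correct only when every character occupies exactly one byte. */
static size_t count_words_bytes(const unsigned char *p, size_t len) {
    size_t words = 0;
    int in_word = 0;
    assert(p != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        int space = isspace(p[i]) != 0;
        if (!space && !in_word) words++;   /* a new word starts here */
        in_word = !space;
    }
    return words;
}

/* Simplified UTF-8 character length taken from the lead byte. */
static size_t utf8_char_len(unsigned char c) {
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Same count, but stepping character by character, which costs an
 * extra length lookup for every character. */
static size_t count_words_utf8(const unsigned char *p, size_t len) {
    size_t words = 0, i = 0;
    int in_word = 0;
    assert(p != NULL || len == 0);
    while (i < len) {
        size_t n = utf8_char_len(p[i]);
        int space = (n == 1) && isspace(p[i]) != 0;
        if (!space && !in_word) words++;
        in_word = !space;
        i += n;
    }
    return words;
}
```

For ASCII-only input both functions return the same count, so the cheaper byte loop can be used whenever the string is known to be single-byte.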

Also, would it be all right to add rb_enc_str_asciionly_p(str) to
single_byte_optimizable(str)? I wasn't sure whether it is safe to add it
in general, so in this patch the check is written in the form
if (single_byte_optimizable(str) || rb_enc_str_asciionly_p(str)) {
and the like.

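Some may ask what the point is of speeding up only the single-byte case,
but I think it is better than nothing, so I'd appreciate your consideration.

Masayoshi Takahashi ([email protected])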

This is Yukihiro Matsumoto.

In message “Re: [ruby-dev:36300] [patch] performance improvement patch
for benchmark/bm_so_count_words.rb”
on Mon, 15 Sep 2008 12:13:41 +0900, “masayoshi takahashi”
[email protected] writes:

|When I ran the benchmarks on 1.9, the bm_so_count_words result was
|terrible (about 30 times slower than on 1.8), so I wrote a patch.
|Basically, for single-byte strings it ignores the encoding and processes
|them the way the 1.8 code did. Even with this, it still seems to take
|about twice as long as 1.8 (Mac OS X 10.5.4).

Ah, and I had been planning to get to this today once I finished my
manuscript.

|Also, would it be all right to add rb_enc_str_asciionly_p(str) to
|single_byte_optimizable(str)? I wasn't sure whether it is safe to add it
|in general, so in this patch the check is written in the form
| if (single_byte_optimizable(str) || rb_enc_str_asciionly_p(str)) {
|and the like.

single_byte_optimizable() is true when

mbmaxlen is 1
or ENC_CODERANGE_7BIT

and rb_enc_str_asciionly_p() is true when

asciicompatible (mbminlen is 1 and not dummy),
and ENC_CODERANGE_7BIT

so the two conditions end up looking rather alike. I don't really know
why they differ slightly, but I think either one alone should be
sufficient.
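
The two conditions described above can be written side by side as a small model. This is only a sketch of the description in this mail, not Ruby's source: the `toy_` types and functions are hypothetical stand-ins, and the real functions work on Ruby's own string and encoding objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the cached code range of a string. */
enum toy_coderange { TOY_CR_UNKNOWN, TOY_CR_7BIT, TOY_CR_VALID, TOY_CR_BROKEN };

/* Hypothetical stand-in for a string plus the encoding facts used here. */
struct toy_str {
    int mbmaxlen;             /* longest character in the encoding, in bytes */
    int mbminlen;             /* shortest character in the encoding, in bytes */
    bool dummy;               /* true for a dummy encoding */
    enum toy_coderange cr;    /* cached code range */
};

/* "mbmaxlen is 1, or ENC_CODERANGE_7BIT" */
static bool toy_single_byte_optimizable(const struct toy_str *s) {
    assert(s != NULL);
    return s->mbmaxlen == 1 || s->cr == TOY_CR_7BIT;
}

/* "asciicompatible (mbminlen is 1 and not dummy), and ENC_CODERANGE_7BIT" */
static bool toy_asciionly_p(const struct toy_str *s) {
    assert(s != NULL);
    bool ascii_compatible = (s->mbminlen == 1) && !s->dummy;
    return ascii_compatible && s->cr == TOY_CR_7BIT;
}
```

In this model, whenever `toy_asciionly_p` is true, `toy_single_byte_optimizable` is true as well; the reverse fails for a single-byte encoding whose content is not 7-bit.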

|Some may ask what the point is of speeding up only the single-byte case,
|but I think it is better than nothing, so I'd appreciate your consideration.

No, I think speeding up just the single-byte case is worthwhile. Let me
check it a bit and then commit it.

                            Yukihiro Matsumoto /:|)

This is Masayoshi Takahashi.

2008/09/15 12:35 Yukihiro M. [email protected]:

> |When I ran the benchmarks on 1.9, the bm_so_count_words result was
> |terrible (about 30 times slower than on 1.8), so I wrote a patch.
> |Basically, for single-byte strings it ignores the encoding and processes
> |them the way the 1.8 code did. Even with this, it still seems to take
> |about twice as long as 1.8 (Mac OS X 10.5.4).
>
> Ah, and I had been planning to get to this today once I finished my
> manuscript.

Sorry :)
Also, the case we discussed yesterday, where downcase had become the
bottleneck, was a sample from a different shootout implementation.

> single_byte_optimizable() is true when
>
> mbmaxlen is 1
> or ENC_CODERANGE_7BIT
>
> and rb_enc_str_asciionly_p() is true when
>
> asciicompatible (mbminlen is 1 and not dummy),
> and ENC_CODERANGE_7BIT
>
> so the two conditions end up looking rather alike. I don't really know
> why they differ slightly, but I think either one alone should be
> sufficient.

In 1.9, even when reading a file that contains only ASCII, the encoding
of the string that was read seems to default to UTF-8, so the current
single_byte_optimizable() does not return true (which is what caused the
trouble). And I have the feeling that rb_enc_str_asciionly_p() alone
would not return true for ASCII-8BIT, so I suspect something that mixes
both conditions is needed.

For what it's worth, what I am after is a condition for deciding whether
a string can be scanned one byte at a time.
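
The situation described here, a freshly read multibyte-encoded string whose code range has not been computed yet, can be modeled as follows. This is a hedged sketch, not Ruby's implementation: the `toy_` names are hypothetical, the encoding is assumed to be ASCII-compatible, and the scan simply looks for a byte with the high bit set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached scan result. */
enum toy_coderange { TOY_CR_UNKNOWN, TOY_CR_7BIT, TOY_CR_NOT_7BIT };

struct toy_str {
    const unsigned char *ptr;
    size_t len;
    int mbmaxlen;              /* 1 for a single-byte encoding */
    enum toy_coderange cr;     /* TOY_CR_UNKNOWN until someone scans */
};

/* Looks only at cached state and never scans the bytes. */
static bool toy_single_byte_optimizable(const struct toy_str *s) {
    assert(s != NULL);
    return s->mbmaxlen == 1 || s->cr == TOY_CR_7BIT;
}

/* Scans the bytes when the code range is not cached yet, then caches it. */
static bool toy_asciionly_p(struct toy_str *s) {
    assert(s != NULL);
    if (s->cr == TOY_CR_UNKNOWN) {
        s->cr = TOY_CR_7BIT;
        for (size_t i = 0; i < s->len; i++) {
            if (s->ptr[i] & 0x80) { s->cr = TOY_CR_NOT_7BIT; break; }
        }
    }
    return s->cr == TOY_CR_7BIT;
}

/* The combined test in the form used by the patch, on the toy types. */
static bool toy_can_scan_bytewise(struct toy_str *s) {
    return toy_single_byte_optimizable(s) || toy_asciionly_p(s);
}
```

Under these assumptions, an ASCII-only string with an unknown code range fails the cached check but passes the combined one after a single scan, while a single-byte string with high bytes passes through the `mbmaxlen` branch without being scanned.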

> |Some may ask what the point is of speeding up only the single-byte case,
> |but I think it is better than nothing, so I'd appreciate your consideration.
>
> No, I think speeding up just the single-byte case is worthwhile. Let me
> check it a bit and then commit it.

I'm not very confident that it is correct, so I would appreciate it if
you could check it.

Masayoshi Takahashi ([email protected])

In article
[email protected],
“masayoshi takahashi” [email protected] writes:

> In 1.9, even when reading a file that contains only ASCII, the encoding
> of the string that was read seems to default to UTF-8, so the current
> single_byte_optimizable() does not return true (which is what caused the
> trouble). And I have the feeling that rb_enc_str_asciionly_p() alone
> would not return true for ASCII-8BIT, so I suspect something that mixes
> both conditions is needed.

rb_enc_str_asciionly_p may scan the string.

If the check takes longer than the processing it is meant to speed up,
then it is not appropriate.

This is Naruse.

masayoshi takahashi wrote:

> I wasn't sure, so in this patch the check is written in the form
> if (single_byte_optimizable(str) || rb_enc_str_asciionly_p(str)) {
> and the like.
>
> Some may ask what the point is of speeding up only the single-byte case,
> but I think it is better than nothing, so I'd appreciate your consideration.

[ruby-core:18532] has a patch that includes something to the same effect.
I haven't yet looked at how far they overlap, but I wanted to mention it
quickly.

This is Yukihiro Matsumoto.

In message “Re: [ruby-dev:36337] Re: [patch] performance improvement
patch for benchmark/bm_so_count_words.rb”
on Wed, 17 Sep 2008 03:12:49 +0900, “masayoshi takahashi”
[email protected] writes:

|case.pat and casecmp.pat in the recent mail [ruby-core:18616] appear to
|serve the same purpose.

The latest trunk has mostly incorporated them.

|According to [ruby-core:18532], which Naruse-san pointed out, the problem
|is that rb_str_modify() clears the CODERANGE; without that, there would be
|no need to determine it again. When I measured it, that does indeed seem
|to be the case.
|
| * Is this understanding correct?

It is correct.

| * Would it be good to create a variant of rb_str_modify() that does not
| clear it unless it is ENC_CODERANGE_BROKEN, under the name str_modify()?

I introduced it under the name str_modify_keep_cr().

|e$B!J8D?ME*$K$Oe(Brb_str_modify()e$B$H$N0c$$$,J,$+$j$K$/$$$H$$$&$+!"$=$b$=$be(B
|e$B!!e(Brb_str_modify()e$B$rJQ$($F$7$^$C$F$$$$$N$G$O$J$$$+$H$$$&5$$be(B
|e$B!!$9$k$s$G$9$,!"$d$C$Q$j%/%j%"$9$kHG$bI,MW$J$s$G$7$g$&$+!)!Ke(B

rb_str_modify()e$B$r8F$V$H$$$&$3$H$OJ8;zNs$rJQ99$9$kM=Dj$,$"$k$He(B
e$B$$$&$3$H$G!"$=$l$O$D$^[email protected]$H$7$F$Oe(BCODERANGEe$B>pJs$,0];}$5$le(B
e$B$k$3$H$b4|BT$G$-$J$$$H$$$&$3$H$G$O$J$$$+$H;W$$$^$9!#e(B
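
The trade-off discussed in this mail can be modeled with a cached code range and two "prepare to modify" helpers. This is a sketch under stated assumptions, not Ruby's source: the `toy_` names are hypothetical, `toy_modify_keep_cr` follows the proposal quoted above (keep the cache unless it is broken), and the pretend scan always reports 7-bit content.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the cached code range. */
enum toy_coderange { TOY_CR_UNKNOWN, TOY_CR_7BIT, TOY_CR_VALID, TOY_CR_BROKEN };

struct toy_str {
    enum toy_coderange cr;   /* cached code range */
    int scans;               /* how often the range had to be recomputed */
};

/* Models a modify step that always drops the cached code range. */
static void toy_modify(struct toy_str *s) {
    assert(s != NULL);
    s->cr = TOY_CR_UNKNOWN;
}

/* Variant that keeps the cached range unless it is known to be broken.
 * Only safe when the caller's change cannot alter the code range. */
static void toy_modify_keep_cr(struct toy_str *s) {
    assert(s != NULL);
    if (s->cr == TOY_CR_BROKEN) s->cr = TOY_CR_UNKNOWN;
}

/* Pretend scan: recomputes the range only when the cache is empty. */
static enum toy_coderange toy_coderange_of(struct toy_str *s) {
    assert(s != NULL);
    if (s->cr == TOY_CR_UNKNOWN) {
        s->scans++;
        s->cr = TOY_CR_7BIT;   /* assumption: the content is ASCII */
    }
    return s->cr;
}
```

In this model a string that goes through `toy_modify` pays for a rescan on the next query, while one that goes through `toy_modify_keep_cr` does not. The second is appropriate only for callers that know their change preserves the code range.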

This is Masayoshi Takahashi.

2008/09/15 13:51 NARUSE, Yui [email protected]:

> [ruby-core:18532] has a patch that includes something to the same effect.
> I haven't yet looked at how far they overlap, but I wanted to mention it
> quickly.

Oh, they do seem to overlap.
case.pat and casecmp.pat in the recent mail [ruby-core:18616] appear to
serve the same purpose.

2008/09/15 16:06 Tanaka A. [email protected]:

> If the check takes longer than the processing it is meant to speed up,
> then it is not appropriate.

Yes, that's right. There are cases where spending time on the check
still makes the processing faster overall, but nothing beats not having
to spend that time at all.

According to [ruby-core:18532], which Naruse-san pointed out, the problem
is that rb_str_modify() clears the CODERANGE; without that, there would be
no need to determine it again. When I measured it, that does indeed seem
to be the case.

  • Is this understanding correct?
  • Would it be good to create a variant of rb_str_modify() that does not
    clear it unless it is ENC_CODERANGE_BROKEN, under the name str_modify()?
    (Personally I find the difference from rb_str_modify() hard to see, and I
    wonder whether rb_str_modify() itself could simply be changed. Is a
    version that clears it still needed after all?)

Masayoshi Takahashi ([email protected])