Nonlinear performance of each_char

The processing time of String#each_char does not seem to be proportional to the string length.

% ./ruby -v -e '
c = "\xa1\xa1".force_encoding("euc-jp");
1000.step(20000, 1000) {|n|
s = c * n
t1 = Process.times.utime
s.each_char {|x| }
t2 = Process.times.utime
p t2 - t1
}'

ruby 1.9.0 (2008-01-19 revision 0) [i686-linux]
0.09
0.35
0.8
1.41
2.22
3.21
4.34
5.68
7.14
8.85
10.62
12.63
14.8
17.15
19.7
22.38
25.25
28.25
31.47
34.9
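Here is a quick check on the figures above. Take n as the number of repetitions of the two-byte character and t(n) as the user CPU seconds reported by Process.times.utime. Doubling n multiplies the time by roughly four, which points to quadratic rather than linear growth. The constant c is left unspecified.

```latex
\frac{t(2000)}{t(1000)} = \frac{0.35}{0.09} \approx 3.9, \qquad
\frac{t(10000)}{t(5000)} = \frac{8.85}{2.22} \approx 4.0, \qquad
\frac{t(20000)}{t(10000)} = \frac{34.9}{8.85} \approx 3.9
\quad\Longrightarrow\quad t(n) \approx c\,n^{2}
```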

Nakada here.

At Sun, 20 Jan 2008 01:07:05 +0900,
Tanaka A. wrote in [ruby-dev:33189]:

The processing time of String#each_char does not seem to be proportional to the string length.

That's because rb_str_substr() searches from the head of the string on every call.
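Nakada explains the slowdown this way: rb_str_substr() scans from the head of the string on every call, so finding the i-th character costs i steps. Summed over all characters, that makes each_char quadratic. The standalone C sketch below contrasts that rescan pattern with keeping a running byte offset. It is not Ruby's code. The helper names (euc_char_len, offset_from_start, each_char_rescan, each_char_incremental) are hypothetical, and the EUC-JP rule is deliberately simplified. The sketch counts character steps instead of measuring time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified EUC-JP character length: a byte in 0xA1..0xFE
 * starts a 2-byte character; anything else counts as one byte.
 * Real EUC-JP also has 0x8E (2-byte) and 0x8F (3-byte) sequences. */
static size_t euc_char_len(const unsigned char *p, const unsigned char *end)
{
    assert(p < end);
    if (*p >= 0xA1 && *p <= 0xFE && end - p >= 2)
        return 2;
    return 1;
}

/* Byte offset of the idx-th character, found by walking from the start
 * of the string every time. *steps counts the characters walked. */
static size_t offset_from_start(const unsigned char *s, size_t len,
                                size_t idx, size_t *steps)
{
    size_t off = 0;
    while (idx-- > 0 && off < len) {
        off += euc_char_len(s + off, s + len);
        (*steps)++;
    }
    return off;
}

/* Quadratic pattern: rescan from the beginning for every character. */
static size_t each_char_rescan(const unsigned char *s, size_t len,
                               size_t nchars)
{
    size_t steps = 0;
    for (size_t i = 0; i < nchars; i++)
        (void)offset_from_start(s, len, i, &steps);
    return steps;
}

/* Linear pattern: keep a running byte offset and advance one character. */
static size_t each_char_incremental(const unsigned char *s, size_t len)
{
    size_t steps = 0;
    for (size_t off = 0; off < len; ) {
        off += euc_char_len(s + off, s + len);
        steps++;
    }
    return steps;
}
```

Take 1000 two-byte characters as an example. The rescan version walks 0 + 1 + ... + 999 = 499500 characters in total, while the incremental version walks only 1000.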

If the string is modified inside the block, somewhat odd things may happen, but I don't think that is a particularly unacceptable restriction.

Index: string.c
--- string.c	(revision 15137)
+++ string.c	(working copy)
@@ -4591,9 +4591,20 @@ static VALUE
 rb_str_each_char(VALUE str)
 {
-    int i, len = str_strlen(str, 0);
+    int i, len, n;
+    const char *ptr, *s;
+    rb_encoding *enc;
 
     RETURN_ENUMERATOR(str, 0, 0);
-    for (i=0; i<len; i++) {
-	rb_yield(rb_str_substr(str, i, 1));
+    ptr = RSTRING_PTR(str);
+    len = RSTRING_LEN(str);
+    enc = rb_enc_get(str);
+    n = rb_enc_precise_mbclen(ptr, ptr + len, enc);
+    for (i = 0; i < len; i += n) {
+	rb_yield(rb_str_subseq(str, i, n));
+	ptr = RSTRING_PTR(str);
+	len = RSTRING_LEN(str);
+	enc = rb_enc_get(str);
+	s = rb_enc_left_char_head(ptr, ptr + i, enc);
+	n = rb_enc_precise_mbclen(s, ptr + len, enc);
     }
     return str;

e$B;n$7$F$^$;$s$,!"e(B

In article <…>,
Nobuyoshi N. <…> writes:

+    n = rb_enc_precise_mbclen(ptr, ptr + len, enc);
+    for (i = 0; i < len; i += n) {
+	rb_yield(rb_str_subseq(str, i, n));

rb_enc_precise_mbclen can return a negative value, so this doesn't look right to me.

If you are not going to check the return value, wouldn't rb_enc_mbclen be good enough?
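Tanaka's objection concerns the patch's loop. It uses the result of rb_enc_precise_mbclen directly as the step in `i += n`. A negative result would move the index backwards, and the loop might never finish. The standalone C sketch below shows the idea of a length function that always returns a usable positive step. It is not Ruby's implementation. precise_len, safe_len and count_pieces are hypothetical names, and the EUC-JP rule is simplified. Falling back to one byte for an incomplete character is an assumption of the sketch, chosen only to be in the spirit of rb_enc_mbclen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "precise" length, loosely modeled on the idea behind
 * rb_enc_precise_mbclen(): positive = length of a complete character,
 * negative = the bytes at p are not a complete character (here only
 * a 2-byte lead byte cut off by the end of the buffer). */
static int precise_len(const unsigned char *p, const unsigned char *end)
{
    if (*p >= 0xA1 && *p <= 0xFE)
        return end - p >= 2 ? 2 : -1;   /* -1: incomplete character */
    return 1;
}

/* Hypothetical always-positive length (assumption: fall back to one
 * byte), so a loop doing `i += n` always moves forward. */
static int safe_len(const unsigned char *p, const unsigned char *end)
{
    int n = precise_len(p, end);
    return n > 0 ? n : 1;
}

/* Count the pieces an each_char-style loop would yield. */
static size_t count_pieces(const unsigned char *s, size_t len)
{
    size_t pieces = 0;
    for (size_t i = 0; i < len; ) {
        int n = safe_len(s + i, s + len);
        assert(n > 0);              /* progress is guaranteed */
        i += (size_t)n;
        pieces++;
    }
    return pieces;
}
```

Take the bytes A1 A1 A1 as an example. They yield one complete two-byte character followed by one stray byte, so the loop ends after two pieces.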