[Feature #2372] read_all() with buffering

Feature #2372: read_all() with buffering
http://redmine.ruby-lang.org/issues/show/2372

Reporter: _ wanabe
Status: Open, Priority: Low
Category: core, Target version: 1.9.x

In read_all() in io.c, the code path that runs when encoding
conversion is required seemed somewhat slow, so I wrote a patch that
accumulates some data in the buffer before calling io_shift_cbuf().
The number of bytes to accumulate is based on an expression that was
already in io_shift_cbuf().
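
The batching idea can be pictured with a small toy model. This is only a sketch in Ruby, not the io.c code: the method name drain, the chunks array (standing in for the output of successive conversion steps), and the "more than half of capacity" threshold are all assumptions made for illustration.

```ruby
# Toy model of batching: `chunks` stands in for the converted bytes produced
# by successive conversion steps. This is NOT the io.c implementation.
def drain(chunks, capacity:, batched:)
  result = String.new   # plays the role of the string being built by read_all()
  cbuf   = String.new   # plays the role of the conversion buffer
  shifts = 0            # how many times the buffer was moved into the result

  chunks.each do |chunk|
    cbuf << chunk
    # Unbatched: shift after every chunk.
    # Batched: shift only once the buffer is more than half full (assumed threshold).
    if !batched || cbuf.bytesize > capacity / 2
      result << cbuf
      cbuf.clear
      shifts += 1
    end
  end

  # Whatever is left is shifted out at end of input.
  unless cbuf.empty?
    result << cbuf
    shifts += 1
  end

  [result, shifts]
end
```

Calling drain(["a"] * 100, capacity: 16, batched: true) and the same with batched: false should give the same string, with far fewer shifts in the batched case.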

I am attaching the benchmark script together with its results.
It is modeled on bm_io_file_read.rb, extended to try several encodings
and modes.
On platforms other than Windows, I expect the 'r' mode to give the
same results as the 'rb' mode.
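
The attached script itself is not reproduced in this thread. A minimal sketch of a benchmark of this shape might look like the following; the helper names (with_ascii_file, read_repeatedly, run_benchmark) and the file sizes and repeat counts are placeholders chosen for illustration, not the original values. Only the list of open modes is taken from the labels in the posted results.

```ruby
require "benchmark"
require "tempfile"

# Open modes to compare; the list mirrors the labels in the posted results.
MODES = %w[
  r r:us-ascii:utf-8 r:us-ascii:utf-16le
  rb rb:us-ascii:utf-8 rb:us-ascii:utf-16le
].freeze

# Hypothetical helper: writes `bytes` bytes of ASCII data and yields the path.
def with_ascii_file(bytes)
  Tempfile.create("bm_read") do |f|
    f.binmode
    f.write("a" * bytes)
    f.flush
    yield f.path
  end
end

# Reads the whole file `times` times using the given open mode.
def read_repeatedly(path, mode, times)
  times.times { File.open(path, mode) { |io| io.read } }
end

# Sizes and repeat counts are placeholders, not the values used in the thread.
def run_benchmark(short_bytes: 1, long_bytes: 1_000_000,
                  short_times: 1000, long_times: 5)
  cases = { "short" => [short_bytes, short_times],
            "long"  => [long_bytes, long_times] }
  Benchmark.bm(27) do |bm|
    cases.each do |name, (bytes, times)|
      with_ascii_file(bytes) do |path|
        MODES.each do |mode|
          bm.report("#{name} #{mode}") { read_repeatedly(path, mode, times) }
        end
      end
    end
  end
end
```

Calling run_benchmark prints one line per case and mode in the same user/system/total/real layout as the results in this thread.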

I measured a speedup of up to about 2.8x, but the gap narrows when the
conversion is complex, and, as expected, short files showed almost no
difference.

If anyone is interested, it would help me a lot if you could check
whether the speed improves in the same way on a non-Windows system or
on a sufficiently fast machine.

This is Endoh.

On 2009-11-15 at 21:21, _ wanabe [email protected] wrote:

> In read_all() in io.c, the code path that runs when encoding
> conversion is required seemed somewhat slow, so I wrote a patch that
> accumulates some data in the buffer before calling io_shift_cbuf().
> The number of bytes to accumulate is based on an expression that was
> already in io_shift_cbuf().

I have not understood what the patch does, but

> I measured a speedup of up to about 2.8x, but the gap narrows when
> the conversion is complex, and, as expected, short files showed
> almost no difference.
>
> If anyone is interested, it would help me a lot if you could check
> whether the speed improves in the same way on a non-Windows system or
> on a sufficiently fast machine.

I tried it on Debian. The speedup was about 2.8x at most.

$ ./ruby test.rb
                                 user     system      total        real
short r                      0.310000   0.190000   0.500000 (  0.489743)
short r:us-ascii:utf-8       0.760000   0.220000   0.980000 (  0.984798)
short r:us-ascii:utf-16le    1.030000   0.270000   1.300000 (  1.293715)
short rb                     0.260000   0.220000   0.480000 (  0.480045)
short rb:us-ascii:utf-8      0.730000   0.260000   0.990000 (  1.005522)
short rb:us-ascii:utf-16le   1.050000   0.250000   1.300000 (  1.311958)
long r                       0.040000   0.020000   0.060000 (  0.057341)
long r:us-ascii:utf-8        3.800000   0.020000   3.820000 (  3.817237)
long r:us-ascii:utf-16le     7.840000   0.040000   7.880000 (  7.956883)
long rb                      0.020000   0.030000   0.050000 (  0.052485)
long rb:us-ascii:utf-8       3.830000   0.030000   3.860000 (  3.867369)
long rb:us-ascii:utf-16le    7.960000   0.020000   7.980000 (  8.031679)

$ ./ruby.org test.rb
                                 user     system      total        real
short r                      0.280000   0.190000   0.470000 (  0.459421)
short r:us-ascii:utf-8       0.830000   0.190000   1.020000 (  1.023667)
short r:us-ascii:utf-16le    1.160000   0.180000   1.340000 (  1.349957)
short rb                     0.250000   0.210000   0.460000 (  0.454870)
short rb:us-ascii:utf-8      0.850000   0.170000   1.020000 (  1.019038)
short rb:us-ascii:utf-16le   1.100000   0.240000   1.340000 (  1.339989)
long r                       0.020000   0.040000   0.060000 (  0.053929)
long r:us-ascii:utf-8       11.730000   0.020000  11.750000 ( 11.753191)
long r:us-ascii:utf-16le    15.880000   0.040000  15.920000 ( 15.969792)
long rb                      0.020000   0.020000   0.040000 (  0.046306)
long rb:us-ascii:utf-8      11.020000   0.030000  11.050000 ( 11.038415)
long rb:us-ascii:utf-16le   15.400000   0.020000  15.420000 ( 15.425342)

$ ./ruby -v
ruby 1.9.2dev (2009-11-14 trunk 25768) [i686-linux]
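
To read the two runs above side by side, the speedup for each "long" case with conversion can be computed as the original wall-clock time divided by the patched one. The following is a small sketch; the hash names and the speedups helper are made up for illustration, and the numbers are the "real" column copied from the two outputs above (./ruby.org as the original, ./ruby as the patched build).

```ruby
# "real" seconds for the long-file cases with conversion, copied from the
# two benchmark outputs above.
ORIGINAL = {
  "long r:us-ascii:utf-8"     => 11.753191,
  "long r:us-ascii:utf-16le"  => 15.969792,
  "long rb:us-ascii:utf-8"    => 11.038415,
  "long rb:us-ascii:utf-16le" => 15.425342,
}.freeze

PATCHED = {
  "long r:us-ascii:utf-8"     => 3.817237,
  "long r:us-ascii:utf-16le"  => 7.956883,
  "long rb:us-ascii:utf-8"    => 3.867369,
  "long rb:us-ascii:utf-16le" => 8.031679,
}.freeze

# Speedup = time before the patch / time after the patch, per label.
def speedups(before, after)
  before.map { |label, secs| [label, secs / after.fetch(label)] }.to_h
end

speedups(ORIGINAL, PATCHED).each do |label, ratio|
  puts format("%-26s %.2fx", label, ratio)
end
```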

This is Naruse.

_ wanabe wrote:

> In read_all() in io.c, the code path that runs when encoding
> conversion is required seemed somewhat slow, so I wrote a patch that
> accumulates some data in the buffer before calling io_shift_cbuf().
> The number of bytes to accumulate is based on an expression that was
> already in io_shift_cbuf().

Ah, I see. So the point is that the transcode call (the more_char
processing?) is slower than waiting on IO.
I think that is a valid observation, and it got faster here too.

% ruby19.orig test.rb
                                 user     system      total        real
short r                      0.335938   1.000000   1.335938 (  1.328293)
short r:us-ascii:utf-8       1.210938   0.867188   2.078125 (  2.082787)
short r:us-ascii:utf-16le    1.648438   1.007812   2.656250 (  2.658761)
short rb                     0.335938   0.812500   1.148438 (  1.152713)
short rb:us-ascii:utf-8      1.289062   0.984375   2.273438 (  2.304020)
short rb:us-ascii:utf-16le   1.531250   0.906250   2.437500 (  2.445138)
long r                       0.023438   0.101562   0.125000 (  0.121755)
long r:us-ascii:utf-8       14.703125   0.101562  14.804688 ( 14.856436)
long r:us-ascii:utf-16le    24.406250   0.062500  24.468750 ( 24.553504)
long rb                      0.031250   0.078125   0.109375 (  0.106960)
long rb:us-ascii:utf-8      13.179688   0.039062  13.218750 ( 13.273193)
long rb:us-ascii:utf-16le   23.031250   0.062500  23.093750 ( 23.153640)

% ruby19 test.rb
                                 user     system      total        real
short r                      0.328125   0.703125   1.031250 (  1.031874)
short r:us-ascii:utf-8       1.226562   0.867188   2.093750 (  2.121321)
short r:us-ascii:utf-16le    1.601562   0.742188   2.343750 (  2.341433)
short rb                     0.335938   0.695312   1.031250 (  1.025165)
short rb:us-ascii:utf-8      1.171875   0.781250   1.953125 (  1.958811)
short rb:us-ascii:utf-16le   1.578125   0.875000   2.453125 (  2.512860)
long r                       0.023438   0.093750   0.117188 (  0.122622)
long r:us-ascii:utf-8        6.625000   0.046875   6.671875 (  6.786332)
long r:us-ascii:utf-16le    13.507812   0.054688  13.562500 ( 13.699172)
long rb                      0.015625   0.093750   0.109375 (  0.108933)
long rb:us-ascii:utf-8       6.632812   0.046875   6.679688 (  6.703453)
long rb:us-ascii:utf-16le   13.515625   0.046875  13.562500 ( 13.641543)

% ruby19 -v
ruby 1.9.2dev (2009-11-14 trunk 25768) [x86_64-freebsd8.0]

This is Yukihiro Matsumoto.

In message "Re: [ruby-dev:39699] Re: [Feature #2372] read_all() with
buffering"
on Mon, 16 Nov 2009 00:19:25 +0900, "NARUSE, Yui"
[email protected] writes:
|_ wanabe wrote:
|> In read_all() in io.c, the code path that runs when encoding
|> conversion is required seemed somewhat slow, so I wrote a patch that
|> accumulates some data in the buffer before calling io_shift_cbuf().
|> The number of bytes to accumulate is based on an expression that was
|> already in io_shift_cbuf().
|
|Ah, I see. So the point is that the transcode call (the more_char
|processing?) is slower than waiting on IO.
|I think that is a valid observation, and it got faster here too.

Could you commit it? I think it would be best for wanabe to commit it
directly.

Ticket #2372 has been updated. (by _ wanabe)

Status changed from Open to Closed

Then I will go ahead and commit it.
Thank you for helping with the measurements.

http://redmine.ruby-lang.org/issues/show/2372

This is wanabe.

2009/11/16, Tanaka A. [email protected]:

> In article [email protected],
> _ wanabe [email protected] writes:
>
> > Ticket #2372 has been updated. (by _ wanabe)
> >
> > Status changed from Open to Closed
> >
> > Then I will go ahead and commit it.
>
> Since that commit, failures like the following have been occurring.

Sorry about that, and thank you for pointing it out.
I had overlooked it. As a stopgap, I wrapped the call in rb_protect
and committed again.

In article [email protected],
_ wanabe [email protected] writes:

> Ticket #2372 has been updated. (by _ wanabe)
>
> Status changed from Open to Closed
>
> Then I will go ahead and commit it.

Since that commit, failures like the following have been occurring.

% ./ruby -v test/ruby/test_io_m17n.rb
ruby 1.9.2dev (2009-11-16 trunk 25789) [i686-linux]
Loaded suite test/ruby/test_io_m17n
Started
…F…F…F…
Finished in 1.054658 seconds.

  1) Failure:
test_invalid_r(TestIO_M17N) [test/ruby/test_io_m17n.rb:1565]:
<"b"> expected but was
<"ab">.

  2) Failure:
test_read_all_invalid(TestIO_M17N) [test/ruby/test_io_m17n.rb:714]:
<"うえ"> expected but was
<"あいうえ">.

  3) Failure:
test_undef_r(TestIO_M17N) [test/ruby/test_io_m17n.rb:1585]:
<"b"> expected but was
<"ab">.

112 tests, 528 assertions, 3 failures, 0 errors, 0 skips
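
The first failure is about what a second read returns after the first one has raised on an invalid byte: the test expects only the data after the bad byte ("b"), not the already converted prefix as well ("ab"). The sketch below tries to reproduce that expectation outside the test suite. The file contents ("a", an invalid 0x80 byte, "b") and the "r:utf-8:euc-jp" mode are assumptions about what such a test exercises, not a copy of test_io_m17n.rb, and the method name is made up.

```ruby
require "tempfile"

# Returns [class of the error raised by the first read (or nil),
#          string returned by the second read].
def read_twice_around_invalid_byte
  Tempfile.create("invalid_r") do |f|
    f.binmode
    f.write("a\x80b".b)          # 0x80 on its own is not valid UTF-8
    f.flush
    File.open(f.path, "r:utf-8:euc-jp") do |io|
      first_error =
        begin
          io.read                # conversion should fail at the 0x80 byte
          nil
        rescue Encoding::InvalidByteSequenceError => e
          e.class
        end
      [first_error, io.read]     # second read continues after the bad byte
    end
  end
end
```

With the behaviour the test expects, the first element is Encoding::InvalidByteSequenceError and the second read yields "b".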

In article
[email protected],
wanabe [email protected] writes:

> Sorry about that, and thank you for pointing it out.
> I had overlooked it. As a stopgap, I wrapped the call in rb_protect
> and committed again.

I suspect that what wanabe really wanted was to drop
ECONV_AFTER_OUTPUT, wasn't it?

With the following change it looks faster here; how is it on your
side?

% svn diff --diff-cmd diff -x '-u -p'
Index: io.c
===================================================================
--- io.c (revision 25821)
+++ io.c (working copy)
@@ -1582,17 +1582,22 @@ make_readconv(rb_io_t *fptr, int size)
     }
 }
 
-static int
-more_char(rb_io_t *fptr)
+#define MORE_CHAR_CBUF_FULL Qtrue
+#define MORE_CHAR_FINISHED Qnil
+static VALUE
+fill_cbuf(rb_io_t *fptr, int ec_flags)
 {
     const unsigned char *ss, *sp, *se;
     unsigned char *ds, *dp, *de;
     rb_econv_result_t res;
     int putbackable;
     int cbuf_len0;
+    VALUE exc;
+
+    ec_flags |= ECONV_PARTIAL_INPUT;
 
     if (fptr->cbuf_len == fptr->cbuf_capa)
-        return 0; /* cbuf full */
+        return MORE_CHAR_CBUF_FULL; /* cbuf full */
     if (fptr->cbuf_len == 0)
         fptr->cbuf_off = 0;
     else if (fptr->cbuf_off + fptr->cbuf_len == fptr->cbuf_capa) {
@@ -1607,7 +1612,7 @@ more_char(rb_io_t *fptr)
         se = sp + fptr->rbuf_len;
         ds = dp = (unsigned char *)fptr->cbuf + fptr->cbuf_off + fptr->cbuf_len;
         de = (unsigned char *)fptr->cbuf + fptr->cbuf_capa;
-        res = rb_econv_convert(fptr->readconv, &sp, se, &dp, de, ECONV_PARTIAL_INPUT|ECONV_AFTER_OUTPUT);
+        res = rb_econv_convert(fptr->readconv, &sp, se, &dp, de, ec_flags);
         fptr->rbuf_off += (int)(sp - ss);
         fptr->rbuf_len -= (int)(sp - ss);
         fptr->cbuf_len += (int)(dp - ds);
@@ -1619,13 +1624,15 @@ more_char(rb_io_t *fptr)
             fptr->rbuf_len += putbackable;
         }
 
-        rb_econv_check_error(fptr->readconv);
+        exc = rb_econv_make_exception(fptr->readconv);
+        if (!NIL_P(exc))
+            return exc;
 
         if (cbuf_len0 != fptr->cbuf_len)
-            return 0;
+            return MORE_CHAR_CBUF_FULL;
 
         if (res == econv_finished) {
-            return -1;
+            return MORE_CHAR_FINISHED;
         }
 
         if (res == econv_source_buffer_empty) {
@@ -1645,6 +1652,16 @@ more_char(rb_io_t *fptr)
 }
 
 static VALUE
+more_char(rb_io_t *fptr)
+{
+    VALUE v;
+    v = fill_cbuf(fptr, ECONV_AFTER_OUTPUT);
+    if (v != MORE_CHAR_CBUF_FULL && v != MORE_CHAR_FINISHED)
+        rb_exc_raise(v);
+    return v;
+}
+
+static VALUE
 io_shift_cbuf(rb_io_t *fptr, int len, VALUE *strp)
 {
     VALUE str;
@@ -1665,7 +1682,7 @@ io_shift_cbuf(rb_io_t *fptr, int len, VA
     /* xxx: set coderange */
     if (fptr->cbuf_len == 0)
         fptr->cbuf_off = 0;
-    if (fptr->cbuf_off < fptr->cbuf_capa/2) {
+    else if (fptr->cbuf_capa/2 < fptr->cbuf_off) {
         memmove(fptr->cbuf, fptr->cbuf+fptr->cbuf_off, fptr->cbuf_len);
         fptr->cbuf_off = 0;
     }
@@ -1686,21 +1703,19 @@ read_all(rb_io_t *fptr, long siz, VALUE
         else rb_str_set_len(str, 0);
         make_readconv(fptr, 0);
         while (1) {
-            int fin, state = 0;
-            if (fptr->cbuf_len > fptr->cbuf_capa / 2) {
+            VALUE v;
+            if (fptr->cbuf_len) {
                 io_shift_cbuf(fptr, fptr->cbuf_len, &str);
             }
-            fin = rb_protect((VALUE (*)(VALUE))more_char, (VALUE)fptr, &state);
-            if (fin == -1 || state != 0) {
-                if (fptr->cbuf_len) {
-                    io_shift_cbuf(fptr, fptr->cbuf_len, &str);
-                }
-                if (state != 0) {
-                    rb_jump_tag(state);
-                }
-                clear_readconv(fptr);
+            v = fill_cbuf(fptr, 0);
+            if (v != MORE_CHAR_CBUF_FULL && v != MORE_CHAR_FINISHED) {
+                if (fptr->cbuf_len) {
+                    io_shift_cbuf(fptr, fptr->cbuf_len, &str);
+                }
+                rb_exc_raise(v);
+            }
+            if (v == MORE_CHAR_FINISHED) {
+                clear_readconv(fptr);
                 return io_enc_str(str, fptr);
             }
         }
@@ -2181,7 +2196,7 @@ appendline(rb_io_t *fptr, int delim, VAL
                     return (unsigned char)RSTRING_PTR(str)[RSTRING_LEN(str)-1];
                 }
             }
-        } while (more_char(fptr) != -1);
+        } while (more_char(fptr) != MORE_CHAR_FINISHED);
         clear_readconv(fptr);
         *lp = limit;
         return EOF;
@@ -2695,7 +2710,8 @@ io_getc(rb_io_t *fptr, rb_encoding *enc)
                 }
             }
 
-            if (more_char(fptr) == -1) {
+            if (more_char(fptr) == MORE_CHAR_FINISHED) {
+                clear_readconv(fptr);
                 if (fptr->cbuf_len == 0)
                     return Qnil;
                 /* return an incomplete character just before EOF */
@@ -2830,8 +2846,8 @@ rb_io_each_codepoint(VALUE io)
                     rb_raise(rb_eIOError, "too long character");
                 }
             }
-            if (more_char(fptr) == -1) {
-                clear_readconv(fptr);
+            if (more_char(fptr) == MORE_CHAR_FINISHED) {
+                clear_readconv(fptr);
                 /* ignore an incomplete character before EOF */
                 return io;
             }

This is wanabe.

2009/11/18, Tanaka A. [email protected]:

> I suspect that what wanabe really wanted was to drop
> ECONV_AFTER_OUTPUT, wasn't it?
>
> With the following change it looks faster here; how is it on your
> side?

Thank you. It became dramatically faster.

short r                      3.093000   6.422000   9.515000 (  9.734375)
short r:us-ascii:utf-8       4.469000   7.031000  11.500000 ( 12.171875)
short r:us-ascii:utf-16le    5.688000   7.156000  12.844000 ( 13.375000)
short rb                     1.906000   5.875000   7.781000 (  8.031250)
short rb:us-ascii:utf-8      3.906000   6.344000  10.250000 ( 10.609375)
short rb:us-ascii:utf-16le   4.297000   7.047000  11.344000 ( 11.937500)
long r                       1.969000   0.219000   2.188000 (  2.218750)
long r:us-ascii:utf-8       13.407000   0.281000  13.688000 ( 14.093750)
long r:us-ascii:utf-16le    20.515000   0.344000  20.859000 ( 21.515625)
long rb                      0.078000   0.203000   0.281000 (  0.328125)
long rb:us-ascii:utf-8       1.625000   0.297000   1.922000 (  2.031250)
long rb:us-ascii:utf-16le   12.891000   0.531000  13.422000 ( 13.968750)

This is Yukihiro Matsumoto.

In message "Re: [ruby-dev:39708] Re: Feature #2372 read_all()
with buffering"
on Wed, 18 Nov 2009 07:16:32 +0900, Tanaka A. [email protected]
writes:

|I suspect that what wanabe really wanted was to drop
|ECONV_AFTER_OUTPUT, wasn't it?
|
|With the following change it looks faster here; how is it on your
|side?

Please commit it.