On Dec 7, 3:29 am, Jano S. wrote:
> I’d assume the former saves you a bunch of allocations when looping
> through a file (I assume the buffer is reused instead of allocating a
> new one for each iteration).
I’m not the smartest C programmer (or the smartest anything
programmer), but I’m not seeing any optimization in the actual C code.
Please correct me if I’m wrong.
First, io_read() is the C function behind IO#read.
The relevant lines are:
====
rb_scan_args(argc, argv, "02", &length, &str);
if (NIL_P(length)) {
    if (!NIL_P(str)) StringValue(str);
    GetOpenFile(io, fptr);
    rb_io_check_readable(fptr);
    return read_all(fptr, remain_size(fptr), str);
}
len = NUM2LONG(length);
if (len < 0) {
    rb_raise(rb_eArgError, "negative length %ld given", len);
}
if (NIL_P(str)) {
    str = rb_tainted_str_new(0, len);
}
else {
    StringValue(str);
    rb_str_modify(str);
    rb_str_resize(str,len);
}
So we get a new string from rb_tainted_str_new() if no buffer is
passed to IO#read; otherwise the passed-in str is used and we call
StringValue on it.
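For comparison, here is the same choice in plain C, where the caller either supplies a buffer or lets the callee allocate one. This is only an analogy, not MRI code: read_into() and the fresh_allocs counter are hypothetical names, and a caller-supplied buffer is assumed to have room for len + 1 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter: how many buffers read_into() had to malloc. */
static int fresh_allocs = 0;

/*
 * Hypothetical plain-C analogue of IO#read(length, buffer); not MRI code.
 * If buf is NULL, a new len+1 byte buffer is malloc'ed (the role played by
 * rb_tainted_str_new in io_read); otherwise the caller's buffer, which must
 * hold at least len+1 bytes, is reused. Returns the number of bytes read
 * and stores the buffer actually used in *out. The caller frees any buffer
 * that was allocated here.
 */
static size_t read_into(FILE *f, size_t len, char *buf, char **out)
{
    assert(f != NULL && out != NULL);
    if (buf == NULL) {
        buf = malloc(len + 1);
        assert(buf != NULL);
        fresh_allocs++;
    }
    size_t n = fread(buf, 1, len, f);
    buf[n] = '\0';              /* NUL sentinel, as Ruby strings keep */
    *out = buf;
    return n;
}
```

Reading a file in a loop through one caller-owned `char buf[len + 1]` costs no malloc() per iteration; whether the String-based path in io_read() gets the same saving depends on what StringValue(), rb_str_modify() and rb_str_resize() do with the passed-in string.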
So what is StringValue? A macro defined in ruby.h:
====
#define StringValue(v) rb_string_value(&(v))
And what is rb_string_value()? A function from string.c:
====
static char *null_str = "";
VALUE
rb_string_value(ptr)
    volatile VALUE *ptr;
{
    VALUE s = *ptr;
    if (TYPE(s) != T_STRING) {
        s = rb_str_to_str(s);
        *ptr = s;
    }
    if (!RSTRING(s)->ptr) {
        FL_SET(s, ELTS_SHARED);
        RSTRING(s)->ptr = null_str;
    }
    return s;
}
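The second `if` only touches strings whose ptr is NULL. Below is a toy model of just that branch; struct toy_str, TOY_SHARED and toy_null_str are hypothetical stand-ins for RString, ELTS_SHARED and null_str, and the T_STRING conversion is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag standing in for MRI's ELTS_SHARED. */
#define TOY_SHARED 0x1u

/* Hypothetical, simplified string; not MRI's struct RString. */
struct toy_str {
    char *ptr;
    long len;
    unsigned flags;
};

/* One static, zero-filled byte shared by every buffer-less string. */
static char toy_null_str[1];

/*
 * Mirrors the second half of rb_string_value(): a string whose ptr is
 * NULL is pointed at the shared empty buffer and flagged as shared.
 * No bytes are allocated or cleared; a string that already has a
 * buffer is returned untouched.
 */
static struct toy_str *toy_string_value(struct toy_str *s)
{
    assert(s != NULL);
    if (s->ptr == NULL) {
        s->flags |= TOY_SHARED;
        s->ptr = toy_null_str;
    }
    return s;
}
```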
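So if it’s not a string, we convert it to one; and if it has no buffer
at all (a NULL ptr), we point it at the static empty string null_str
and flag it ELTS_SHARED.
But the interesting lines are back up in io_read():
====
rb_str_modify(str);
rb_str_resize(str,len);
Now rb_str_modify() (string.c) is called with our string, and when the
string is not independent (for example, when ELTS_SHARED is set) it in
turn calls str_make_independent():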
====
static void
str_make_independent(str)
    VALUE str;
{
    char *ptr;
    ptr = ALLOC_N(char, RSTRING(str)->len+1);
    if (RSTRING(str)->ptr) {
        memcpy(ptr, RSTRING(str)->ptr, RSTRING(str)->len);
    }
    ptr[RSTRING(str)->len] = 0;
    RSTRING(str)->ptr = ptr;
    RSTRING(str)->aux.capa = RSTRING(str)->len;
    FL_UNSET(str, STR_NOCAPA);
}
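A self-contained toy version of the same copy, assuming a simplified struct toy_str with ptr, len and capa fields (hypothetical, not MRI’s RString) and ignoring the STR_NOCAPA flag and the garbage collector:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified string; not MRI's struct RString. */
struct toy_str {
    char *ptr;
    long len;
    long capa;
};

/*
 * Sketch of what str_make_independent() does: allocate len+1 bytes, copy
 * the current contents (if any), add a NUL, and point the string at the
 * new private buffer. The old buffer is left alone, since other strings
 * may still share it. There is no GC here, so the caller must free()
 * the new buffer.
 */
static void toy_make_independent(struct toy_str *s)
{
    assert(s != NULL && s->len >= 0);
    char *p = malloc((size_t)s->len + 1);
    assert(p != NULL);
    if (s->ptr != NULL) {
        memcpy(p, s->ptr, (size_t)s->len);
    }
    p[s->len] = '\0';
    s->ptr = p;
    s->capa = s->len;
}
```

The original bytes stay where they were, so any other string still sharing them is unaffected; only the string being modified gets the new allocation.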
And finally, rb_str_resize is called:
====
VALUE
rb_str_resize(str, len)
    VALUE str;
    long len;
{
    if (len < 0) {
        rb_raise(rb_eArgError, "negative string size (or size too big)");
    }
    rb_str_modify(str);
    if (len != RSTRING(str)->len) {
        if (RSTRING(str)->len < len || RSTRING(str)->len - len > 1024) {
            REALLOC_N(RSTRING(str)->ptr, char, len+1);
            if (!FL_TEST(str, STR_NOCAPA)) {
                RSTRING(str)->aux.capa = len;
            }
        }
        RSTRING(str)->len = len;
        RSTRING(str)->ptr[len] = '\0'; /* sentinel */
    }
    return str;
}
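The condition guarding REALLOC_N() decides whether a resize allocates at all, so here it is in isolation. This toy keeps only ptr and len (a hypothetical struct toy_str, not RString), leaves out the rb_str_modify() call and the aux.capa/STR_NOCAPA bookkeeping, and counts realloc() calls in a hypothetical toy_realloc_count:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified string; not MRI's struct RString. */
struct toy_str {
    char *ptr;
    long len;
};

/* Hypothetical counter, for illustration only. */
static int toy_realloc_count = 0;

/*
 * Mirrors the reallocation rule in rb_str_resize(): realloc only when
 * growing, or when shrinking by more than 1024 bytes; otherwise just
 * update len and rewrite the NUL sentinel inside the existing buffer.
 */
static void toy_resize(struct toy_str *s, long len)
{
    assert(s != NULL && len >= 0);
    if (len != s->len) {
        if (s->len < len || s->len - len > 1024) {
            char *p = realloc(s->ptr, (size_t)len + 1);
            assert(p != NULL);
            s->ptr = p;
            toy_realloc_count++;
        }
        s->len = len;
        s->ptr[len] = '\0';     /* sentinel */
    }
}
```

In this toy, resizing to the current length does nothing, and shrinking by 1024 bytes or less keeps the old buffer; growing by even one byte reallocates, because the check compares against len rather than a spare capacity.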
Now, like I said, I’m not the greatest C programmer… but if I’m
reading the code above correctly, I fail to see how passing a buffer
string to IO#read is any cheaper than creating a new string (even
when looping many times), since both paths appear to me to do the
same work (compare str_new() in string.c, which is what
rb_tainted_str_new() calls).
Regards,
Jordan
References:
http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/io.c
http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/ruby.h
http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/string.c