Hello group,
Ruby implements copy-on-write for strings, so you can do stuff like
this very cheaply:
str = 0.chr * (2**24) # 16MiB allocated
str[100..-1] # this costs only a small amount of memory
How come this optimization does not apply in this case?
str[100..-2] # this costs around 16 MiB of memory
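A rough way to observe the difference, assuming a reasonably recent CRuby where the standard `objspace` extension is available: `ObjectSpace.memsize_of` reports the memory attributed to a single object, so a slice that shares its parent's buffer should report far less than one that owns a copy. The figures depend on the Ruby version and build, so this is a sketch rather than a benchmark.

```ruby
require 'objspace'

str  = 0.chr * (2**24)   # 16 MiB buffer
tail = str[100..-1]      # runs to the end of str
mid  = str[100..-2]      # stops one byte short of the end

# memsize_of is only an approximation of the memory owned by one object.
tail_size = ObjectSpace.memsize_of(tail)
mid_size  = ObjectSpace.memsize_of(mid)

puts "tail: #{tail.bytesize} bytes of content, memsize #{tail_size}"
puts "mid:  #{mid.bytesize} bytes of content, memsize #{mid_size}"
```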
As a side effect, if using regexps on a large string, the pre-match
and post-match variables behave differently:
s = 0.chr * (2**23) + "Hello" + 0.chr * (2**23) # About 16MiB allocated (after GC)
s.scan(/Hello/) { |m| p m } # This is free
p $'.size # This is free
p $`.size # This costs another 8MiB.
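The same kind of check can be applied to the match variables. This is a sketch under the same assumptions (a recent CRuby with `objspace`); the local names `pre` and `post` are only for illustration, and the reported sizes will vary between versions.

```ruby
require 'objspace'

s = 0.chr * (2**23) + "Hello" + 0.chr * (2**23)

s.scan(/Hello/) { |m| p m }

post = $'   # text after the last match; it ends where s ends
pre  = $`   # text before the last match; it ends in the middle of s

puts "post-match: #{post.bytesize} bytes, memsize #{ObjectSpace.memsize_of(post)}"
puts "pre-match:  #{pre.bytesize} bytes, memsize #{ObjectSpace.memsize_of(pre)}"
```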
Any insights?
Lars
On 05.05.2008 18:07, ts wrote:
p $`.size # This costs another 8MiB.
same reason here.
Interesting. Do you also happen to know why an additional field that
stores the length is not used? Is the reason maybe the use of C library
string functions that work on zero-terminated strings?
Cheers
robert
Lars C. wrote:
Well, it’s best if you look at rb_str_substr() in string.c
str[100..-1] # this costs only a small amount of memory
Ruby just needs to adjust the pointer and the length in the new
object.
str[100..-2] # this costs around 16 MiB of memory
One character is missing compared with the previous string. If Ruby
did the same thing as before, it would have to
- adjust the pointer
- adjust the length
- add \0 at the end
This means it would inevitably modify the original string, which is
why it duplicates the data instead.
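The reasoning above can be sketched as a toy model. This is not Ruby's real implementation: an Array of bytes stands in for the C buffer, and `View`, `substring` and `text` are hypothetical names made up for the illustration. A slice may share the buffer only when it ends exactly where its parent ends, because the terminating 0 is then already in place.

```ruby
# Toy model only: an Array of bytes stands in for the C buffer, and
# View, substring and text are hypothetical names, not Ruby internals.
View = Struct.new(:buffer, :start, :len)

def substring(view, beg, len)
  if beg + len == view.len
    # The slice ends where the parent ends, so the parent's terminating
    # 0 already sits right after it: share the buffer.
    View.new(view.buffer, view.start + beg, len)
  else
    # A terminator would have to overwrite a byte the parent still
    # uses, so copy the bytes and terminate the copy instead.
    View.new(view.buffer[view.start + beg, len] + [0], 0, len)
  end
end

def text(view)
  view.buffer[view.start, view.len].pack('C*')
end

parent = View.new('hello world'.bytes + [0], 0, 11)
tail   = substring(parent, 6, 5)   # ends at the parent's end
mid    = substring(parent, 6, 4)   # stops one byte earlier

puts text(tail)                          # world
puts text(mid)                           # worl
puts tail.buffer.equal?(parent.buffer)   # true
puts mid.buffer.equal?(parent.buffer)    # false
```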
p $'.size # This is free
p $`.size # This costs another 8MiB.
same reason here.
Guy Decoux
On 05.05.2008 18:33, ts wrote:
Robert K. wrote:
Interesting. Do you also happen to know why an additional field that
stores the length is not used?
I don't understand: it has a field which gives the length of
the string, for example with
Ah, ok. This happens when one is too lazy to look into the source.
Somehow I had assumed that the length was not stored because you made
the point that the \0 could not be inserted without altering the
original. I concluded, there is no length.
str = '0' * 200
str[100 .. -1]
the first object (in str) will have 200 for its length
the length field in the new object will have the value 100
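The stored lengths are visible from Ruby itself. A minimal sketch (the names `sub` and `with_nul` are only for illustration); the last lines show that a NUL byte inside the content is counted like any other byte, so it is the length field, not a terminator, that decides where a Ruby string ends.

```ruby
str = '0' * 200
sub = str[100..-1]

puts str.length   # 200
puts sub.length   # 100

# A NUL byte in the middle does not cut the string short.
with_nul = "a\0b"
puts with_nul.length   # 3
```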
Is the reason maybe the use of C library
string functions that work on zero-terminated strings?
only matz knows this
Well, maybe he’ll stop by and enlighten us.
Kind regards
robert
Robert K. wrote:
Interesting. Do you also happen to know why an additional field that
stores the length is not used?
I don't understand: it has a field which gives the length of
the string, for example with
str = '0' * 200
str[100 .. -1]
the first object (in str) will have 200 for its length
the length field in the new object will have the value 100
Is the reason maybe the use of C library
string functions that work on zero-terminated strings?
only matz knows this
Guy Decoux