On 7/26/2011 02:28, Robert K. wrote:
Jeremy B. wrote in post #1013041:
Explaining the difference between symbols and strings got me wondering
why string literals aren’t treated similarly to symbols by the
interpreter. To be clear it seems that string literals could be
allocated 1 time each, where duplicates simply reference the first
created instance.
Actually this is what happens with COW underneath. Still you get
multiple instances which is correct and needed (see below).
Thanks for your response, Robert.
Yes, this makes perfect sense because a string literal is mutable like
any other string. Each string literal could be modified at any time, so
the safe and possibly only solution is to treat each one as if it will
be modified and give it its own instance.
The trick would be to make each reference to the literal a dup of the
singular, hidden instance. With the COW semantics that dup’d strings
have (or should have), this should be a more memory efficient way to
deal with programs that have many instances of a string literal.
This is what happens. Note, it’s not a dup of the reference but a dup
of the instance (i.e. like CONSTANT_STRING.dup was called).
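To illustrate the "dup of the instance" behaviour described above, here is a minimal sketch in plain Ruby. The variable hidden_literal is a hypothetical stand-in for the interpreter's internal instance; this does not show MRI's actual internals, only the semantics visible from Ruby code.

```ruby
# Hypothetical stand-in for the single hidden instance behind a literal.
hidden_literal = "example".freeze

# Each "evaluation" of the literal is modelled as a dup of that instance.
copies = Array.new(3) { hidden_literal.dup }

# A dup of a frozen string is not frozen, so each copy can be modified
# without touching the others or the hidden original.
copies.first << "!"

distinct_ids = copies.map(&:object_id).uniq.size
puts distinct_ids      # number of distinct String objects
puts copies.inspect
puts hidden_literal
```

Each copy is its own object, so changing one leaves the rest alone. Whether the underlying character data is shared until the first write is an implementation detail that cannot be observed from this code.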
Right. Thanks for clarifying. It’s good to know that the string data
itself isn’t being duplicated, even if it may typically be rare for
anything aside from small strings to be used repeatedly as literals.
1.upto(2) do
  string = "example"
  puts string
end
Unless I misunderstand the current implementation in MRI, a loop such as
the one above would needlessly create a new instance of “good” for every
pass through the loop.
You probably meant “example” instead of “good”.
Yes. There was a little of my earlier explanation message leaking
through. Late night emails…
It will behave like this
irb(main):007:0> a=[]
=> []
irb(main):008:0> s="foo"
=> "foo"
irb(main):009:0> 3.times {|i| a << (s << i.to_s)}
=> 3
irb(main):010:0> a
=> ["foo012", "foo012", "foo012"]
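For contrast, here is a small sketch that puts the shared-instance behaviour next to the dup-per-use behaviour. The variable names are only for illustration, and String.new is used so the first string is mutable regardless of any frozen-literal settings.

```ruby
# One shared, mutable instance: every append hits the same object.
shared = String.new("foo")
aliased = []
3.times { |i| aliased << (shared << i.to_s) }

# A fresh dup per iteration: each append hits its own object.
base = "foo".freeze
independent = []
3.times { |i| independent << (base.dup << i.to_s) }

puts aliased.inspect
puts independent.inspect
```

In the first loop all three array elements are the same object, which is why they all show the final contents. In the second loop each element was built from its own copy.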
My actual suggestion, whether I conveyed it clearly or not, was to work
with dups. However, even dups require allocation, so we wouldn’t
save much at all in the end. I should have slept on my suggestion a bit
before sending it out.
In my example above, though, the dup is needless because the string is
not modified, and that was the point. That code could safely be trusted
to use the original string literal instance, but handling this in the
general case where the literal might be modified would probably be
non-trivial and maybe even impossible without some cheating by the
interpreter.
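When the programmer knows the string will not be modified, the sharing can be requested by hand. The sketch below is one way to do that, assuming it is acceptable to hoist the literal out of the loop and freeze it. It is a workaround in user code, not something the interpreter does on its own.

```ruby
# Create the string once and freeze it so accidental modification fails.
example = "example".freeze

ids = []
1.upto(2) do
  string = example          # no new String is created here
  puts string
  ids << string.object_id
end

# Any attempt to modify the shared instance is rejected.
modification_rejected = false
begin
  example << "!"
rescue RuntimeError         # FrozenError is a RuntimeError subclass
  modification_rejected = true
end

puts ids.uniq.size
puts modification_rejected
```

Both passes through the loop see the same object, and the frozen flag turns a would-be modification into an exception instead of a silent change to every user of the string.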
anyone knows what it is, please point it out. Maybe MRI already does
str_make_independent
Thanks for the pointer here. I’ll try to set aside some time to study
this code.
I’m trying to think of a way to allow the interpreter to cheat with
regard to string literals such that until the program attempts to modify
the literal or otherwise pierce the veil (calling object_id or similar)
the program actually uses the literal’s object instance directly. This
would allow the interpreter to avoid allocating extra string objects for
literals until they actually need to be modified or attributes such as
the object identity need to be known. I’m not sure this is possible or
would even yield any performance benefits. It’s more of a curiosity
now.
Thanks again for your help.
-Jeremy