Symbols and frozen strings

I just had a thought.

One of the problems with using strings as hash keys is that every time
you refer to them, you create a throw-away garbage string:

params["id"]
        ^
        +-- temporary string, needs to be garbage collected

In Rails you have HashWithIndifferentAccess, but this actually isn’t any
better. Although you write params[:id], when executed the symbol is
converted to a string anyway.

In a Rails-like scenario, using symbols as the real keys within the hash
doesn’t work: the keys come from externally parsed data, which means (a)
they were strings originally, and (b) if you converted them to symbols
you’d risk a symbol exhaustion attack.

So I thought, wouldn’t it be nice to have a half-way house: being able
to convert a symbol to a string in such a way that you always get the
same (frozen) string object?

This turned out to be extremely easy:

class Symbol
  def fring
    @fring ||= to_s.freeze
  end
end

irb(main):006:0> :foo.fring
=> "foo"
irb(main):007:0> :foo.fring.object_id
=> -605512686
irb(main):008:0> :foo.fring.object_id
=> -605512686
irb(main):009:0> :bar.fring
=> "bar"
irb(main):010:0> :bar.fring.object_id
=> -605543036
irb(main):011:0> :bar.fring.object_id
=> -605543036
irb(main):012:0> :bar.fring << "x"
TypeError: can't modify frozen string
from (irb):12:in `<<'
from (irb):12
from :0
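
On current Ruby versions symbols are themselves frozen, so assigning @fring inside Symbol fails with a frozen-object error. A minimal sketch of the same idea that keeps the cache in a hash instead; the constant name FRINGS is made up for illustration and is not part of the original post:

```ruby
class Symbol
  # Hypothetical cache: one frozen string per symbol, built on first use.
  FRINGS = Hash.new { |cache, sym| cache[sym] = sym.to_s.freeze }

  def fring
    FRINGS[self]
  end
end

a = :foo.fring
b = :foo.fring
puts a.equal?(b)   # true: the same String object is returned both times
puts a.frozen?     # true: the cached string cannot be mutated
```

The cache grows by one entry per distinct symbol, so it has the same memory behaviour as the symbol table it mirrors.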

Is this a well-known approach, and/or does it exist in any extension
library?

I suppose that an instance variable lookup isn’t necessarily faster than
always creating a temporary string with to_s and then garbage collecting
it at some point later in time, but it feels like it ought to be :)

However, since I’ve seen discussion about string modifiers like "..."u,
perhaps there’s scope for adding in-language support, e.g.

"..."f     - frozen string, same object ID each time it's executed

In that case, it might be more convenient the other way round:

"..."             - frozen string literal, same object
"..."m            - mutable (unfrozen) string literal, new objects
String.new("...") - another way of making a mutable string
"...".dup         - and another

That would break a lot of existing code, but it could be pragma-enabled.

Sorry if this ground has been covered before - it’s hard to keep up with
ruby-talk :)

Regards,

Brian.

Hi,

At Thu, 6 Sep 2007 16:50:28 +0900,
Brian C. wrote in [ruby-talk:267857]:

So I thought, wouldn’t it be nice to have a half-way house: being able to
convert a symbol to a string in such a way that you always get the same
(frozen) string object?

Rather, Symbol#to_s should return frozen String?

I suppose that an instance variable lookup isn’t necessarily faster than
always creating a temporary string with to_s and then garbage collecting it
at some point later in time, but it feels like it ought to be :)

However, since I’ve seen discussion about string modifiers like "..."u,
perhaps there’s scope for adding in-language support, e.g.

"..."f     - frozen string, same object ID each time it's executed

What about "..."o like Regexp?

Rather, Symbol#to_s should return frozen String?

Yes, as long as it returns the same frozen string each time.

Hmm, this sounds like a good solution - it’s technically not
backwards-compatible, but I doubt that much code does a Symbol#to_s and
later mutates it.
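
For readers on a newer Ruby: as far as I know, Symbol#to_s still returns a fresh mutable string, but Ruby 3.0 added Symbol#name, which behaves much like the frozen variant discussed here. A small sketch, guarded so that it also runs on older versions where the method does not exist:

```ruby
sym = :foo

if sym.respond_to?(:name)   # Symbol#name is assumed to exist only on Ruby 3.0+
  a = sym.name
  b = sym.name
  puts a.frozen?            # the returned string is frozen
  puts a.equal?(b)          # and the same object comes back on each call
else
  puts "Symbol#name is not available on this Ruby"
end

# Symbol#to_s, by contrast, hands out a new string object each time.
puts sym.to_s.equal?(sym.to_s)
```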

What about "..."o like Regexp?

Sure, I don’t mind about the actual syntax.

Of course, you don’t even need to add ‘o’ to a Regexp in the case where
it doesn’t contain any #{...} interpolation:

irb(main):001:0> RUBY_VERSION
=> "1.8.4"
irb(main):002:0> 3.times { puts /foo/.object_id }
-605554606
-605554606
-605554606
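
To make the contrast with interpolated literals concrete, here is a small sketch that counts distinct Regexp objects for the three cases. The helper name distinct_objects is made up for this example:

```ruby
# Count how many distinct objects a list contains (compared by identity).
def distinct_objects(list)
  list.map { |obj| obj.object_id }.uniq.size
end

word = "foo"

plain        = Array.new(3) { /foo/ }        # literal without interpolation
interpolated = Array.new(3) { /#{word}/ }    # rebuilt on every evaluation
once         = Array.new(3) { /#{word}/o }   # interpolated on the first evaluation only

puts distinct_objects(plain)          # 1
puts distinct_objects(interpolated)   # 3
puts distinct_objects(once)           # 1
```

The objects are kept in arrays while they are counted, so an object id cannot be reused by a later allocation.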

Regards,

Brian.

On Sep 6, 5:10 am, Brian C. wrote:

Rather, Symbol#to_s should return frozen String?

Yes, as long as it returns the same frozen string each time.

Hmm, this sounds like a good solution - it’s technically not
backwards-compatible, but I doubt that much code does a Symbol#to_s and
later mutates it.

I’ve tried that. There are some places where it blows up Ruby. So
those would have to be rooted out first.

T.

2007/9/6, Trans:

I’ve tried that. There are some places where it blows up Ruby. So
those would have to be rooted out first.

I always prefer less intrusive solutions. Why not do this:

SYMS = Hash.new {|h,sy| h[sy]=sy.to_s}

Then, wherever you need this, just do "SYMS[a_sym]" instead of
"a_sym.to_s". As an added advantage, you can throw away or clear SYMS
when you know you do not need it any more, thus freeing up memory.
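
A short usage sketch of the SYMS cache. The .freeze is an addition that is not in the one-liner above; without it every caller gets the same mutable string and could corrupt it for everyone else:

```ruby
# Same cache as above, with the cached strings frozen (the freeze is an
# assumption added for this sketch).
SYMS = Hash.new { |h, sy| h[sy] = sy.to_s.freeze }

a = SYMS[:id]
b = SYMS[:id]
puts a.equal?(b)   # true: one string per symbol
puts SYMS.size     # 1

SYMS.clear         # drop the cached strings once they are no longer needed
puts SYMS.size     # 0
```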

Kind regards

robert

Brian C. wrote:

I just had a thought.

One of the problems with using strings as hash keys is that every time you
refer to them, you create a throw-away garbage string:

params["id"]
        ^
        +-- temporary string, needs to be garbage collected

Setting aside the question of freezing, why can’t ruby share string data
for all strings generated from the same symbol? And in that case you
could do the following to avoid garbage:

  params[:id.to_s]

(Or ruby could even look up the literal “id” in the symbol table and do
this for you.)

This code shows some of the cases in which ruby does and does not share
string contents:

def show_vmsize
  GC.start
  puts `ps -o vsz #$$`[/\d+/]
end

s = "a"*1000
sym = s.to_sym

show_vmsize # 8712

# ruby apparently does not share storage for strings derived
# from the same symbol:

strs1 = (0...10_000).map do
  sym.to_s
end

show_vmsize # 18488

# ruby does share storage for string ops:

strs2 = (0...10_000).map do
  s[0..-1]
end

show_vmsize # 18616

strs3 = (0...10_000).map do
  s.dup
end

show_vmsize # 18616

Joel VanderWerf wrote:

Setting aside the question of freezing, why can’t ruby share string data
for all strings generated from the same symbol? And in that case you
could do the following to avoid garbage:

 params[:id.to_s]

Sorry… reduce garbage, not avoid it altogether, since there is still
the T_STRING, even though the data is reused. It would help more for
long strings than for short strings, because the T_DATA is smaller in
proportion.

The idea of a literal for a unique frozen string would reduce garbage
further, sharing the T_STRING as well as the data.

On Thu, 6 Sep 2007, Brian C. wrote:

I just had a thought.

One of the problems with using strings as hash keys is that every time you
refer to them, you create a throw-away garbage string:

params["id"]
        ^
        +-- temporary string, needs to be garbage collected

Absolutely.

Whether you need to care about this, though, depends on how often your
code builds these throwaway strings, and on just how much you really
need to neurotically performance-tweak your code.

What I do to deal with this in code where I consider it important is to
use constants that contain frozen strings.

Id = 'id'.freeze

params[Id]

Constant lookup isn’t the fastest thing in Ruby, but it’s faster than
the combined load of creating the throwaway string object, and then
garbage collecting it.
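
A slightly fuller sketch of the constant approach. The module name Keys and the sample params hash are made up for illustration; the point is only that the key string is allocated once, at definition time:

```ruby
module Keys
  ID = 'id'.freeze   # allocated once, when the constant is defined
end

params = { 'id' => '42' }   # stands in for externally parsed request data

# Every lookup reuses the one frozen string held by the constant.
3.times { params[Keys::ID] }

puts params[Keys::ID]
puts Keys::ID.frozen?
puts Keys::ID.equal?(Keys::ID)
```

Grouping the keys in a module keeps them out of the top-level namespace, at the cost of a slightly longer constant lookup.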

Kirk H.

Joel VanderWerf wrote:

Sorry… reduce garbage, not avoid it altogether, since there is still
the T_STRING, even though the data is reused. It would help more for
long strings than for short strings, because the T_DATA is smaller in
proportion.

Sorry again… I don’t know where T_DATA came from. Should be T_STRING,
the constant-size overhead for a string object. Will stop posting until
caffeine hits.

Brian C. wrote:

irb(main):003:0> b << "bar"
=> "foobar"
irb(main):004:0> a
=> "foo"

This was what I was thinking of:

irb(main):001:0> a = :foo.to_s
=> "foo"
irb(main):002:0> b = a.dup
=> "foo"
irb(main):003:0> b << "bar"
=> "foobar"
irb(main):004:0> a
=> "foo"

Internally, a and b use the same storage, but copy-on-write prevents
aliasing.

Setting aside the question of freezing, why can’t ruby share string data
for all strings generated from the same symbol?

Because it could generate unexpected aliasing. The normal, expected
behaviour is no aliasing:

irb(main):001:0> a = :foo.to_s
=> "foo"
irb(main):002:0> b = :foo.to_s
=> "foo"
irb(main):003:0> b << "bar"
=> "foobar"
irb(main):004:0> a
=> "foo"

That’s why the string has to be frozen.
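
A minimal sketch of the aliasing that would appear if the same unfrozen string object were handed out every time, and of how freezing rules it out. The variable names are illustrative only:

```ruby
# Stand-in for a single mutable string returned for :foo on every call.
shared = "foo".dup
a = shared
b = shared
b << "bar"
puts a              # "foobar": a changed too, because a and b are one object

# With a frozen string the mutation is rejected instead.
frozen = "foo".dup.freeze
begin
  frozen << "bar"
rescue StandardError => e   # FrozenError on current Rubies, TypeError on 1.8
  puts e.class
end
puts frozen         # still "foo"
```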

Regards,

Brian.

Joel VanderWerf wrote:

ruby does share storage for string ops:

strs2 = (0...10_000).map do
  s[0..-1]
end

Hmm, we could use that property of strings…

class Symbol
  alias _to_s to_s
  def to_s
    (@str || @str = _to_s)[0..-1]
  end
end

Daniel

On Sep 6, 12:50 am, Brian C. wrote:

I just had a thought.

"...".dup - and another
Rubinius has a compiler extension that detects code in the form of

"name".static

Inside the quotes can be any string, and the static method call is
removed, but every time the code is run, the same String object is
returned. This is highly useful when using strings as hash keys, and
avoids having to put them in constants that must be looked up later.
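
MRI later gained something in the same spirit, though not the Rubinius .static extension itself: as far as I know, on Ruby 2.5 and later the unary minus on a string (String#-@) returns a frozen, deduplicated string. A small sketch, with lookup_key as a made-up method name:

```ruby
def lookup_key
  -"name"   # String#-@: frozen, deduplicated string (assumes Ruby 2.5+)
end

a = lookup_key
b = lookup_key
puts a.frozen?     # the returned string is frozen
puts a.equal?(b)   # the same String object is returned on each call

params = { "name" => "Evan" }   # sample data for illustration
puts params[lookup_key]
```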

That would break a lot of existing code, but it could be pragma-enabled.

Sorry if this ground has been covered before - it’s hard to keep up with
ruby-talk :)

Regards,

Brian.

  • Evan P.