Symbol garbage collection

I’ve always wondered why Ruby doesn’t garbage-collect symbols when
some Lisp implementations do. In Lisp, symbol GC is essential for macro
programming, because Lisp programs tend to create a lot of unique
symbols at runtime, especially in macro code, with such contrivances as
gensym in Scheme and Common Lisp and uniq in Arc. I once studied how
Ruby implements symbols, and while its representation makes symbol
garbage collection difficult, it by no means makes it impossible. In
addition to my work on Ruby I am also writing an interpreter for Arc in
C, and I made it use just about the same representation as Ruby 1.8
did for symbols. I also devised an algorithm to do symbol garbage
collection with that same representation.

I see no reason why the same approach will not work with Ruby as well.

On Feb 9, 2012, at 23:28 , Dido S. wrote:

algorithm to do symbol garbage collection with that same
representation.

I see no reason why the same approach will not work with Ruby as well.


If being out of the ordinary is only natural, what can I do in response?
Ordinary or not, I’ll just do what I feel, the way I feel it!
http://stormwyrm.blogspot.com

This would be better sent to ruby-core@ or, if you speak or write
Japanese, better still to ruby-dev@.

On Fri, Feb 10, 2012 at 8:28 AM, Dido S. wrote:

I’ve always wondered why Ruby doesn’t garbage-collect symbols when
some Lisp implementations do. In Lisp, symbol GC is essential for macro
programming, because Lisp programs tend to create a lot of unique
symbols at runtime, especially in macro code, with such contrivances as
gensym in Scheme and Common Lisp and uniq in Arc.

In Ruby the idea is rather to use only a fixed number of Symbols in
the program and to use Strings for everything that changes often.

I see no reason why the same approach will not work with Ruby as well.

Using Symbols in Ruby the way described above avoids allocation and GC
overhead, giving an overall more efficient program. Ruby and Lisp are
sufficiently different languages that I would not automatically
subscribe to the idea that what works well in one language would also
work well in the other, or is even desirable to have at all.

Kind regards

robert

On Fri, Feb 10, 2012 at 11:07 AM, Robert K. wrote:

On Fri, Feb 10, 2012 at 8:28 AM, Dido S. wrote:

I’ve always wondered why Ruby doesn’t garbage-collect symbols when
some Lisp implementations do. In Lisp, symbol GC is essential for macro
programming, because Lisp programs tend to create a lot of unique
symbols at runtime, especially in macro code, with such contrivances as
gensym in Scheme and Common Lisp and uniq in Arc.

In Ruby the idea is rather to use only a fixed number of Symbols in
the program and to use Strings for everything that changes often.

I have a question about this rule of thumb: did this rule come into
existence precisely because Symbols are not GCed, or was it the other
way around?
Maybe if garbage-collecting Symbols weren’t a problem, the rule about
using only a limited, fixed number of them would never have existed.

Just asking…

Jesus.

2012/2/10 Jesús Gabriel y Galán:

the program and use Strings for everything that changes often.

I have a question about this rule of thumb: did this rule come into
existence precisely because Symbols are not GCed, or was it the other
way around?

I can’t speak for Matz but from what I have gathered it was his very
conscious design decision to have Symbols not GC’ed.

Maybe if garbage-collecting Symbols weren’t a problem, the rule about
using only a limited, fixed number of them would never have existed.

But without this distinction there is really no point in having String
AND Symbol any more. A Symbol then would merely be a String which is
frozen by default. I prefer the current situation because it makes
much more sense in my opinion. With a symbol we denote a fixed
identifier which is used in multiple locations of an application and
with a String we denote arbitrary text content.
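As a minimal sketch of that distinction (the `STATUSES` constant and the `classify` helper are hypothetical names made up for illustration, not anything from Ruby itself), a program can keep its Symbols to a fixed set written in the source and match incoming text against them without ever converting arbitrary Strings into new Symbols:

```ruby
# Fixed identifiers: a small, closed set of names known when the code is written.
STATUSES = [:pending, :active, :closed].freeze

# Arbitrary text (here, hypothetical user input) stays a String.
def classify(input)
  # Compare by name instead of calling input.to_sym, so unknown input
  # never creates a new Symbol.
  STATUSES.find { |status| status.to_s == input }
end

classify("active")  # => :active
classify("bogus")   # => nil
```

The number of Symbols stays bounded by what appears in the source, no matter how much text flows through `classify`.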

Kind regards

robert

Hi,

Symbol GC might cause two problems: (1) decreased GC throughput and (2)
possible incompatibility.

Most programs do not consume many symbols that need to be collected.
With symbol GC we need to scan more values, which introduces more scan
time.

Besides that, some programs convert symbols and integers back and
forth. These programs might not work if symbol GC is introduced.

I don’t say I am totally against symbol GC, but it’s more
controversial than you might think at first glance.
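
The incompatibility concern can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToySymbolTable` and its `sweep` method are hypothetical and do not reflect how MRI actually stores or would collect symbols. It shows what could go wrong when a program keeps a bare integer obtained from a symbol and the table entry is later collected:

```ruby
# Toy model only: NOT how MRI stores symbols.
class ToySymbolTable
  def initialize
    @id_by_name = {}
    @name_by_id = {}
    @next_id = 0
  end

  # Return the integer ID for a name, assigning a new one on first use.
  def intern(name)
    @id_by_name[name] ||= begin
      id = (@next_id += 1)
      @name_by_id[id] = name
      id
    end
  end

  # Map an integer ID back to its name, or nil if the entry is gone.
  def name_for(id)
    @name_by_id[id]
  end

  # Hypothetical "symbol GC": drop every entry whose ID is not in live_ids.
  def sweep(live_ids)
    @name_by_id.select! { |id, _| live_ids.include?(id) }
    @id_by_name.select! { |_, id| live_ids.include?(id) }
  end
end

table = ToySymbolTable.new
kept_id  = table.intern("kept")
saved_id = table.intern("temp")   # the program stores this bare integer somewhere

table.sweep([kept_id])            # the collector sees no live reference to "temp"

table.name_for(kept_id)           # => "kept"
table.name_for(saved_id)          # => nil, the saved integer no longer round-trips
table.intern("temp") == saved_id  # => false, re-interning yields a different ID
```

An integer is invisible to the collector as a reference, so code that round-trips through integers silently depends on symbols living forever.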

          matz.

Hi,
Ruby stores all object, method and variable names as symbols. That’s
why symbol GC might be dangerous.
The only thing I’m not clear about is the rb_gc_mark_symbols function
that is implemented in parse.y and is called on every GC run.
Can someone explain its purpose?

On Fri, Feb 10, 2012 at 1:48 PM, Robert K. wrote:

In Ruby the idea is rather to only use a fixed amount of Symbols in
using them just in a limited and fixed amount would have never
existed.

But without this distinction there is really no point in having String
AND Symbol any more. A Symbol then would merely be a String which is
frozen by default. I prefer the current situation because it makes
much more sense in my opinion. With a symbol we denote a fixed
identifier which is used in multiple locations of an application and
with a String we denote arbitrary text content.

Makes sense, thanks.

Jesus.

Hi,

In message “Re: Symbol garbage collection”
on Fri, 10 Feb 2012 23:37:06 +0900, Sigurd writes:

|Hi,
|Ruby stores all object, method and variable names as symbols. That’s
|why symbol GC might be dangerous.
|The only thing I’m not clear about is the rb_gc_mark_symbols function
|that is implemented in parse.y and is called on every GC run.
|Can someone explain its purpose?

In 1.9, symbol names are stored as Ruby strings, so that they have to
be marked to avoid reclaiming.
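
A toy mark phase may make this easier to picture. The sketch below is an illustration under assumptions, not MRI code: the `mark` helper, the labelled "heap" and the `symbol_table` array are all hypothetical. It shows that name strings reachable only from the symbol table are reclaimed unless the table is treated as a root on every collection, which is the job described for rb_gc_mark_symbols above:

```ruby
require "set"

# Toy heap: every object is a label; edges list what each object references.
def mark(roots, edges)
  marked = Set.new
  stack = roots.dup
  until stack.empty?
    obj = stack.pop
    next unless marked.add?(obj)      # add? returns nil if already marked
    stack.concat(edges.fetch(obj, []))
  end
  marked
end

heap  = ["main", "str:hello", "name:foo", "name:bar"]
edges = { "main" => ["str:hello"] }

# Hypothetical symbol table holding the name strings for :foo and :bar.
symbol_table = ["name:foo", "name:bar"]

without_table = mark(["main"], edges)
with_table    = mark(["main"] + symbol_table, edges)

heap - without_table.to_a   # => ["name:foo", "name:bar"], these would be swept
heap - with_table.to_a      # => [], nothing is swept
```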

          matz.

Symbols currently have the nice property of being guaranteed (at least
I hope this is right) to always be the same object (due to being
immutable), allowing a simple comparison of addresses to see if they’re
equal, and using very little memory. Aren’t symbols also kept separate
from the main heap, which the GC owns and runs?

Other than space, what could be gained from GCing symbols?

On Fri, Feb 17, 2012 at 4:10 AM, Ronnie Collinson wrote:

Symbols currently have the nice property of being guaranteed (at least
I hope this is right) to always be the same object (due to being
immutable),

Well, their Symbol state is immutable but you can nevertheless change
them:

irb(main):001:0> s = :foo
=> :foo
irb(main):002:0> s.frozen?
=> false
irb(main):003:0> s.instance_variables
=> []
irb(main):004:0> s.instance_variable_set '@x', 123
=> 123
irb(main):005:0> s.instance_variable_get '@x'
=> 123
irb(main):006:0> s.instance_variables
=> [:@x]
irb(main):007:0> :foo.instance_variables
=> [:@x]

allowing a simple comparison of addresses to see if they’re
equal, and using very little memory.

Actually I don’t think there is much memory saving compared to String.

Aren’t symbols also kept separate
from the main heap, which the GC owns and runs?

Matz said:

M> In 1.9, symbol names are stored as Ruby strings, so that they have to
M> be marked to avoid reclaiming.

So they are not kept separate in 1.9.*.

Other than space, what could be gained from GCing symbols?

Space is the only thing that can be gained from GCing any object.

Cheers

robert

On Feb 17, 2012, at 11:58 AM, Ronnie Collinson wrote:

But two equal strings can be two separate objects (“a” == “a” could be
false),

It will always be true. In terms of how they are stored, yes, they can
be stored in different places, but in terms of Ruby code those strings
will always be equal.
Also, the String#=== method compares strings by their content, not by
whether they are the same object:

a = "a"
b = "a"
a.object_id => 2181707380
b.object_id => 2181676040
a === b => true
a == b => true

But two equal strings can be two separate objects (“a” == “a” could be
false), whereas two equal symbols will be the same object (including in
your example, where all variables still point to your new/hacked
symbol). It’s not about how the data is stored or about saving memory,
but about how a symbol can be a single instance for the lifetime of the
process.
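
To separate the two notions being discussed, here is a small sketch using only core methods (`==` for content, `equal?` for object identity). `String.new` is used on purpose so that the two strings are guaranteed to be distinct objects regardless of any string-literal optimizations:

```ruby
a = String.new("a")
b = String.new("a")

a == b       # => true, same content
a.equal?(b)  # => false, two distinct objects

x = :a
y = "a".to_sym

x == y       # => true
x.equal?(y)  # => true, both names refer to the one :a object
```

So equal strings always compare equal with `==`; what differs from symbols is only whether they are the same instance.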

If space is the only gain, then GCing symbols would remove some other
nice properties, such as being a single instance. I’d rather pay a
small bit of space for nice properties.

If the symbols were in an area of memory separate from the GC’s heap,
then the GC would never waste time looking at them. Would that make a
measurable or worthwhile improvement to GC speed?