I suggest you look at the research done on reference counting
algorithms versus tracing (mark-and-sweep) ones. Most, if not all, of it
shows that reference counting is slower
and more prone to bugs than modern techniques.
Just throwing this thought out for public feedback. I noticed that some
other scripting languages use reference counting. OK, just Python. Here
are its reasons (for feedback).
==Begin quote
Why doesn’t Python use a more traditional garbage collection scheme? For
one thing, this is not a C standard feature and hence it’s not portable.
(Yes, we know about the Boehm GC library. It has bits of assembler code
for most common platforms, not for all of them, and although it is
mostly transparent, it isn’t completely transparent; patches are
required to get Python to work with it.)
Traditional GC also becomes a problem when Python is embedded into other
applications. While in a standalone Python it’s fine to replace the
standard malloc() and free() with versions provided by the GC library,
an application embedding Python may want to have its own substitute for
malloc() and free(), and may not want Python’s. Right now, Python works
with anything that implements malloc() and free() properly.
Note that on systems using traditional GC, code that uses external
resources without explicitly releasing them may run out of resources
before the GC kicks in. Consider this example:
class Resource:
    def __init__(self, name):
        self.handle = allocate_resource(name)

    def __del__(self):
        if self.handle:
            self.close()

    def close(self):
        release_resource(self.handle)
        self.handle = None

...

for name in big_list:
    x = Resource(name)
    # do something with x
In current releases of CPython, each new assignment to x inside the loop
will release the previously allocated resource. Using GC, this is not
guaranteed.
==End Quote
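To see that behavior concretely, here is a minimal runnable sketch of my
own (Noisy is a stand-in for the FAQ's Resource; it needs no external
resources):

import sys

class Noisy:
    """Prints when it is created and when it is finalized."""
    def __init__(self, name):
        self.name = name
        print("acquired", name)

    def __del__(self):
        print("released", self.name)

for name in ["a", "b", "c"]:
    # The new Noisy is built first, then rebinding x drops the old
    # object's refcount to zero, so CPython runs its __del__ right away:
    #   acquired a / acquired b / released a / acquired c / released b
    # with "released c" at interpreter exit. Under a tracing GC the
    # "released" lines could come at any later point, if at all.
    x = Noisy(name)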
Oh, except that in current Ruby it is still guaranteed to free (in its
own clunky way).
Roger, you REALLY need to read the literature on GC which has been
accumulating for the past 50 years.
Reference counting is pretty much an obsolete approach to GC. It was
probably the first approach taken for Lisp back in the 1950s. Other
language implementations usually started with reference counting too (e.g.
the first Smalltalk).
Its main advantage is that it's easy to understand. On the other hand,
it incurs a large overhead, since counts need to be
incremented and decremented on every assignment. It can't detect circular
chains of dead objects. In early Smalltalk programs, when reference
counting was used, you needed to explicitly nil out references to
break such chains. There's also the question of the overhead of storing
the reference count, and of how many bits to allocate for it. Most reference
counting implementations punt when the reference count overflows: they
treat a 'full' count as an infinite count and no longer decrement it,
leading to more uncollectable objects.
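Both costs are easy to see from CPython, which reference-counts. A small
sketch of my own, using only the standard sys and gc modules:

import gc
import sys

a = object()
print(sys.getrefcount(a))   # e.g. 2: the name 'a' plus the call's
                            # temporary argument reference
b = a                       # every assignment like this bumps the count...
print(sys.getrefcount(a))   # ...now one higher
del b                       # ...and every unbinding drops it again

# The cycle problem: these two objects hold each other's counts above
# zero even once the program can no longer reach either of them.
class Node:
    pass

x, y = Node(), Node()
x.other, y.other = y, x     # a two-element cycle
del x, y                    # pure reference counting would leak both;
                            # CPython needs its separate cycle detector
print(gc.collect())         # reports how many unreachable objects it found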
Mark-and-sweep, such as is used in the Ruby 1.8 implementation, quickly
replaced reference counting as the simplest GC considered for real
use.
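Here, for contrast, is a toy mark-and-sweep pass over an explicit object
graph (an illustrative sketch of the general technique, not Ruby 1.8's
actual collector). Note that, unlike reference counting, it collects the
unreachable cycle:

class Obj:
    """A heap object that may reference other heap objects."""
    def __init__(self, name):
        self.name = name
        self.refs = []      # outgoing references
        self.marked = False

def mark(roots):
    """Mark phase: flag every object reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)

def sweep(heap):
    """Sweep phase: keep marked objects (clearing the mark for the
    next collection) and drop everything else."""
    live = []
    for obj in heap:
        if obj.marked:
            obj.marked = False
            live.append(obj)
    return live

# A tiny heap: a -> b, plus a c/d cycle that nothing reaches.
a, b, c, d = (Obj(n) for n in "abcd")
a.refs.append(b)
c.refs.append(d)
d.refs.append(c)                     # garbage cycle

heap = [a, b, c, d]
mark([a])                            # 'a' is the only root
heap = sweep(heap)
print([o.name for o in heap])        # ['a', 'b']: the dead cycle is gone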
This is somewhat confusing for me, because it seems that using a
mark-and-sweep, generational-style GC slows down Ruby, as per
On my own I tried integrating Boehm's GC as a drop-in replacement, and it
seemed to cause a serious slowdown [possibly because I didn't integrate
it right].
Also, MacRuby reports that their runtime is slightly slower. I'll make a
big assumption: because it uses a conservative GC, it can't allocate
as cheaply. [However, their garbage collection is faster, and runs in a
different thread, so it doesn't cause a program halt; not all bad there.]
Performance-wise, Python uses reference counting and seems to have
'acceptable' performance. It avoids the circular-reference problem by
running a separate cycle-detecting collection every so often.
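That "every so often" is visible through the standard gc module; a quick
sketch (the exact numbers vary by Python version):

import gc

print(gc.get_threshold())   # e.g. (700, 10, 10): a cycle-detection pass
                            # is triggered once allocations minus
                            # deallocations cross the first threshold
print(gc.get_count())       # per-generation counts since the last pass
gc.collect()                # or force a full pass by hand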
I suppose I’m just naive, but it doesn’t seem clear to me which is the
‘best’ GC style, from the above observations. There appears to be no
clear answer.
Cheers!
-R
Which particular GC approach is best for Ruby is a subject for some study.
Badly-written garbage collection slows things down. This is no
surprise. Badly written string handling or maths handling slows things
down too.
On my own I tried integrating Boehm's GC as a drop-in replacement, and it
seemed to cause a serious slowdown [possibly because I didn't integrate
it right].
Whereas I snuck Boehm's GC into an employer's product (to find memory
leaks, initially, but later just kept it in as a memory manager) and
nobody noticed it until long after it was released. Nobody noticed
because it actually improved performance: persistent memory leaks
vanished and, as a result, swapping was reduced (among other
memory-related bottlenecks).
Of course then the retards removed the GC because “garbage collection is
slow” and the next release after that died a horrible death of a million
paper cuts (read: memory leaks).
I suppose I’m just naive, but it doesn’t seem clear to me which is the
‘best’ GC style, from the above observations. There appears to be no
clear answer.
And from here true wisdom grows. Pick up Richard Jones's book (ISBN
0-471-94148-4) or drop by his web page
(the Garbage Collection Page) and read. There is
no single silver bullet for memory management. You have to pick and
choose your algorithms and implementations based on your memory use
profile. The reason proper garbage collection has the reputation of
being slow is that the people who put it into their programs rarely
understand the domain, and the reason they rarely understand the domain
is that most people dismiss GC as slow, which leads to a tight
spiral of memory-managed death.