Robert K.:
2009/11/16 Shot (Piotr S.) [email protected]:
The Set class in MRI is implemented in pure Ruby (lib/set.rb). I vaguely
remember someone posting here a C (re)implementation of the class, but
I can’t remember the details and it’s hard to google it – all I could
find is the [ruby-talk:24465] thread.
Why do you need that? Do you encounter any performance issues
in the adapter code to Hash (which is implemented in C)?
My initial performance tests (with the very nice perftools.rb gem from
Aman G.) seem to suggest that I spend most of the time in (C-based)
Hash#each_key, mostly due to calls from Set#each – but it looks like
the Set#each calls themselves add a significant overhead. Thus, I was
thinking that maybe a Set class which does not do additional Ruby calls,
but is itself implemented in C, might be faster.
Also, I just surprisingly discovered that
if I want to have Set#pairs such that
Set[1,2,3,4].pairs.to_a # => [[1,2], [1,3], [1,4], [2,3], [2,4], [3,4]]
then doing
module Enumerable
def pairs
return combination 2 if respond_to? :combination
Enumerator.new do |yielder|
each_with_index do |a, i|
each_with_index do |b, j|
yielder.yield a, b if i < j
end
end
end
end
end
is actually much slower than the simple
module Enumerable
def pairs
combination 2
end
end
class Set
def pairs
to_a.pairs
end
end
which both surprises me a bit (I assumed the short lived Array creation
was expensive) and strenghtens my suspicions that the Ruby-based
Set class is rather slow and/or Array#combination’s Enumerator
initialisation is much faster (but, again, maybe because
Array#combination is written in C).
This, in turn, made me think that given that most of the core of my code
consists of operations on frozen Sets of Integers (usually Bignums),
then maybe a custom class for them (eventually most probably ported to
C or D) would make a lot of sense. As my C-fu is still somewhere between
rusty-beyond-all-hope and nonexistent, I’m looking for a Set in C so
that I can learn how it’s done – although I should probably just look
closely at hash.c, maybe with some ruby2c translation of set.rb to
speed-up my understanding of Ruby’s C interface.
Of course, the custom (immutable) class might also have the benefit of
caching the #pairs’ result, or at least its #to_a Array counterpart.
(For some reason I was also assuming that a custom IntSet class would
side-step the need for backporting r22308¹ to all my Ruby installs;
I then recalled that it fixes Bignum#hash, not Set#hash – but now
I again think that if I build my custom IntSet class and am very
vigilant about not using Bignums in regular Sets or as Hash keys,
then I might really end up not needing the backport any more. I guess
redefining Bignum#hash to raise an exception would make a good guard
against this bug and show me how many places actually trigger it at the
moment.)
¹ http://redmine.ruby-lang.org/repositories/diff/ruby-19?rev=22308
— Shot