Performance of ruby hashes

unni.tallman · October 18, 2006, 2:51pm

Anyway to enhance the performance of ruby hashes? My program uses
hashes heavily. So if i could enhance hash performance, or find a
fasster alternative to hashes, i could improve the overall performance
of my program. Gimme some suggestions.

unni.tallman · October 18, 2006, 3:52pm

On 18.10.2006 14:47, unni.tallman wrote:

Anyway to enhance the performance of ruby hashes? My program uses
hashes heavily. So if i could enhance hash performance, or find a
fasster alternative to hashes, i could improve the overall performance
of my program. Gimme some suggestions.

Hashes are pretty fast. Please state what actual performance problem
you have. Otherwise nobody will really be able to help you.

One reason for a slow hash can be a slow hash function or a hash
function that creates badly distributed values. That cannot be
attributed to the Hash implementation but to the implementation of your
hash keys.

robert

unni.tallman · October 18, 2006, 4:46pm

On 10/18/06, unni.tallman [email protected] wrote:

Anyway to enhance the performance of ruby hashes? My program uses
hashes heavily. So if i could enhance hash performance, or find a
fasster alternative to hashes, i could improve the overall performance
of my program. Gimme some suggestions.

Indexing them by symbols instead of strings helps a bit.
But hashes are so fast that it’s hard to even make a meaningful
benchmark. For example in the following meaningless benchmark
symbol-hashes are faster than string-hashes:

$ time ./meaningless_hash_benchmark.rb --symbols
real 0m16.117s
user 0m14.009s
sys 0m0.485s
$ time ./meaningless_hash_benchmark.rb
real 0m18.525s
user 0m16.306s
sys 0m0.475s
$ cat ./meaningless_hash_benchmark.rb
#!/usr/bin/ruby

N = 2000000

Lame

if ARGV.empty?
keys = File.readlines(“H”).map{|x| x.chomp}
else
keys = File.readlines(“H”).map{|x| x.chomp.to_sym}
end

h = {}
keys.each{|k| h[k] = rand }

k0 = keys[0]
k1 = keys[1]
k2 = keys[2]

A meaningless loop

N.times{|i|
h[k0] += h[k1]
h[k1] += h[k2]
h[k2] += h[k0]
}
$

15% is pretty impressive for a benchmark dominated by method calls.

That’s related to how hashes work. To get a_hash[a_key]

a_key is converted to a number (hashed)
the number is looked up in a_hash’s internal array
** we can get 0 answer - then a_key is definitely not there
** we can get 1 answer - then verify it by a_real_key == a_key
** we can get 2+ answers - then verify them all until match is found
(this case is rare, and high numbers here are even more rare)
So if we found a_real_key, such that a_real_key == a_key, then
return a_value_at_a_real_key
== can actually cost a lot. If you use Strings, their contents need
to be compared. If you use Symbols, you only check whether they’re the
same object. Each Symbol object is different, so this is enough.

unni.tallman · October 18, 2006, 5:10pm

On Wed, 18 Oct 2006, Tomasz W. wrote:

On 10/18/06, unni.tallman [email protected] wrote:

Anyway to enhance the performance of ruby hashes? My program uses
hashes heavily. So if i could enhance hash performance, or find a
fasster alternative to hashes, i could improve the overall performance
of my program. Gimme some suggestions.

Indexing them by symbols instead of strings helps a bit.
But hashes are so fast that it’s hard to even make a meaningful
benchmark. For example in the following meaningless benchmark
symbol-hashes are faster than string-hashes:

bad idea for big hashes - symbols are never gc’d and swapping is never
fast!

-a

unni.tallman · October 18, 2006, 5:45pm

bad idea for big hashes - symbols are never gc’d and swapping is never fast!

gc’d ?

James

unni.tallman · October 18, 2006, 10:45pm

On 10/18/06, [email protected] [email protected] wrote:

On Wed, 18 Oct 2006, Tomasz W. wrote:

On 10/18/06, unni.tallman [email protected] wrote:
Indexing them by symbols instead of strings helps a bit.
But hashes are so fast that it’s hard to even make a meaningful
benchmark. For example in the following meaningless benchmark
symbol-hashes are faster than string-hashes:

bad idea for big hashes - symbols are never gc’d and swapping is never fast!

It would have to be totally extreme for that to become a problem,
like with millions different keys in your program. Then you’re
absolutely correct, it can seriously hurt.

But in most normal cases, using symbols is a win.

And GC’ing them would require major changes to internal representation,
as they are direct values now, and as such not tracked at all.
Maybe in Ruby 3.

unni.tallman · October 19, 2006, 12:03am

On Thu, 19 Oct 2006, Tomasz W. wrote:

It would have to be totally extreme for that to become a problem,
like with millions different keys in your program. Then you’re
absolutely correct, it can seriously hurt.

more like hundreds of thousands on a 4gb box: i’ve notice a few times
with big
directory indexes, but yes - it must be large.

that being said, a web app which dynamically generates symbol keys and
which
runs for long periods ‘leaks’.

-a

unni.tallman · October 18, 2006, 10:51pm

On Oct 18, 2006, at 3:44 PM, Tomasz W. wrote:

Maybe in Ruby 3.

You’re behind on the news.

Symbols just became a String subclass in Ruby 1.9.

James Edward G. II

unni.tallman · October 19, 2006, 12:47am

On 10/18/06, James Edward G. II [email protected] wrote:

On Oct 18, 2006, at 3:44 PM, Tomasz W. wrote:

Maybe in Ruby 3.

You’re behind on the news.

Symbols just became a String subclass in Ruby 1.9.

Darn, and I’ve been using such an ancient snapshot of 1.9.

Still, garbage collecting Symbols probably won’t be doable before Ruby
3.
All over the source IDs (integers basically) are used as symbols, not
Symbol objects.

You can see it even within Ruby:

foo = :some_weird_symbol.to_i #=> 10897

:some_weird_symbol can be garbage collected now, right ?

Or not quite …

foo.to_sym #=> :some_weird_symbol

These numbers are used as method names and for all other internal
purposes. Symbols only pretend to be nice instances of Symbol class,
when nobody’s looking, they’re just numbers

unni.tallman · October 19, 2006, 12:51am

On 2006-10-18, James Edward G. II [email protected] wrote:

You’re behind on the news.

Symbols just became a String subclass in Ruby 1.9.

That doesn’t preclude them being immediate values; an object’s class
is independent of its internal TYPE. Check out “eigenclass - Changes
in Ruby 1.9” at
URL:http://eigenclass.org/hiki.rb?Changes+in+Ruby+1.9 and see the
discussion under the heading “Hash#_compare_by_identity and
Hash#compare_by_identity?”. This states that :c and :c are still
always the same object, whereas “c” and “c” are still in general
different. I think this shows that Symbols remain immediate. Unless
I’ve misunderstood something, of course.

Regards,

Jeremy H.

unni.tallman · October 19, 2006, 9:22am

unni.tallman wrote:

Anyway to enhance the performance of ruby hashes? My program uses
hashes heavily.

Many comments have been offered, but since no one knows what your code
looks
like, I want to offer the standard advice that assumes you are actually
having a hash speed problem. The advice is:

Make your hashes smaller by creating two or more tiers of hashes. One
way to
do this with strings as keys is to create one hash using the first (one,
two, whatever) characters as the key for level one, and the remaining
characters as the key into the second level. You should be able to
figure
out how to create a general method that implements this advice.

The basic idea is that hash speed degrades as the size of each
single-level
hash increases, and after a certain size has been reached, breaking the
hash into groups (layers if you prefer) improves performance.

So if i could enhance hash performance, or find a
fasster alternative to hashes, i could improve the overall performance
of my program. Gimme some suggestions.

Give us the details of the problem – the nature of the keys, the size
of
the data set, and so forth.

unni.tallman · October 19, 2006, 9:46am

On 19.10.2006 09:18, Paul L. wrote:

The basic idea is that hash speed degrades as the size of each single-level
hash increases, and after a certain size has been reached, breaking the
hash into groups (layers if you prefer) improves performance.

What is the basis for this statement? The pure lookup speed certainly
does not degrade unless there is an increasing number of collisions
(which should be prevented by the load factor controlling the table’s
size).

Give us the details of the problem – the nature of the keys, the size of
the data set, and so forth.

Definitively!

Kind regards

robert

unni.tallman · October 19, 2006, 5:26pm

On Wed, 18 Oct 2006, unni.tallman wrote:

Anyway to enhance the performance of ruby hashes? My program uses
hashes heavily. So if i could enhance hash performance, or find a
fasster alternative to hashes, i could improve the overall performance
of my program. Gimme some suggestions.

Try freezing the keys.

http://www.eng.cse.dmu.ac.uk/~hgs/ruby/performance/

Don’t forget all the usual caveats about profiling to find where
the expense is really. [“Programming Pearls”, for one example citation]

    Hugh