Pleased to announce the initial release of a “google_hash” gem.
Its goal: to boldly be faster than any hash has before (cue Star Trek
TNG theme).
Or, basically, a better hash: one that is either faster or more space
efficient than Ruby's default. To attempt this, it wraps Google's
sparse and dense hashes [1].
Speed results (populating/iterating over 500000 integers),
1.9.1p376 (mingw):

                       populate    each
Hash (Ruby default)    0.359375    1.1875
GoogleHashDense        0.1875      0.078125
GoogleHashSparse       0.53125     0.078125
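
For anyone who wants to try a comparison like this on their own machine, here is a rough sketch of one way to time it. It is not the script that produced the numbers above. The time_hash helper is a hypothetical name, and the require name and class names are taken from the usage in this post, so the google_hash part is skipped if they are not available.

```ruby
require 'benchmark'

# Hypothetical helper: populates a hash-like object with n integer keys,
# then iterates over it. Returns [populate_seconds, each_seconds, seen].
def time_hash(klass, n = 500_000)
  h = klass.new
  populate = Benchmark.realtime { n.times { |i| h[i] = i } }
  seen = 0
  iterate = Benchmark.realtime { h.each { |k, v| seen += 1 } }
  [populate, iterate, seen]
end

candidates = [Hash]
begin
  # Assumption: the gem loads under this name and defines these classes.
  require 'google_hash'
  candidates += [GoogleHashDense, GoogleHashSparse]
rescue LoadError, NameError
  # gem not installed (or classes named differently): time Hash only
end

candidates.each do |klass|
  populate, iterate, _seen = time_hash(klass)
  puts klass
  puts "#{populate} (populate)"
  puts "#{iterate} (each)"
end
```

Numbers will vary a lot by Ruby version and platform, so treat any single run as a rough indication only.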
Usage:
a = GoogleHashSparse.new
b = GoogleHashDense.new # or just GoogleHash.new
a[3] = 4
b[4] = 'abc'
a['abc'.hash] = 'some complex object'
It only accepts ints as keys currently, only because I've been too
lazy to add more types yet.
a.each{|k, v| }
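
One caveat with the 'abc'.hash trick: two different strings can share the same hash value, in which case the second would silently overwrite the first. Below is a minimal sketch of one way around that, assuming the backing store returns nil for missing keys. HashedKeyStore is a hypothetical wrapper, not part of the gem, and it defaults to a plain Ruby Hash so that it runs without google_hash installed.

```ruby
# Hypothetical wrapper: lets arbitrary objects act as keys on top of an
# int-keyed store by bucketing [key, value] pairs under key.hash.
class HashedKeyStore
  # store: any object supporting [] and []= with Integer keys.
  # Assumption: a GoogleHashSparse or GoogleHashDense could be passed here.
  def initialize(store = {})
    @store = store
  end

  def []=(key, value)
    bucket = @store[key.hash]
    if bucket.nil?
      bucket = []
      @store[key.hash] = bucket
    end
    pair = bucket.find { |k, _| k.eql?(key) }
    if pair
      pair[1] = value          # key already present: overwrite its value
    else
      bucket << [key, value]   # new key (possibly a hash collision)
    end
  end

  def [](key)
    bucket = @store[key.hash]
    pair = bucket && bucket.find { |k, _| k.eql?(key) }
    pair && pair[1]
  end
end

names = HashedKeyStore.new
names['abc'] = 'some complex object'
puts names['abc']
```

The trade-off is that the original keys are kept as Ruby objects inside the buckets, so this gives up part of the native-int key saving in exchange for correctness on collisions.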
Installation:
gem install google_hash (on Windows, you'll need the DevKit installed)
Both of these classes are currently more space efficient than a Ruby
Hash because they store keys as "native" ints. The keys no longer
affect GC time, and each one uses only 4 bytes instead of 20 (or 8
instead of 40 on 64 bit). This should relieve some stress on the GC.
In terms of total memory usage, GoogleHashDense uses more (more
buckets) and is speedier, while GoogleHashSparse is much more memory
efficient (around 2 bits of overhead per entry, or so I'm told).
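
To put the per-key figures in perspective, here is the back-of-the-envelope arithmetic for the 500000-entry case. It uses only the 4-byte and 20-byte (32 bit) numbers quoted above and counts key storage alone, not values or bucket overhead. key_bytes is a hypothetical helper name.

```ruby
# Hypothetical helper: total bytes spent on keys alone.
def key_bytes(entries, bytes_per_key)
  entries * bytes_per_key
end

entries = 500_000
native  = key_bytes(entries, 4)    # native int keys, 32 bit
default = key_bytes(entries, 20)   # per-key figure quoted for Ruby's Hash

puts "native int keys: #{native} bytes"            # 2000000
puts "default keys:    #{default} bytes"           # 10000000
puts "difference:      #{default - native} bytes"  # 8000000
```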
This is meant to be one more tool in the Rubyist's toolbelt when trying
to optimize for speed. I plan to expand it to more types, but at least
this release has an #each method.
If you have a desired use case let me know and I might well be able to
code it up for you.
Enjoy.
-r
[1] Google Code Archive - Long-term storage for Google Code Project Hosting.