Hello.
I'm pleased to announce that v0.7.0 of the google_hash gem has been released,
mostly thanks to a patch from rolve.
Changes:
Fixed building on Linux with newer GCCs; fixed building on Windows
with a broken system command (?).
Bumped internal google_hash version to 0.8.2.
README/teaser:
The goal: a better hash for Ruby, either one that is faster or more
space efficient than Ruby’s default.
To attempt to accomplish this, this library wraps Google’s sparse and
dense hashes [1], which seem to perform much better, at least for the
#each method. It also creates some “specialized” hashes, for instance
ones that take an integer for their key, for even better performance.
From the readme:
These also use significantly less memory (if you specify IntToInt, it
stores only 4 bytes per int, instead of Ruby’s usual 20 bytes). This also frees
up Ruby so it doesn’t have to garbage collect as much. Yea!
20 bytes?? What exactly is this referring to?
I thought this was referring to st_table_entry size in st.c, but that’s
6 words (24 bytes in 32-bit, 48 bytes in 64-bit) on MRI 1.9 with ordered
hashes (unpacked).
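For what it's worth, the arithmetic behind those figures is just words times word size. A tiny sketch (the helper name st_entry_bytes is made up for illustration; the 6-word figure is the one quoted above for MRI 1.9, not something measured here):

```ruby
# Hypothetical helper: size of one hash entry given the machine word size.
# The default of 6 words is the st_table_entry figure quoted in the post.
def st_entry_bytes(word_bytes, words = 6)
  word_bytes * words
end

# 'J' packs a pointer-width unsigned integer, so its length is the
# native word size of the Ruby you are running (needs Ruby 2.3+).
native_word = [0].pack('J').bytesize

puts "32-bit: #{st_entry_bytes(4)} bytes per entry"
puts "64-bit: #{st_entry_bytes(8)} bytes per entry"
puts "this machine (#{native_word}-byte words): #{st_entry_bytes(native_word)} bytes per entry"
```

Either way that is well above the 20 bytes in the readme, and it does not count the bins array.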
RObject is 40 bytes on 64-bit MRI. On the plus side with 64-bit,
embedded strings can be up to 23 bytes (vs 11 bytes for 32-bit) so
there’s a better chance of avoiding malloc() overhead with strings.
But we’re talking ints here… so they just take up the space of VALUE.
I don’t know why that’s being compared against RObject.
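One way to check this on a given interpreter is the objspace standard library. This is only a sketch for MRI: memsize_of reports heap bytes owned by an object, and the exact number for a plain object depends on the Ruby version and platform, so nothing is hard-coded for it here.

```ruby
require 'objspace'

# Small integers are immediates: the value lives inside the VALUE itself,
# so there is no separate heap object to account for.
int_bytes = ObjectSpace.memsize_of(42)

# A plain object, by contrast, occupies a heap slot.
obj_bytes = ObjectSpace.memsize_of(Object.new)

puts "Integer 42: #{int_bytes} heap bytes"
puts "Object.new: #{obj_bytes} heap bytes"
```

So an int key or value in a regular Hash costs the slot that holds its VALUE in the entry, not an extra object on top.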
I presumed they were encapsulated within an RObject, so would take up
as much space, though I suppose they might not be.
Anyway, it does decrease the time to do a GC from 0.1s to 0.002s with a
hash of 2M integers, so that’s worth something.
GoogleHashDenseIntToInt
“dense” “took” “0.002”
“ruby hash” “took” “3.381” “0.103”
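If anyone wants to try a comparison like this themselves, here is a rough sketch, not the script that produced the numbers above. Assumptions: the helper name gc_seconds is made up, the entry count is smaller than 2M so it runs quickly, and the google_hash part assumes GoogleHashDenseIntToInt.new and []= as shown in the gem's readme. It is skipped if the gem isn't installed.

```ruby
require 'benchmark'

# Hypothetical helper: time one full GC pass while `container` is still
# referenced, so the collector has to deal with whatever it holds.
def gc_seconds(container)
  GC.start # settle garbage left over from building the container
  elapsed = Benchmark.realtime { GC.start }
  container.size # touch the container so it stays referenced until here
  elapsed
end

n = 200_000 # the post used 2M; smaller here to keep the run short

ruby_hash = {}
n.times { |i| ruby_hash[i] = i }
ruby_gc = gc_seconds(ruby_hash)
puts format('ruby hash: GC took %.4fs', ruby_gc)

begin
  require 'google_hash'
  dense = GoogleHashDenseIntToInt.new
  n.times { |i| dense[i] = i }
  dense_gc = Benchmark.realtime { GC.start }
  puts format('dense:     GC took %.4fs', dense_gc)
rescue LoadError
  puts 'google_hash not installed; skipping the dense comparison'
end
```

Results will vary a lot between Ruby versions, since the GC and Hash internals have changed since 1.9.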
But by all means, if it doesn’t improve your throughput, don’t use it.
-roger-