Hello. Pleased to announce v 0.7.0 of the google_hash gem has been released, mostly thanks to a patch from rolve. Changes: fixed building in linux with newer GCC's, fixed building in windows with broken system command (?) bump internal google_hash version to 0.8.2 README/teaser: The goal: a better hash for Ruby, either one that is faster or more space efficient than ruby's default. To attempt to accomplish this, this library wraps the google hash sparse and dense hashes [1], which seem to perform much better at least for the #each method. It also creates some "specialized" hashes, for instance, those that take an integer for their key, for even better performance. ruby 1.9.3p194 (2012-04-20 revision 35410) [i686-linux] inserting 400000 objects Ruby Standard Hash populate integer 0.324 #each 0.660 lookup int 0.083 GoogleHashDenseIntToRuby populate integer 0.114 #each 0.050 lookup int 0.080 see https://github.com/rdp/google_hash for more. -roger-
on 2012-10-10 21:01
on 2012-10-10 22:56
On Oct 10, 2012, at 12:00 , Roger Pack <rogerdpack2@gmail.com> wrote: > The goal: a better hash for Ruby, either one that is faster or more > space efficient than ruby's default. > To attempt to accomplish this, this library wraps the google hash > sparse and dense hashes [1], which seem to perform much better > at least for the #each method. It also creates some "specialized" > hashes, for instance, those that take an integer for their key, > for even better performance. From the readme: > These also use significantly less memory, because (if you specify IntToInt, it stores only 4 bytes per int, instead of Ruby's usual 20 bytes). This also frees up Ruby so it doesn't hvae to garbage collect as much. Yea! 20 bytes?? What exactly is this referring to?
on 2012-10-11 17:43
> From the readme: > >> These also use significantly less memory, because (if you specify IntToInt, it stores only 4 bytes per int, instead of Ruby's usual 20 bytes). This also frees up Ruby so it doesn't hvae to garbage collect as much. Yea! > > 20 bytes?? What exactly is this referring to? Normal size (or at least used to be) of a "ruby object" in 32 bit MRI. I believe it's RObject: http://fossies.org/dox/ruby-1.9.3-p194/structRObject.html The size may be bigger in 64 bit MRI, I'm not sure. -r
on 2012-10-11 23:41
Roger Pack <rogerdpack2@gmail.com> wrote: > > From the readme: > > > >> These also use significantly less memory, because (if you specify IntToInt, it stores only 4 bytes per int, instead of Ruby's usual 20 bytes). This also frees up Ruby so it doesn't hvae to garbage collect as much. Yea! > > > > 20 bytes?? What exactly is this referring to? I thought this was referring to st_table_entry size in st.c, but that's 6 words (24 bytes in 32-bit, 48 bytes in 64-bit) on MRI 1.9 with ordered hashes (unpacked). > Normal size (or at least used to be) of a "ruby object" in 32 bit MRI. > I believe it's RObject: > http://fossies.org/dox/ruby-1.9.3-p194/structRObject.html > The size may be bigger in 64 bit MRI, I'm not sure. RObject is 40 bytes on 64-bit MRI. On the plus side with 64-bit, embedded strings can be up to 23 bytes (vs 11 bytes for 32-bit) so there's a better chance of avoiding malloc() overhead with strings.
on 2012-10-12 00:13
On Oct 11, 2012, at 14:40 , Eric Wong <normalperson@yhbt.net> wrote: > >> Normal size (or at least used to be) of a "ruby object" in 32 bit MRI. >> I believe it's RObject: >> http://fossies.org/dox/ruby-1.9.3-p194/structRObject.html >> The size may be bigger in 64 bit MRI, I'm not sure. > > RObject is 40 bytes on 64-bit MRI. On the plus side with 64-bit, > embedded strings can be up to 23 bytes (vs 11 bytes for 32-bit) so > there's a better chance of avoiding malloc() overhead with strings. But we're talking ints here... so they just take up the space of VALUE. I don't know why that's being compared against RObject.
on 2012-10-12 02:38
>> RObject is 40 bytes on 64-bit MRI. On the plus side with 64-bit, >> embedded strings can be up to 23 bytes (vs 11 bytes for 32-bit) so >> there's a better chance of avoiding malloc() overhead with strings. > > But we're talking ints here... so they just take up the space of VALUE. I don't know why that's being compared against RObject. I presumed they were encapsulated within an RObject, so would take up as much space, though I suppose they might not be. Anyway it does decrease the time to do a GC from 0.1s to 0.002 with a hash of 2M integers, so that's worth something :) GoogleHashDenseIntToInt "dense" "took" "0.002" "ruby hash" "took" "3.381" "0.103" But by all means, if it doesn't improve your throughput, don't use it :) -roger-
Please log in before posting. Registration is free and takes only a minute.
Existing account
(Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
Log in with Google account | Log in with Yahoo account
No account? Register here.