Are Hash speeds documented?

Are the lookup, insertion, deletion, and sort costs of Hash objects
documented anywhere? I would be interested in average-case
and worst-case times… are they linear, logarithmic, exponential, etc.
in time?

Also, is there a way to ‘tune’ hashes so that they use underlying
algorithms which are faster at, say, look-ups over inserts or
vice-versa? That would be killer.

On 25.02.2011 17:57, Nick B. wrote:

Are the lookup, insertion, deletion, and sort costs of Hash objects
documented anywhere? I would be interested in average-case
and worst-case times… are they linear, logarithmic, exponential, etc.
in time?

Are you familiar with hashing in general? If not, please have a look
here:
http://en.wikipedia.org/wiki/Hash_table#Performance_analysis
http://en.wikipedia.org/wiki/Hash_function

If you want true numbers you need to benchmark because all theory may
fail you for particular data or access patterns.
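For instance, a quick measurement with Ruby's standard Benchmark library (a minimal sketch; adjust n and the key types to match your real data):

```ruby
require "benchmark"

n = 100_000
h = {}

Benchmark.bm(8) do |x|
  x.report("insert") { n.times { |i| h[i] = i } }    # build the hash
  x.report("lookup") { n.times { |i| h[i] } }        # read every key back
  x.report("delete") { n.times { |i| h.delete(i) } } # remove every key
end
```

Run it with keys shaped like yours (Strings, Symbols, custom objects) since the cost of #hash and #eql? on the key class dominates in practice.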

Also, is there a way to ‘tune’ hashes so that they use underlying
algorithms which are faster at, say, look-ups over inserts or
vice-versa? That would be killer.

You can’t modify the internal workings of the built-in Hash class unless
you start patching the C code. But of course you can create your own
implementation for custom cases.
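As a sketch of what such a custom implementation might look like, here is a minimal separate-chaining table in pure Ruby (illustrative only: fixed bucket count, no resizing, and nowhere near the speed of the C-implemented Hash):

```ruby
# Minimal separate-chaining hash table -- illustrative sketch, no resizing.
class ChainedHash
  def initialize(buckets = 64)
    @buckets = Array.new(buckets) { [] }
  end

  def []=(key, value)
    bucket = bucket_for(key)
    pair = bucket.find { |k, _| k.eql?(key) }
    pair ? pair[1] = value : bucket << [key, value]
    value
  end

  def [](key)
    pair = bucket_for(key).find { |k, _| k.eql?(key) }
    pair && pair[1]
  end

  def delete(key)
    bucket = bucket_for(key)
    i = bucket.index { |k, _| k.eql?(key) }
    i && bucket.delete_at(i)[1]
  end

  private

  # Ruby's % returns a non-negative result for a positive modulus,
  # so negative #hash values are safe here.
  def bucket_for(key)
    @buckets[key.hash % @buckets.size]
  end
end
```

From a base like this you could swap the Array buckets for sorted structures, pick a different bucket count, or tune whatever your workload favors.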

Cheers

robert

Robert K. wrote in post #983928:

Are you familiar with hashing in general?

Sure (abstractly), but there are different ways it could be implemented.
A hash map to binary trees will have faster lookup times than a hash map
to arrays (“associative array”)… at least it would when you’re using
huge amounts of data and dealing with substantial hash collisions (like
me).

You can’t modify the internal workings of the built-in Hash class unless
you start patching the C code. But of course you can create your own
implementation for custom cases.

I believe Java at least lets you tune its HashMap by passing constants
(initial capacity and load factor) at creation time. I hadn’t heard of
anything like this in Ruby, but it sure would be nice.

So it sounds like there is neither data on costs nor tuning capability
available in Ruby. Maybe I’ll write an uber-hash while I’m waiting on
this script to finish running and share it with you all :)

On Fri, Feb 25, 2011 at 5:57 PM, Nick B. [email protected] wrote:

Are the lookup, insertion, deletion, and sort costs of Hash objects
documented anywhere? I would be interested in average-case
and worst-case times… are they linear, logarithmic, exponential, etc.
in time?

Lookup, insertion, deletion -> O(1)
Sort -> O(N * logN)
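On the sort figure: Hash defines no sort of its own, but it includes Enumerable, whose #sort and #sort_by return a new Array of [key, value] pairs in O(n log n). A quick illustration:

```ruby
h = { "b" => 2, "c" => 3, "a" => 1 }

# Enumerable#sort compares [key, value] pairs, so it sorts by key first.
h.sort                        # => [["a", 1], ["b", 2], ["c", 3]]

# Sort by value instead, then take the largest.
h.sort_by { |_, v| -v }.first # => ["c", 3]
```

Note the hash itself is untouched; sorting always produces a separate Array.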


Regards

Radosław Bułat
http://radarek.jogger.pl - my blog

Nick B. wrote:

Are the lookup, insertion, deletion, and sort costs of Hash objects
documented anywhere? I would be interested in average-case
and worst-case times… are they linear, logarithmic, exponential, etc.
in time?

No, as far as I know, performance characteristics are not part of the
Ruby Language Specification. Neither the Draft ISO Ruby Specification
nor RubySpec describe any kind of performance characteristics.

The popular Ruby implementations currently have the following
asymptotic time complexities and asymptotic step complexities:

lookup:    Ο(n) worst-case, Ο(1) amortized worst-case
insertion: Ο(n) worst-case, Ο(1) amortized worst-case
deletion:  Ο(n) worst-case, Ο(1) amortized worst-case
sort:      Hash has no sort of its own; Enumerable#sort returns a new sorted Array of pairs

If you want to know the average case, you need to supply a probability
distribution. Average case analysis is always relative to a
probability distribution.

But note that these are not part of the language specification. A
Ruby implementation can have much slower implementations and still be
a fully compliant Ruby implementation, similar to what happened with
JScript, where array lookup was Ο(n) worst-case, Ο(n) amortized
worst-case.

jwm
