Hashes are typically implemented with a number of buckets pre-allocated,
usually in the form of an array. Hash values are calculated on the keys
to
be inserted, and a bucket is chosen mod bucket_count. So it’s pretty
much
anyone’s guess where a given hash will put a key. The apparent
reordering is
actually not quite correct; what was seen is apparent ordering. The
fact
that the insertion order was retained was only a lucky coincidence. Try
inserting a, b, and c in a different order.
To make matters worse, a hash will frequently have collisions when
choosing
a bucket, so you generally also have a linked list in each bucket to
hold
values that collide. Then if there are n elements with the same hash mod
bucket_count, you’ll have to iterate over them looking for the correct
element (in Ruby’s case, I believe the one that matches eq?). To avoid
too
many collisions impacting hash performance, it may be necessary to
expand
the number of buckets and rehash everything…at a cost of n, where n is
the
number of elements inserted up to that point. After rehashing, all
apparent
ordering will likely have disappeared.
The general associative structure is called a map, not a hash, because
it
maps keys to values. A hash is one way to implement a map that gives
insertion and lookup performance guarantees; specifically, values can be
inserted, removed, and queries in constant time (as opposed to some
polynomial of n elements). The trade-off is that order is not
guaranteed,
since order ends up being determined by the hash values.
You can implement a map that preserves or maintains order in multiple
ways;
you can use a tree–or specifically a “red/black tree”, which is memory
optimal but requires at most log(n) time a given value, and something
like
log(n) to rebalance the tree periodically. The characteristic used to
maintain order (and balance) in the tree is not important. You can also
maintain a hash and an ordered list internally, using the list for
iteration
and the hash for all other operations. Hash lookups remain constant time
but
modifications take some polynomial of n (because you now have to update
the
ordered list).
It’s all about tradeoffs. Array is a minimum implementation of an
ordered
list…in that it maintains the order you provide for it. It is not a
sorted
list, which maintains an order based on some external characteristic
(like a
sort algorithm or <=>). Hash is a minimal implementation of an unordered
map, which provides fast access to named keys but does not preserve
order or
perform any sorting.
Those calling for an ordered hash are really looking for an ordered map,
and
it seems in many cases they want a sorted map (or an ordered or sorted
“associative array”, as others have mentioned…tomato/tomato). Hash
should
not be modified to preserve order, since it would add overhead to many,
many
cases where order is unnecessary additional overhead. I wouldn’t be
opposed
to adding a general-purpose Tree, to allow those who want to implement
an
ordered map a fast, native data structure to do so; but I would oppose
adding uncommon variants of hash to the standard libraries, specifically
the
OHash (and IHash?!?) that have come up a few times.