Hash order bug?

[email protected] wrote:

Oh my god, i didn’t know it, sorry. Is that a missing feature?

Debate rages on this point :slight_smile: It’s mostly a speed issue, I think.
At least my memory is that Matz’s latest statement was to the effect
that he would do it if it could be done efficiently.

I don’t know if it really rages. :wink:

I personally favor the idea of a so-called ordered hash,
but that’s just me. There’s no way to do it in current
Ruby (while still allowing hash literals). Before arguing
with me, read the previous parenthetical piece and understand
it.

I once talked with Matz about it, suggesting a new class (e.g.,
Map or AssociativeArray or whatever). I had assumed a new name
would be needed because a “hash” is inherently unordered, thus
interfering with the class name “Hash.” But Matz surprised me
by saying that if it were ordered, the name Hash could be kept,
as the underlying data structure was still a hash – just with
a little extra data preserved.

I’ve heard that Nobu made a patch for this, and I heard that
there were some questions about performance (not actual
measurements, just concerns). But I never heard anything beyond
that.

The question came up during a lunch here at OSCON (with me, Jim
Weirich, Pat Eyler, and a fellow named Nicolas whose last name I’m
afraid I don’t remember) as to whether making hashes ordered in a
future Ruby would break any existing code. I think the answer is no,
which is kind of interesting considering what a major change it would
be. It’s a case where the feature would be purely additive.

It couldn’t break code as far as I can see.

I confess to being amused at one person who was so vehemently
against this that he said, “If Ruby added that, I would quit
using Ruby.” Truthfully I laughed out loud, because I thought
that was an extreme reaction.

But of course, he had a point in a way. People sometimes say, “If
you don’t like Feature X, then don’t use it.” But you still may
find yourself reading someone else’s code that does.

I’ll try to make a post soon about a use case for an ordered hash.
I’ve been meaning to collect a bunch of use cases, but haven’t
done so. Right now I can only think of one.

Hal

William J. wrote:

then you set them up like this
[ false, false, true, false, false ]

If you want to get a sequence of values in a certain order:
a.values_at( :und, :det, :sta )

Your point is well taken. David Alan Black also argues eloquently
against the concept of an ordered hash.

But to me there is psychological weight in seeing a literal of this
form:

{a=>b, c=>d, e=>f}

There is in fact a syntactic order (which turns out to be meaningless
internally).

If you want to have your cake and eat it too, you could
use an association list:

as = [[:foo,22], [:bar,33], [:baz,44]]

=> [[:foo, 22], [:bar, 33], [:baz, 44]]

[snip]

That works fine, but I dislike the ugly syntax.

Hal

Hal F. wrote:

a[1]

{a=>b, c=>d, e=>f}

There is in fact a syntactic order (which turns out to be meaningless
internally).

In awk, such hash literals don’t exisit; so when using that language,
one is less tempted to imagine that hashes are ordered. Perhaps
we should think of the above as ‘syntactic sugar’ for

a = Hash.new; a[a]=b; a[c]=d; a[e]=f

If you want to have your cake and eat it too, you could
use an association list:

Lisp also uses these.

as = [[:foo,22], [:bar,33], [:baz,44]]

=> [[:foo, 22], [:bar, 33], [:baz, 44]]

[snip]

That works fine, but I dislike the ugly syntax.

Maybe this is better.

a = [ :foo,22, :bar,33, :baz,44 ].to_assoc

a.assoc_set :bar, 99
a.assoc_set :yes, -1
a.assoc_del :baz

The prerequisite:

class Array
def to_assoc
f=nil ; partition{f=!f}.transpose
end
def assoc_set k, v
(pair = assoc(k)) ? self[ index(pair) ] = [k,v] : push( [k,v] )
end
def assoc_del k
(pair = assoc( k )) && slice!( index( pair ) )
end
end

On Sun, 2006-07-30 at 00:02 +0900, Charles Hoffman wrote:

If you want, and if you really must do away with the overhead of
having
to retrieve the key array and sort it first,

Mind you, you’d just be amortizing that overhead into each hash
insertion/removal, and paying high interest besides – to say nothing of
having to write more code :wink:

–ch–

Charles Hoffman wrote:

whole point of a hash is to maximize speed of access by sacrificing
things like, being able to iterate it in any particular order. Of
course, the word “Dictionary” does, for most of us, bring to mind
something that’s in alphabetical order, so maybe it was Microsoft’s
fault for givig it a dumb name. At any rate, it would be a bad idea to
allow what is essentially a mistake made by programmers to force an
abnormal, slower implementation onto a data structure whose definition
is largely agreed upon in computing. In other words, I’m with the “we
don’t need an ‘ordered hash’” crowd. I think that, besides slowing it
down, it goes against the very definition of what a hash is.

True.

If you need to iterate through a hash in order by key, it’s a simple
matter to grab an array of the keys first, sort it, then iterate from
that. I’m probably going to embarrass myself with my Ruby nuby-ness
here, but off the top of my head:

my_hash.keys.sort.each{|key| doSomethingWith(my_hash[key])}

h = { ‘zebra’, 2, ‘horse’, 0, ‘sheep’, 1 }
==>{“horse”=>0, “sheep”=>1, “zebra”=>2}
h.sort.each{|x| p x}
[“horse”, 0]
[“sheep”, 1]
[“zebra”, 2]
==>[[“horse”, 0], [“sheep”, 1], [“zebra”, 2]]

On Sun, Jul 30, 2006 at 12:02:50AM +0900, Charles Hoffman wrote:

Which isn’t to say anything about Javier specifically, I don’t know him,
but in general, a review of one’s Data Structures course may be in order
when this happens. But I think it happens to just about every
programmer at some point.

Oddly enough, I never made that assumption – but I think that’s because
I first learned about hashes/dictionaries/associative arrays by reading
something that explained up-front the fact that they’re not ordered and,
more to the point, why they aren’t ordered, in any particular manner
that makes immediate sense to the programmer.

course, the word “Dictionary” does, for most of us, bring to mind
something that’s in alphabetical order, so maybe it was Microsoft’s
fault for givig it a dumb name.

If so, you might want to have some words with Guido van Rossum about the
use of the term “dictionary” in Python. I’d probably be happier with
“lexicon” than “dictionary”, but I like “hash” most of all.
“Associative array” is probably best, aside from the fact that it’s WAY
too long to be comfortable for general use.

is largely agreed upon in computing. In other words, I’m with the “we
don’t need an ‘ordered hash’” crowd. I think that, besides slowing it
down, it goes against the very definition of what a hash is.

Agreed . . . but I wouldn’t be opposed to a method that produces
alphabetized and/or asciibetized output, by key, from a hash.

Why doesn’t someone write a red-black tree for Ruby? I mean, it would
be trivial to do with a C++ extension and std::map<>…

I think this is just a common mistake among newbie-ish programmers.
Which isn’t to say anything about Javier specifically, I don’t know him,
but in general, a review of one’s Data Structures course may be in order
when this happens. But I think it happens to just about every
programmer at some point. I remember the first time I tried to do some
ASP scripting for some web pages and I decided to use the Dictionary
object. At one point I wanted to loop through the Dictionary and
display a listing of its contents. I got all pissed off when I realized
that the loop didn’t return the items in alphabetical order. Only later
did I realize that the underlying data structure is a hash, and that the
whole point of a hash is to maximize speed of access by sacrificing
things like, being able to iterate it in any particular order. Of
course, the word “Dictionary” does, for most of us, bring to mind
something that’s in alphabetical order, so maybe it was Microsoft’s
fault for givig it a dumb name. At any rate, it would be a bad idea to
allow what is essentially a mistake made by programmers to force an
abnormal, slower implementation onto a data structure whose definition
is largely agreed upon in computing. In other words, I’m with the “we
don’t need an ‘ordered hash’” crowd. I think that, besides slowing it
down, it goes against the very definition of what a hash is.

If you need to iterate through a hash in order by key, it’s a simple
matter to grab an array of the keys first, sort it, then iterate from
that. I’m probably going to embarrass myself with my Ruby nuby-ness
here, but off the top of my head:

my_hash.keys.sort.each{|key| doSomethingWith(my_hash[key])}

If you want, and if you really must do away with the overhead of having
to retrieve the key array and sort it first, you could modify the Hash
class with in your own program (Ruby lets you do stuff like that) and
have it keep a sorted array of keys, by adding each new key to it in a
sorted position, after every time an item is added to the hash, and
splicing the key out of the array whenever an item is removed. Then you
could override the each method, and the inspect method if necessary.

–ch–

Just Another Victim of the Ambient M. wrote:

Why doesn't someone write a red-black tree for Ruby?  I mean, it would 

be trivial to do with a C++ extension and std::map<>…

Not sure, but I think it’s been done.

Hal

On Sun, 2006-07-30 at 04:28 +0900, Chad P. wrote:

course, the word “Dictionary” does, for most of us, bring to mind
something that’s in alphabetical order, so maybe it was Microsoft’s
fault for givig it a dumb name.

If so, you might want to have some words with Guido van Rossum about the
use of the term “dictionary” in Python. I’d probably be happier with
“lexicon” than “dictionary”, but I like “hash” most of all.
“Associative array” is probably best, aside from the fact that it’s WAY
too long to be comfortable for general use.

I like “Hash” because it reminds you opf the fact that its
implementation involves “hashing” they keys with a mathematical function
of some sort so as to make them quick to get to.

I didn’t know Python calls it Dictionary too. I don’t know any Python
actually.

is largely agreed upon in computing. In other words, I’m with the “we
don’t need an ‘ordered hash’” crowd. I think that, besides slowing it
down, it goes against the very definition of what a hash is.

Agreed . . . but I wouldn’t be opposed to a method that produces
alphabetized and/or asciibetized output, by key, from a hash.

I think someone posted in this thread an example showing that Hash has a
sort method. But now I’m not so sure…

/me heads off to irb to try it…

–ch–

Hal F. wrote:

Just Another Victim of the Ambient M. wrote:

Why doesn't someone write a red-black tree for Ruby?  I mean, it 

would be trivial to do with a C++ extension and std::map<>…

Not sure, but I think it’s been done.

Hal

Yep:

http://www.germane-software.com/software/Utilities/RBTree/index.html

I think there are others, too.

Regards,

Dan

On Mon, 2006-07-31 at 01:02 +0900, Charles Hoffman wrote:

I think someone posted in this thread an example showing that Hash has a
sort method. But now I’m not so sure…

/me heads off to irb to try it…

irb(main):001:0> h = { ‘horse’ => 5, ‘pig’ => 7, ‘car’ => 12, ‘five’ =>
3 }
=> {“five”=>3, “horse”=>5, “pig”=>7, “car”=>12}
irb(main):002:0> h
=> {“five”=>3, “horse”=>5, “pig”=>7, “car”=>12}
irb(main):003:0> h.sort
=> [[“car”, 12], [“five”, 3], [“horse”, 5], [“pig”, 7]]

Sure enough… returns an nX2 array whose rows are key, value, and
sorted by key.

Doesn’t work when your keys are symbols, though:

irb(main):006:0> h = { :horse => 5, :pig => 7, :car => 12, :five => 3 }
=> {:car=>12, :five=>3, :horse=>5, :pig=>7}
irb(main):007:0> h.sort
NoMethodError: undefined method <=>' for :car:Symbol from (irb):7:in<=>’
from (irb):7
from :0
irb(main):008:0>

So long as the types of the keys all suuport the spaceship operator,
however, it looks like hashes will sort.

–ch–

On Jul 30, 2006, at 2:24 PM, Chad P. wrote:

hash.sort_by{ |k,v| k.to_s }

On Mon, 2006-07-31 at 03:58 +0900, Logan C. wrote:

hash.sort_by{ |k,v| k.to_s }

Alternately, you could implement a <=> method for symbols that converts
to strings… something like

class Symbol
def <=>(other)
self.to_s <=> other.to_s
end
end

That might come in handy for other things… maybe not. If it even
works, being just off the top of my head and all.

–ch–

On Mon, Jul 31, 2006 at 01:10:04AM +0900, Charles Hoffman wrote:

irb(main):003:0> h.sort
=> [[“car”, 12], [“five”, 3], [“horse”, 5], [“pig”, 7]]

Sure enough… returns an nX2 array whose rows are key, value, and
sorted by key.

Doesn’t work when your keys are symbols, though:

That would explain why I thought .sort didn’t work with hashes.

On Mon, 2006-07-31 at 05:47 +0900, Charles Hoffman wrote:

That might come in handy for other things… maybe not. If it even
works, being just off the top of my head and all.

Hey, it does. Man, I’m really starting to like this Ruby stuff. Now I
just have to come up with a good project so I have an excuse to code in
it for real :wink:

–ch–

Hashes are unordered. If you need an ordered collection, you’ll need
to use an array.

Hashes are unordered. that is how sadly for me,really!

Hi –

On Sat, 5 May 2007, Nicholas Clare wrote:

Posted via http://www.ruby-forum.com/.
Again, there might very well be a better way to do even that, I’m new to
ruby.

sorted_keys = hash.keys.sort

:slight_smile:

David

On 5/5/07, Haoqi H. [email protected] wrote:

Hi, I missed some of this thread, I only just joined the list. Although
it’s
less than optimal, can you not just sort the keys when you need them.
Something like:

require ‘enumerator’
sorted_keys = hash.enum_for(:each_key).to_a.sort

Again, there might very well be a better way to do even that, I’m new to
ruby.

Nick

On 5/5/07, [email protected] [email protected] wrote:

less than optimal, can you not just sort the keys when you need them.

A. Ruby Power and Light, LLC (http://www.rubypal.com)

Hi

Well, that certainly is easier. Coming from other programming languages
(which I won’t name), I expect the answer to be more tricky than it is.
I’m
learning something new every day

Thanks,
Nick