Using a range as a key

dreamwave · April 3, 2008, 11:05pm

I’m trying to achieve the effect of having a range of integers as a hash
key, possible?

In effect I’ve tried:

range = 12345…12360

hash = { range => “success!” }

hash[12349]

=> nil

I’ve also tried:

array = []
range.each{|x| array << x}

hash = { array => “success!” }

hash[12349]
=> nil

Anyone have any advice to point me in a direction?

Thanks!

Mac

dreamwave · April 3, 2008, 11:25pm

On 03.04.2008 23:06, Michael L. wrote:

I’m trying to achieve the effect of having a range of integers as a hash
key, possible?

In effect I’ve tried:

range = 12345…12360

hash = { range => “success!” }

hash[12349]

This cannot possibly work as 12349 != (12345…12360)

=> nil
Same story.

Anyone have any advice to point me in a direction?

If you always want to access via individual numbers you can do

irb(main):025:0> range = 12345…12360
=> 12345…12360
irb(main):026:0> hash = {}
=> {}
irb(main):027:0> range.each {|i| hash[i] = “success!”}
=> 12345…12360
irb(main):028:0> hash[12349]
=> “success!”

Although I believe there are more space effective solutions (maybe
storing range lower boundaries in an array and doing binary search). It
all depends on the problem you are trying to solve.

Kind regards

robert

dreamwave · April 3, 2008, 11:27pm

is this what you’re trying to do

range = 12345…12360
hash = {range=>“success”}
hash[range]
=> “success”

On Thu, Apr 3, 2008 at 5:06 PM, Michael L.
[email protected]

dreamwave · April 3, 2008, 11:30pm

Michael L. wrote:

I’m trying to achieve the effect of having a range of integers as a hash
key, possible?

In effect I’ve tried:

range = 12345…12360

hash = { range => “success!” }

hash[12349]

=> nil

This won’t work. The range is a specific object, and that whole object
is the key
that points to “success!”. SO in order to get success, you would need
to:

hash[12345…12360]

Hash won’t do this on its own. Here is one way to go about this:

class RangedHash
def initialize(hash)
@ranges = hash
end

def [](key)
  @ranges.each do |range, value|
    return value if range.include?(key)
  end
  nil
end

end

ranges = RangedHash.new(
1…10 => ‘low’,
21…30 => ‘medium’,
41…50 => ‘high’
)
ranges[5] #=> “low”
ranges[15] #=> nil
ranges[25] #=> “medium”

dreamwave · April 4, 2008, 5:41am

Alex W. wrote:

class RangedHash
def initialize(hash)
@ranges = hash
end
def [](key)
  @ranges.each do |range, value|
    return value if range.include?(key)
  end
  nil
end
end

ranges = RangedHash.new(
1…10 => ‘low’,
21…30 => ‘medium’,
41…50 => ‘high’
)
ranges[5] #=> “low”
ranges[15] #=> nil
ranges[25] #=> “medium”

Beautiful. Works great,

However! The file that I’ll be importing will probably contain a large
amount of traffic being pushed at RangedHash.new

If I had a predefined data file ie:

12345…12350 => 89734, 23456…23460 => 23532

ect ect. commas separating the ranges like above…How do I import the
literal of that file instead of the string.

file = File.readlines(‘testdata.txt’) # this creates only 1 array
element

file[0].class
#=> String
file[1]
#=> nil

So this obviously wouldn’t work in a manner of the following:

ranges = RangedHash.new(file)

Same goes for file = File.open(‘testdata.txt’, ‘w+’)
Maybe Marshaldump is the way to go?

Ideas?

Thanks!

Mac

dreamwave · April 4, 2008, 3:04am

On Apr 3, 5:30 pm, Alex W. [email protected] wrote:

hash[12349]
Hash won’t do this on its own. Here is one way to go about this:
nil
ranges[25] #=> “medium”
It’s an interesting idea. What is needed is a Hash that can have key
equality redefined. I seem to recall that 1.9 may have something for
this, so as too allow indifferent key access, and it may work here too
(using === instead of ==, or is it eql?) but I haven’t explored it
yet.

T.

dreamwave · April 4, 2008, 6:16am

From: Michael L. [mailto:[email protected]]

…12345…12350 => 89734, 23456…23460 => 23532

ect ect. commas separating the ranges like above…How do I

import the literal of that file instead of the string.

file = File.readlines(‘testdata.txt’) # this creates only 1 array

element

file[0].class

#=> String

file[1]

#=> nil

So this obviously wouldn’t work in a manner of the following:

ranges = RangedHash.new(file)

Same goes for file = File.open(‘testdata.txt’, ‘w+’)

if it’s text you can eval it (note that there are other ways).

eg,

file=“12345…12350 => 89734, 23456…23460 => 23532\n”
#=> “12345…12350 => 89734, 23456…23460 => 23532\n”
file.strip!
#=> “12345…12350 => 89734, 23456…23460 => 23532”
file
#=> “12345…12350 => 89734, 23456…23460 => 23532”
h=eval(“{”+file+“}”)
#=> {23456…23460=>23532, 12345…12350=>89734}
h.class
#=> Hash

kind regards -botp

dreamwave · April 4, 2008, 10:13am

On Fri, Apr 4, 2008 at 3:04 AM, Trans [email protected] wrote:

It’s an interesting idea. What is needed is a Hash that can have key
equality redefined. I seem to recall that 1.9 may have something for
this, so as too allow indifferent key access, and it may work here too
(using === instead of ==, or is it eql?) but I haven’t explored it
yet.

Interesting idea, and it would work, but it wouldn’t work very well as
you’ll lost most of the advantage of hashing

Say we limit ourselves to ranges of integers for now. Then for hashing
to work, any integer and any range that integer is in should have the
same hash value. This means that all integers a and b, a < b, have the
same hash value because a…b includes both, so both must have the same
hash value as a…b. This obviously also means that all such ranges
must have the same hash value. So although it will still work, the
hashing function will be very bad.

Peter

dreamwave · April 4, 2008, 10:44am

On Fri, Apr 4, 2008 at 3:04 AM, Trans [email protected] wrote:

range = 12345…12360
to:
def
41…50 => ‘high’

T.

–
Posted viahttp://www.ruby-forum.com/.

The basic problem I am seeing is that by accessing a hash with ranges
as keys by elements of these ranges we simply lose our constant lookup
time. I have some difficulties that we can come up with hash functions
that yield the same value for
the range and all its elements and different values for most other
ranges.
Another issue would of course be how to define behavior in case of
overlapping ranges - undefined would probably the best one can come up
without needing to traverse the whole hash for each update.

Well if we sacrifice constant lookup time of course search trees might
indeed solve OP’s problem.

Cheers
Robert

–
http://ruby-smalltalk.blogspot.com/

Whereof one cannot speak, thereof one must be silent.
Ludwig Wittgenstein

dreamwave · April 4, 2008, 12:31pm

On Fri, Apr 4, 2008 at 5:41 AM, Michael L.
[email protected] wrote:
[… about a way to map a range to a specific value …]

However! The file that I’ll be importing will probably contain a large
amount of traffic being pushed at RangedHash.new

I smell premature optimization, under the terms “probably” and “large”.

For a range set that is lower than about 10 million numbers (100
million for a high end machine), it should work quote OK to push each
number in each range into a hash. I would try this before going off
an writing code to support something more complex.

Apart from that, this is solvable in O(log N) per lookup using a
binary tree. If you need something faster, you could do it by
constructing your tree in byte based buckets - essentially, you end up
with one lookup per byte in your number, finding what range it is part
of when the number is well enough resolved that it can only be in one
range.

I have never heard of any way to construct hashes that would be useful
for this, and my gut feeling is that it would have to combine with one
of the above algorithms anyway, and you’d have to do a post-read
perfect hash construction - basically, you’d have to define your
ranges up front and then dynamically create a hash function from it.
My hunch is that this would only be worthwhile if you were dealing
with data sets of many terabytes, and had to process them quickly and
regularly. Of course, it is an interesting research problem…

Eivind.

dreamwave · April 4, 2008, 3:34pm

Eivind E. wrote:

Apart from that, this is solvable in O(log N) per lookup using a
binary tree. If you need something faster, you could do it by
constructing your tree in byte based buckets - essentially, you end up
with one lookup per byte in your number, finding what range it is part
of when the number is well enough resolved that it can only be in one
range.

Eivind.

As a clarification, it’s only LogN if you are using a sorted array with
binary lookup, the RangedHash implementation as above will not perform
LogN… It will be N…

I also believe that this is not a case of premature optimization, If
someone says hash to me, I immediately expect it to behave as a LogN and
I don’t want to concern myself with number of entries within the
container… Hashes are simply hashes, and if you want to make them
perform different than from what every coder knows them to be, then at
least have the courtesy to change the class name…

ilan

dreamwave · April 4, 2008, 4:58pm

2008/4/4, Ilan B. [email protected]:

Eivind E. wrote:

Apart from that, this is solvable in O(log N) per lookup using a
binary tree. If you need something faster, you could do it by
constructing your tree in byte based buckets - essentially, you end up
with one lookup per byte in your number, finding what range it is part
of when the number is well enough resolved that it can only be in one
range.

As a clarification, it’s only LogN if you are using a sorted array with
binary lookup, the RangedHash implementation as above will not perform
LogN… It will be N…

I also believe that this is not a case of premature optimization, If
someone says hash to me, I immediately expect it to behave as a LogN and

Did you want to say “behave as O(1)” - because that’s what is usually
considered to be hash lookup performance?

I don’t want to concern myself with number of entries within the
container… Hashes are simply hashes, and if you want to make them
perform different than from what every coder knows them to be, then at
least have the courtesy to change the class name…

Absolutely.

Kind regards

robert

dreamwave · April 4, 2008, 8:25pm

Eivind E. wrote:

On Fri, Apr 4, 2008 at 5:41 AM, Michael L.
[email protected] wrote:
[… about a way to map a range to a specific value …]

However! The file that I’ll be importing will probably contain a large
amount of traffic being pushed at RangedHash.new

I smell premature optimization, under the terms “probably” and “large”.

and you’d have to do a post-readperfect hash construction - basically, you’d >>have to define your ranges up front and then dynamically create a hash function >>from it.

Eivind.

Precisely what I did. Knowing there was no way to actual construct a
hash needing the format I did within say IRB, I created a basic file
utilizing the hash structure, then imported it as a whole literal.

if you want to make them
perform different than from what every coder knows them to be, then at
least have the courtesy to change the class name…

I completely agree and plan to change that class name now that its been
brought to my attention.

A more ‘correct’ approach to this scenario would be of course to use an
SQL database and deal with the ranges in that matter, but theres always
the challenge of how to do it different than everyone else

Mac

dreamwave · April 4, 2008, 5:17pm

On Apr 4, 4:13 am, Calamitas [email protected] wrote:

Say we limit ourselves to ranges of integers for now. Then for hashing
to work, any integer and any range that integer is in should have the
same hash value. This means that all integers a and b, a < b, have the
same hash value because a…b includes both, so both must have the same
hash value as a…b. This obviously also means that all such ranges
must have the same hash value. So although it will still work, the
hashing function will be very bad.

You are right of course. And actually, I’m not sure it could even
work, now that you mention these details. How could a…b and a…c have
the same hash value in order to match the hash value of x such that a
< x < b and a < x < c. – Am I understanding this right?

However, beyond that, I can imagine some sort of flexible and
optimized Lookup class that uses a re-definable key match proc which
could be quite handy.

T.