Searching through a sorted array

FireAphis · October 3, 2007, 9:15am

Hello,

I have a very big array of objects sorted by one of its numeric data
members. During the flow of my application I need occasionally to get
all the elements in a specific range. I could use find_all

partition = my_array.find_all {|x| (x > start) && (x < end)}

My problem is that, as far as I know, find_all just iterates through
all the elements of the array and that’s very inefficient, especially
in my case, in which I have a sorted array. Is there any standard way
to search efficiently through a sorted array? A binary search for
example?

Thanks
FireAphis

FireAphis · October 3, 2007, 9:30am

FireAphis wrote:

Hello,

I have a very big array of objects sorted by one of its numeric data
members. During the flow of my application I need occasionally to get
all the elements in a specific range.

Then why not

my_array[start…end] ?

Cheers,
Peter
__
http://www.rubyrailways.com
http://scrubyt.org

FireAphis · October 3, 2007, 9:50am

On Oct 3, 9:30 am, Peter S. [email protected] wrote:

Cheers,
Peter
__http://www.rubyrailways.comhttp://scrubyt.org

Sorry if my explanation has been misleading. The array isn’t
necessarily comprised of continuous or consecutive numbers. Moreover
they don’t necessarily start from zero. For example:

my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]

now I want to get all the elements bigger than 1 and smaller than
1000. AFAIK my_array[1…1000] doesn’t help in this case.

FireAphis · October 3, 2007, 10:00am

-------- Original-Nachricht --------

Datum: Wed, 3 Oct 2007 16:50:03 +0900
Von: FireAphis [email protected]
An: [email protected]
Betreff: Re: Searching through a sorted array

my_array[start…end] ?

now I want to get all the elements bigger than 1 and smaller than
1000. AFAIK my_array[1…1000] doesn’t help in this case.

In that case, you can say:

interesting_array=my_array.delete_if{|entry| entry>1000 or entry<1}

Best regards,

Axel

FireAphis · October 3, 2007, 9:58am

-------- Original-Nachricht --------

Datum: Wed, 3 Oct 2007 16:30:03 +0900
Von: Peter S. [email protected]
An: [email protected]
Betreff: Re: Searching through a sorted array

FireAphis wrote:

Hello,

I have a very big array of objects sorted by one of its numeric data
members. During the flow of my application I need occasionally to get
all the elements in a specific range.

Then why not

my_array[start…end] ?

Dear FireAphis,

to find the values ‘start’ and ‘end’, you could use something like
the twenty questions game to find a number between 1 and a million
(roughly 2^20, hence 20 questions), starting by :

1.) Is the index of ‘start’ (‘end’) bigger or smaller than 2^19 (roughly
500000)?

2.) If the answer was “yes”, is the index of ‘start’ (‘end’) bigger or
smaller than 2^19+2^18 (roughly 750000)?
If the answer was “no”, is the index of ‘start’ (‘end’) bigger or
smaller than 2^18 (roughly 250000)?

etc…

The range in which “start” lies reduces its size by half with each
question, and you need 2*log(2) n questions to find the ‘start’ and
‘end’
values.

I am actually not aware whether that’s implemented in the Array#index
method.

Best regards,

Axel

FireAphis · October 3, 2007, 10:17am

On Oct 3, 9:56 am, “Axel E.” [email protected] wrote:

members. During the flow of my application I need occasionally to get
(roughly 2^20, hence 20 questions), starting by :

I am actually not aware whether that’s implemented in the Array#index
method.

Best regards,

Axel

–
Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
Ideal für Modem und ISDN:Handytarif Vergleich 2023 | alle Anbieter vergleichen | GMX

Of course that could be an excellent solution. Do you know if there is
already such a facility in Ruby that implements the algorithm? I’m
quite sure Array#index doesn’t do that since in order to apply a
binary search the array has to be sorted.

FireAphis · October 3, 2007, 11:15am

-------- Original-Nachricht --------

Datum: Wed, 3 Oct 2007 17:15:12 +0900
Von: FireAphis [email protected]
An: [email protected]
Betreff: Re: Searching through a sorted array

I have a very big array of objects sorted by one of its numeric data
to find the values ‘start’ and ‘end’, you could use something like

Dear FireAphis,

I don’t know of an implemented solution, but I’ve implemented my own …
use at your own risk!

Best regards,

Axel

class Array
def find_lower_index(cond)
len=self.length
upper=len
max_nr=(Math.log(len)/Math.log(2)).floor
lower_index=0
pow=1
while eval(cond)==false
lower_index=2pow
pow=pow+1
end
pow=pow-2
lower=2pow
(pow-1).downto(0) do |x|
old_test=lower_index
lower_index=lower+2**x
if eval(cond)==false
else
lower_index=old_test
end
end
return lower_index+1
end

def find_upper_index(cond)
len=self.length
upper=len
max_nr=(Math.log(len)/Math.log(2)).floor
upper_index=0
pow=1
while eval(cond)==true
upper_index=2pow
pow=pow+1
end
pow=pow-2
upper=2pow
(pow-1).downto(0) do |x|
old_test=upper_index
upper_index=upper+2**x
if eval(cond)
else
upper_index=old_test
end
upper=upper_index
end
return upper_index-1
end

end

some example

len=10**8
a=(1…len).to_a
b=5*len/2
cond=‘lower_index>10’
r=a.find_lower_index(cond)
cond=‘upper_index<100’
s=a.find_upper_index(cond)
p r,s
p a[r]
p a[s]

FireAphis · October 3, 2007, 1:45pm

FireAphis wrote:

My problem is that, as far as I know, find_all just iterates through
all the elements of the array and that’s very inefficient, especially
in my case, in which I have a sorted array. Is there any standard way
to search efficiently through a sorted array? A binary search for
example?

You can also turn your array into a hash, which will work just as well
for unsorted arrays:

class Dog
attr_reader :breed, :id

def initialize(breed, id)
@breed = breed
@id = id
end

end

#Create sorted array of Dog’s:
#-----------------------------
dogs = []
(1…2_000).each do |r|
dogs << Dog.new(“Akita”, r*2)
end

=begin
dogs = (1…2000).inject([]) do |arr, r|
arr << Dog.new(“Akita”, r*2)
arr
end
=end
#----------------------------+

#Convert array to hash.
#key: dog.id
#val: Dog object
#---------------------
my_id_dog_hash = {}

class << my_id_dog_hash
attr_accessor :highest_id
end

dogs.each do |dog|
my_id_dog_hash[dog.id] = dog
end

my_id_dog_hash.highest_id = dogs[-1].id
----------------------+

#Find Dog’s in range:
#-------------------
def get_dogs_in_range(id_dog_hash, range)

high_id = id_dog_hash.highest_id
if range.end > high_id
range = range.begin…high_id
end

dogs_in_range = []

range.each do |r|
dog = id_dog_hash[r]

if dog
  dogs_in_range << id_dog_hash[r]
end

end
=begin
dogs_in_range = range.inject([]) do |arr1, r|
if dog = id_dog_hash[r]
arr1 << dog
end

arr1

end
=end

dogs_in_range
end
------------------+

#Test it out:

results = get_dogs_in_range(my_id_dog_hash, 1_990…2000)
results.each {|dog| p dog}
-----------+

–output:–
#<Dog:0x1a70c @id=1990, @breed=“Akita”>
#<Dog:0x1a6e4 @id=1992, @breed=“Akita”>
#<Dog:0x1a6bc @id=1994, @breed=“Akita”>
#<Dog:0x1a694 @id=1996, @breed=“Akita”>
#<Dog:0x1a66c @id=1998, @breed=“Akita”>
#<Dog:0x1a644 @id=2000, @breed=“Akita”>

FireAphis · October 3, 2007, 2:17pm

On Oct 3, 1:45 pm, 7stud – [email protected] wrote:

class Dog
#-----------------------------
=end
end
def get_dogs_in_range(id_dog_hash, range)

results = get_dogs_in_range(my_id_dog_hash, 1_990…2000)

–
Posted viahttp://www.ruby-forum.com/.

Neat. Not space efficient but undoubtfully more quick. But if the IDs
have large gaps ([10000, 20000, 30000]) do you think it still will be
more efficient compared to a simple iteration? Consider the fact that
a simple loop will have three iterations whereas your implementation
will have 30000 iteration.

Thanks,
FireAphis

FireAphis · October 3, 2007, 2:15pm

On Oct 3, 2007, at 3:15 AM, FireAphis wrote:

Of course that could be an excellent solution. Do you know if there is
already such a facility in Ruby that implements the algorithm?

A recent Ruby Q. centered around binary search algorithms, so you
can find several examples reading those solutions:

http://www.rubyquiz.com/quiz139.html

James Edward G. II

FireAphis · October 3, 2007, 2:35pm

On 10/3/07, FireAphis [email protected] wrote:

AFAIK my_array[1…1000] doesn’t help in this case.
Here is my try:

class Array

assumes a sorted array whose elements respond appropriately to <=>

def subrange(lower, higher)
low = find_index_lower(lower)
high = find_index_higher(higher)
if (low >= length) || (high < 0)
[]
else
self[low…high]
end

end

Finds the lowest index whose value is greater than the specified

parameter

If all values in the array are lower, returns the index after the

last one

If all values are higher, returns 0

def find_index_lower(lower)
return length if ((last <=> lower) == -1)
return 0 if ((first <=> lower) == 1)
low = 0
high = self.length - 1
while low < high
index = low + (high - low) / 2
element = self[index]
test = element <=> lower
case test
when 0:
return index + 1
when -1:
return (index + 1) if ((self[index+1] <=> lower) == 1)
low = index + 1
when 1:
return (index) if ((self[index-1] <=> lower) <= 0)
high = index - 1
end
end
end

Finds the highest index whose value is lower than the specified

parameter

If all values in the array are higher, returns -1

If all values are lower, returns length-1

def find_index_higher(higher)
return -1 if ((first <=> higher) >= 0)
return length - 1 if ((last <=> higher) == -1)
low = 0
high = self.length - 1
while low < high
index = low + (high - low) / 2
element = self[index]
test = element <=> higher
case test
when 0:
return index - 1
when -1:
return (index) if ((self[index+1] <=> higher) >= 0)
low = index + 1
when 1:
return (index - 1) if ((self[index-1] <=> higher) == -1)
high = index - 1
end
end
end

end

a = [5, 6, 7, 8, 9, 10, 14]
p a
puts “6 - 13: #{a.subrange(6,13).inspect}”
puts “5 - 15: #{a.subrange(5,15).inspect}”
puts “5 - 100: #{a.subrange(5, 100).inspect}”
puts “2 - 100: #{a.subrange(2, 100).inspect}”
puts “-1 - 100: #{a.subrange(-1, 100).inspect}”
puts “100 - 150: #{a.subrange(100,150).inspect}”
puts “1 - 2: #{a.subrange(1,2).inspect}”

require ‘benchmark’

n = 10_000
Benchmark.bmbm do |x|
x.report(“Array subrange”) {
n.times {a.subrange(6,13)}
}
end

$ ruby array_subrange.rb
[5, 6, 7, 8, 9, 10, 14]
6 - 13: [7, 8, 9, 10]
5 - 15: [6, 7, 8, 9, 10, 14]
5 - 100: [6, 7, 8, 9, 10, 14]
2 - 100: [5, 6, 7, 8, 9, 10, 14]
-1 - 100: [5, 6, 7, 8, 9, 10, 14]
100 - 150: []
1 - 2: []
Rehearsal --------------------------------------------------
Array subrange 0.220000 0.020000 0.240000 ( 0.274937)
----------------------------------------- total: 0.240000sec

                 user     system      total        real

Array subrange 0.190000 0.010000 0.200000 ( 0.227276)

Jesus.

FireAphis · October 3, 2007, 2:50pm

Axel E. wrote:

some example

len=10**8
a=(1…len).to_a
b=5*len/2
cond=‘lower_index>10’
r=a.find_lower_index(cond)
cond=‘upper_index<100’
s=a.find_upper_index(cond)
p r,s
p a[r]
p a[s]

When I use this code:

a = [2, 4, 6, 8]
cond=‘lower_index>0’
r=a.find_lower_index(cond)
cond=‘upper_index<2’
s=a.find_upper_index(cond)
p r,s
p a[r]
p a[s]

I get this output:

3
1
8
4

The output appears to be backwards, since the lower index would be 1 and
the upper index would be 3. And it’s a little bit puzzling why a search
would be needed to find an index in an array that is greater than 0. I
can tell you the answer without searching: it’s 1. The same goes for
the upper index: if the cond = “upper index < 2”, then the upper index
is 1.

FireAphis · October 3, 2007, 2:38pm

Dear Sir

Can I please request you to stop this email services please? as I do
not wish to have emails in my mail box.

regards
suchi

JesÃºs Gabriel y GalÃ¡n [email protected] wrote:
On 10/3/07, FireAphis wrote:

AFAIK my_array[1…1000] doesn’t help in this case.
Here is my try:

class Array

assumes a sorted array whose elements respond appropriately to <=>

def subrange(lower, higher)
low = find_index_lower(lower)
high = find_index_higher(higher)
if (low >= length) || (high < 0)
[]
else
self[low…high]
end

end

Finds the lowest index whose value is greater than the specified

parameter

If all values in the array are lower, returns the index after the last

one

If all values are higher, returns 0

def find_index_lower(lower)
return length if ((last <=> lower) == -1)
return 0 if ((first <=> lower) == 1)
low = 0
high = self.length - 1
while low < high
index = low + (high - low) / 2
element = self[index]
test = element <=> lower
case test
when 0:
return index + 1
when -1:
return (index + 1) if ((self[index+1] <=> lower) == 1)
low = index + 1
when 1:
return (index) if ((self[index-1] <=> lower) <= 0)
high = index - 1
end
end
end

Finds the highest index whose value is lower than the specified

parameter

If all values in the array are higher, returns -1

If all values are lower, returns length-1

def find_index_higher(higher)
return -1 if ((first <=> higher) >= 0)
return length - 1 if ((last <=> higher) == -1)
low = 0
high = self.length - 1
while low < high
index = low + (high - low) / 2
element = self[index]
test = element <=> higher
case test
when 0:
return index - 1
when -1:
return (index) if ((self[index+1] <=> higher) >= 0)
low = index + 1
when 1:
return (index - 1) if ((self[index-1] <=> higher) == -1)
high = index - 1
end
end
end

end

a = [5, 6, 7, 8, 9, 10, 14]
p a
puts “6 - 13: #{a.subrange(6,13).inspect}”
puts “5 - 15: #{a.subrange(5,15).inspect}”
puts “5 - 100: #{a.subrange(5, 100).inspect}”
puts “2 - 100: #{a.subrange(2, 100).inspect}”
puts “-1 - 100: #{a.subrange(-1, 100).inspect}”
puts “100 - 150: #{a.subrange(100,150).inspect}”
puts “1 - 2: #{a.subrange(1,2).inspect}”

require ‘benchmark’

n = 10_000
Benchmark.bmbm do |x|
x.report(“Array subrange”) {
n.times {a.subrange(6,13)}
}
end

$ ruby array_subrange.rb
[5, 6, 7, 8, 9, 10, 14]
6 - 13: [7, 8, 9, 10]
5 - 15: [6, 7, 8, 9, 10, 14]
5 - 100: [6, 7, 8, 9, 10, 14]
2 - 100: [5, 6, 7, 8, 9, 10, 14]
-1 - 100: [5, 6, 7, 8, 9, 10, 14]
100 - 150: []
1 - 2: []
Rehearsal --------------------------------------------------
Array subrange 0.220000 0.020000 0.240000 ( 0.274937)
----------------------------------------- total: 0.240000sec

user system total real
Array subrange 0.190000 0.010000 0.200000 ( 0.227276)

Jesus.

FireAphis · October 3, 2007, 3:24pm

FireAphis wrote:

Neat. Not space efficient

Yes, you’re right. I wouldn’t recommend it for really large arrays. The
original sorted array could also be abandoned in favor of a hash to
begin with.

But if the IDs
have large gaps ([10000, 20000, 30000]) do you think it still will be
more efficient compared to a simple iteration?

Depending on how you write the simple iteration, the hash method can be
a lot slower.

a simple loop will have three iterations whereas your implementation
will have 30000 iteration.

What if a simple iteration has 3 iterations x 3,000 range checks, e.g.:

dogs_in_range = []

dogs.each do |dog|
range.each do |r|
if dog.id == r
dogs_in_range << dog
end
end
end

FireAphis · October 3, 2007, 3:05pm

Axel E. wrote:

Can you explain this output:

a = [2, 4, 6, 8]
cond=‘lower_index>0’
r=a.find_lower_index(cond)
cond=‘upper_index<4’
s=a.find_upper_index(cond)
p r,s
p a[r]
p a[s]

–output:–
3
2
8
6

FireAphis · October 4, 2007, 10:44am

2007/10/3, FireAphis [email protected]:

Of course that could be an excellent solution. Do you know if there is
already such a facility in Ruby that implements the algorithm? I’m
quite sure Array#index doesn’t do that since in order to apply a
binary search the array has to be sorted.

http://raa.ruby-lang.org/search.rhtml?search=binary%20search

Kind regards

robert