Getting the smallest Items of an Array

Hello guys,

I have been working with ranking algorithm, wich collects the smallest
numbers chronologically. but i couldn’t make it work.

Here is the code.

module Enumerable
def min_n(n, &block)
block = Proc.new { |x,y| x <=> y} if block == nil
stable = StortedArray.new(&block)
each do |x|
stable << x if stable.size < n or block.call(x, stable[-1]) == -1
stable.pop until stable.size <=n
end
return stable

end

end

m = [1,5,2,8,7,9,10,100,50]
m.min_n(4)

I would be more than glad to get your feedback/suggestions of how to
make it work.

Hi,

so what does “not work” mean? I don’t know your StortedArray class (or
rather SortedArray?), but if it works like I think it works, then your
method should indeed output the four smallest elements:

[1, 2, 5, 7]

But what’s all this effort for? Why not simply use Ruby’s built-in
methods for that?

smallest = m.sort.first 4

This might not be particularly efficient, but your method isn’t either.

Thank you very much

def n_min(l,n) (1…n).map {a=l.min ; l=l-[a]; a } end
=> nil

n_min([1,2,-1,3,4],2)
=> [-1, 1]

l=(1…100000).to_a.sort { |a,b| rand(2)-1 }
chrono(100) { l.sort[0…3] }
14 millis

l=(1…100000).to_a.sort { |a,b| rand(2)-1 }
chrono(100) { n_min(l,4) }
54 millis

2012/11/29 Regis d’Aubarede [email protected]:

def n_min(l,n) (1…n).map {a=l.min ; l=l-[a]; a } end
=> nil
n_min([1,2,-1,3,4],2)
=> [-1, 1]

Uhh… how about ary.sort.first(n)?

– Matma R.

On Thu, Nov 29, 2012 at 1:37 PM, Bartosz Dziewoński
[email protected] wrote:

2012/11/29 Regis d’Aubarede [email protected]:

def n_min(l,n) (1…n).map {a=l.min ; l=l-[a]; a } end
=> nil
n_min([1,2,-1,3,4],2)
=> [-1, 1]

Uhh… how about ary.sort.first(n)?

You lose the relative ordering of the chosen elements. Don’t know if
it’s important or not.

Jesus.

Try this:
n = [4, 6, 7, 2, 15]

def min(n)
minimum = n[0]
n.each do |element|
if element < minimum
minimum = element
end
end
return minimum
end

puts min(n)

Uhh… how about ary.sort.first(n)?

If sort complexity is O(nlog(n), i thought that sorting the complete
array was less smart than my solution which should be
O(m
n) / m=number of minus item selected.

but it is not true:
nlog(n) with n=100_000 >> 1_115_000
m*n with n=100_000 and m=10 >> 1_000_000

perhaps there is a mistake somewhere ?

“Иван Бишевац” [email protected] wrote in post #1087106:

def min(n)
minimum = n[0]
n.each do |element|
if element < minimum
minimum = element
end
end
return minimum
end

Great, you’ve reimplemented Array#min.

Problem is, this has nothing to do with what the OP wants.

On Thu, Nov 29, 2012 at 12:02 PM, Regis d’Aubarede
[email protected] wrote:

def n_min(l,n) (1…n).map {a=l.min ; l=l-[a]; a } end
=> nil
n_min([1,2,-1,3,4],2)
=> [-1, 1]

Without having measured it that’s quite an inefficient way to do it
IMHO:

  • you are creating a lot temporary arrays (two for each step)
  • you are traversing most elements n times

I think, sorting once and then taking the first n elements is more
efficient.

Turns out, measurements confirm this: this approach is faster only for
n=1 but then one would use Enumerable#min anyway.

require ‘benchmark’

REP = 1_000

puts ‘Generating data…’

data = [0, 1, 10, 17].map do |i|
(1 << i).times.map { rand 1000000 }
end

puts ‘done’

def n_min(l,n) (1…n).map {a=l.min ; l=l-[a]; a } end

Benchmark.bm 30 do |x|
[1, 10, 1000].each do |n|
data.each do |array|
x.report “#{n}/#{array.size} items sort” do
REP.times do
array.sort[0, n]
end
end

  x.report "#{n}/#{array.size} items n_min" do
    REP.times do
      n_min array, n
    end
  end
end

end
end

Test run (I interrupted it because it took too long):

Generating data…
done
user system total
real
1/1 items sort 0.000000 0.000000 0.000000 (
0.000500)
1/1 items n_min 0.000000 0.000000 0.000000 (
0.003000)
1/2 items sort 0.000000 0.000000 0.000000 (
0.000500)
1/2 items n_min 0.000000 0.000000 0.000000 (
0.002001)
1/1024 items sort 0.109000 0.000000 0.109000 (
0.112014)
1/1024 items n_min 0.093000 0.000000 0.093000 (
0.094012)
1/131072 items sort 25.709000 0.141000 25.850000 (
25.836281)
1/131072 items n_min 11.685000 0.280000 11.965000 (
11.967520)
10/1 items sort 0.000000 0.000000 0.000000 (
0.000500)
10/1 items n_min 0.015000 0.000000 0.015000 (
0.014002)
10/2 items sort 0.000000 0.000000 0.000000 (
0.000000)
10/2 items n_min 0.016000 0.000000 0.016000 (
0.014002)
10/1024 items sort 0.094000 0.016000 0.110000 (
0.112514)
10/1024 items n_min 0.904000 0.000000 0.904000 (
0.929618)
10/131072 items sort 25.850000 0.109000 25.959000 (
25.948795)
10/131072 items n_min 117.609000 2.137000 119.746000
(119.802713)
1000/1 items sort 0.000000 0.000000 0.000000 (
0.000500)
1000/1 items n_min 1.279000 0.000000 1.279000 (
1.279162)
1000/2 items sort 0.000000 0.000000 0.000000 (
0.001000)
1000/2 items n_min 1.279000 0.000000 1.279000 (
1.276662)
1000/1024 items sort 0.109000 0.000000 0.109000 (
0.112514)
1000/1024 items n_min 48.579000 0.000000 48.579000 (
48.599671)
1000/131072 items sort 25.694000 0.188000 25.882000 (
25.885787)
1000/131072 items n_min x.rb:14:in block in n_min': Interrupt from x.rb:14:in each’
from x.rb:14:in map' from x.rb:14:in n_min’
from x.rb:27:in block (5 levels) in <main>' from x.rb:26:in times’
from x.rb:26:in block (4 levels) in <main>' from /usr/lib/ruby/1.9.1/benchmark.rb:280:in measure’
from /usr/lib/ruby/1.9.1/benchmark.rb:362:in item' from x.rb:25:in block (3 levels) in ’
from x.rb:18:in each' from x.rb:18:in block (2 levels) in ’
from x.rb:17:in each' from x.rb:17:in block in ’
from /usr/lib/ruby/1.9.1/benchmark.rb:174:in benchmark' from /usr/lib/ruby/1.9.1/benchmark.rb:205:in bm’
from x.rb:16:in `’

Kind regards

robert

On Thu, Nov 29, 2012 at 4:42 PM, Regis d’Aubarede [email protected]
wrote:

def n_min(l,n) (1…n).map {a=l.min ; l=l-[a]; a } end

      array.sort[0, n]
      n_min array, n

You compar ruby implementation for n_min() with c implementation for
sort…

Is that forbidden? You are making use of C implementation as well.
With that argumentation of yours you would also need to reimplement
methods you use inside n_min in Ruby (notably l.min and l-[a]). Fact
remains that your algorithm will visit most of the elements of a 2*n
times. I find that inelegant but YMMV of course.

Kind regards

robert

def n_min(l,n) (1…n).map {a=l.min ; l=l-[a]; a } end

      array.sort[0, n]
      n_min array, n

You compare ruby implementation for n_min() with c implementation for
sort…

here result with a qsort from http://rosettacode.org, REP=10

class Array
def qsort
return self if length <= 1
pivot = self[0]
less, greatereq = self[1…-1].partition { |x| x < pivot }
less.qsort +
[pivot] +
greatereq.qsort
end
end

Generating data…
done
user system total
real

4/1024 items sort 0.031000 0.000000 0.031000 (
0.029001)
4/1024 items n_min 0.000000 0.000000 0.000000 (
0.006001)

4/131072 items sort 6.022000 0.016000 6.038000 (
6.039345)
4/131072 items n_min 0.671000 0.016000 0.687000 (
0.701040)

20/1024 items sort 0.031000 0.000000 0.031000 (
0.028002)
20/1024 items n_min 0.031000 0.000000 0.031000 (
0.027001)

20/131072 items sort 6.069000 0.031000 6.100000 (
6.117349)
20/131072 items n_min 3.510000 0.078000 3.588000 (
3.595206)

100/1024 items sort 0.031000 0.000000 0.031000 (
0.030002)
100/1024 items n_min 0.125000 0.000000 0.125000 (
0.135008)

100/131072 items sort 6.286000 0.000000 6.286000 (
6.300360)
100/131072 items n_min 17.223000 0.265000 17.488000 (
17.496001)

On 2012-11-29, Robert K. [email protected] wrote:

Is that forbidden? You are making use of C implementation as well.
With that argumentation of yours you would also need to reimplement
methods you use inside n_min in Ruby (notably l.min and l-[a]). Fact
remains that your algorithm will visit most of the elements of a 2*n
times. I find that inelegant but YMMV of course.

I agree with Robert, using #sort is probably the simplest approach and
reasonably fast in most situations. This is exactly what I would do.
But I also like playing around with algorithms :wink: So if you’re really
keen on both, performance and doing as much as possible in ruby, you
can try the following implementation of “min_n”, which uses a binary
heap. It has a running-time of O(n*log m) (n=array size, m=number of
small elements) and should by quite fast if m << n. Then it also seems
to be quite competetive with the sort approach.

====
def hdown(a, i)
x, n = a[i], a.size
while true do
l = 2*i + 1
r = l + 1
break if l >= n
nxt = r >= n || a[l] >= a[r] ? l : r
break if x >= a[nxt]
a[i], i = a[nxt], nxt
end
a[i] = x
end

def hbuild(a)
(a.size/2).downto(0) do |i| hdown(a, i) end
end

def min_n(a, n)
if a.size <= n then
return a.dup
else
h = a[0,n]
hbuild(h)
n.upto(a.size-1) do |i|
if h[0] > a[i] then
h[0] = a[i]
hdown(h, 0)
end
end
return h
end
end

The following tests are from jruby-1.7.0

1/32768 items sort 2.700000 0.020000 2.720000 (
2.685000)
1/32768 items min_n 1.400000 0.050000 1.450000 (
1.246000)
1/131072 items sort 17.070000 0.010000 17.080000 (
17.054000)
1/131072 items min_n 4.510000 0.010000 4.520000 (
4.418000)
10/32768 items sort 2.660000 0.000000 2.660000 (
2.655000)
10/32768 items min_n 1.170000 0.010000 1.180000 (
1.142000)
10/131072 items sort 17.060000 0.000000 17.060000 (
17.036000)
10/131072 items min_n 4.540000 0.010000 4.550000 (
4.437000)
100/32768 items sort 2.660000 0.000000 2.660000 (
2.656000)
100/32768 items min_n 1.440000 0.010000 1.450000 (
1.405000)
100/131072 items sort 17.070000 0.000000 17.070000 (
17.039000)
100/131072 items min_n 4.900000 0.010000 4.910000 (
4.782000)
1000/32768 items sort 2.650000 0.000000 2.650000 (
2.653000)
1000/32768 items min_n 3.850000 0.000000 3.850000 (
3.794000)
1000/131072 items sort 17.080000 0.010000 17.090000 (
17.042000)
1000/131072 items min_n 8.550000 0.010000 8.560000 (
8.320000)

and with 1.9.3

1/32768 items sort 3.420000 0.030000 3.450000 (
3.437513)
1/32768 items min_n 2.540000 0.000000 2.540000 (
2.544122)
1/131072 items sort 15.840000 0.160000 16.000000 (
16.007184)
1/131072 items min_n 10.050000 0.000000 10.050000 (
10.037374)
10/32768 items sort 3.420000 0.000000 3.420000 (
3.422531)
10/32768 items min_n 2.620000 0.000000 2.620000 (
2.616946)
10/131072 items sort 15.910000 0.000000 15.910000 (
15.913246)
10/131072 items min_n 10.250000 0.000000 10.250000 (
10.246652)
100/32768 items sort 3.410000 0.010000 3.420000 (
3.413907)
100/32768 items min_n 3.500000 0.000000 3.500000 (
3.502954)
100/131072 items sort 15.910000 0.010000 15.920000 (
15.923128)
100/131072 items min_n 11.340000 0.010000 11.350000 (
11.353193)
1000/32768 items sort 3.420000 0.000000 3.420000 (
3.417772)
1000/32768 items min_n 11.060000 0.000000 11.060000 (
11.074263)
1000/131072 items sort 15.880000 0.000000 15.880000 (
15.882260)
1000/131072 items min_n 22.170000 0.020000 22.190000 (
22.197999)

Best regards,
Frank

On Thu, Nov 29, 2012 at 10:14 PM, Frank F.
[email protected] wrote:

I agree with Robert, using #sort is probably the simplest approach and
reasonably fast in most situations. This is exactly what I would do.
But I also like playing around with algorithms :wink:

I put a few more variants which traverse the original Enumerable just
once up at github to play around with:

Kind regards

robert