Forum: Ruby Getting the smallest Items of an Array

Posted by Ismail M. (ismail_m)
on 2012-11-27 18:04
Hello guys,

I have been working with ranking algorithm, wich collects the smallest
numbers chronologically. but i couldn't make it work.

Here is the code.


module Enumerable
 def min_n(n, &block)
    block = Proc.new { |x,y| x <=> y} if block == nil
    stable = StortedArray.new(&block)
    each do |x|
      stable << x if stable.size < n or block.call(x, stable[-1]) == -1
      stable.pop until stable.size <=n
    end
    return stable

  end

end

m = [1,5,2,8,7,9,10,100,50]
m.min_n(4)


I would be more than glad to get your feedback/suggestions of how to
make it work.
Posted by Jan E. (jacques1)
on 2012-11-27 18:40
Hi,

so what does "not work" mean? I don't know your StortedArray class (or 
rather SortedArray?), but if it works like I think it works, then your 
method should indeed output the four smallest elements:

[1, 2, 5, 7]

But what's all this effort for? Why not simply use Ruby's built-in 
methods for that?

smallest = m.sort.first 4

This might not be particularly efficient, but your method isn't either.
Posted by Ismail M. (ismail_m)
on 2012-11-28 15:06
Thank you very much
Posted by Regis d'Aubarede (raubarede)
on 2012-11-29 12:02
>> def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end
=> nil
>> n_min([1,2,-1,3,4],2)
=> [-1, 1]

>> l=(1..100000).to_a.sort { |a,b| rand(2)-1 }
>> chrono(100) { l.sort[0..3] }
14 millis

>> l=(1..100000).to_a.sort { |a,b| rand(2)-1 }
>> chrono(100) { n_min(l,4) }
54 millis
Posted by Bartosz Dziewoński (matmarex)
on 2012-11-29 13:38
(Received via mailing list)
2012/11/29 Regis d'Aubarede <lists@ruby-forum.com>:
>>> def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end
> => nil
>>> n_min([1,2,-1,3,4],2)
> => [-1, 1]

Uhh... how about ary.sort.first(n)?

-- Matma Rex
Posted by "Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> (Guest)
on 2012-11-29 14:09
(Received via mailing list)
On Thu, Nov 29, 2012 at 1:37 PM, Bartosz Dziewoński 
<matma.rex@gmail.com> wrote:
> 2012/11/29 Regis d'Aubarede <lists@ruby-forum.com>:
>>>> def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end
>> => nil
>>>> n_min([1,2,-1,3,4],2)
>> => [-1, 1]
>
> Uhh... how about ary.sort.first(n)?

You lose the relative ordering of the chosen elements. Don't know if
it's important or not.

Jesus.
Posted by "Иван Бишевац" <ivan.bisevac@gmail.com> (Guest)
on 2012-11-29 14:26
(Received via mailing list)
Try this:
n = [4, 6, 7, 2, 15]

def min(n)
minimum = n[0]
n.each do |element|
 if element < minimum
minimum = element
end
 end
 return minimum
end

puts min(n)
Posted by Jan E. (jacques1)
on 2012-11-29 15:00
"Иван Бишевац" <ivan.bisevac@gmail.com> wrote in post #1087106:
> def min(n)
> minimum = n[0]
> n.each do |element|
>  if element < minimum
> minimum = element
> end
>  end
>  return minimum
> end

Great, you've reimplemented Array#min.

Problem is, this has nothing to do with what the OP wants.
Posted by Regis d'Aubarede (raubarede)
on 2012-11-29 15:32
> Uhh... how about ary.sort.first(n)?


If sort complexity is O(n*log(n), i thought that sorting the complete
array was less smart than my solution which  should be
  O(m*n) / m=number of minus item selected.

but it is not true:
nlog(n) with n=100_000           >> 1_115_000
m*n     with n=100_000 and m=10  >> 1_000_000

perhaps there is a mistake somewhere ?
Posted by Robert Klemme (robert_k78)
on 2012-11-29 15:46
(Received via mailing list)
On Thu, Nov 29, 2012 at 12:02 PM, Regis d'Aubarede 
<lists@ruby-forum.com> wrote:
>>> def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end
> => nil
>>> n_min([1,2,-1,3,4],2)
> => [-1, 1]

Without having measured it that's quite an inefficient way to do it 
IMHO:
- you are creating a lot temporary arrays (two for each step)
- you are traversing most elements n times

I think, sorting once and then taking the first n elements is more 
efficient.

Turns out, measurements confirm this: this approach is faster only for
n=1 but then one would use Enumerable#min anyway.

require 'benchmark'

REP = 1_000

puts 'Generating data...'

data = [0, 1, 10, 17].map do |i|
    (1 << i).times.map { rand 1000000 }
  end

puts 'done'

def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end

Benchmark.bm 30 do |x|
  [1, 10, 1000].each do |n|
    data.each do |array|
      x.report "#{n}/#{array.size} items sort" do
        REP.times do
          array.sort[0, n]
        end
      end

      x.report "#{n}/#{array.size} items n_min" do
        REP.times do
          n_min array, n
        end
      end
    end
  end
end

Test run (I interrupted it because it took too long):

Generating data...
done
                                     user     system      total 
real
1/1 items sort                   0.000000   0.000000   0.000000 ( 
0.000500)
1/1 items n_min                  0.000000   0.000000   0.000000 ( 
0.003000)
1/2 items sort                   0.000000   0.000000   0.000000 ( 
0.000500)
1/2 items n_min                  0.000000   0.000000   0.000000 ( 
0.002001)
1/1024 items sort                0.109000   0.000000   0.109000 ( 
0.112014)
1/1024 items n_min               0.093000   0.000000   0.093000 ( 
0.094012)
1/131072 items sort             25.709000   0.141000  25.850000 ( 
25.836281)
1/131072 items n_min            11.685000   0.280000  11.965000 ( 
11.967520)
10/1 items sort                  0.000000   0.000000   0.000000 ( 
0.000500)
10/1 items n_min                 0.015000   0.000000   0.015000 ( 
0.014002)
10/2 items sort                  0.000000   0.000000   0.000000 ( 
0.000000)
10/2 items n_min                 0.016000   0.000000   0.016000 ( 
0.014002)
10/1024 items sort               0.094000   0.016000   0.110000 ( 
0.112514)
10/1024 items n_min              0.904000   0.000000   0.904000 ( 
0.929618)
10/131072 items sort            25.850000   0.109000  25.959000 ( 
25.948795)
10/131072 items n_min          117.609000   2.137000 119.746000 
(119.802713)
1000/1 items sort                0.000000   0.000000   0.000000 ( 
0.000500)
1000/1 items n_min               1.279000   0.000000   1.279000 ( 
1.279162)
1000/2 items sort                0.000000   0.000000   0.000000 ( 
0.001000)
1000/2 items n_min               1.279000   0.000000   1.279000 ( 
1.276662)
1000/1024 items sort             0.109000   0.000000   0.109000 ( 
0.112514)
1000/1024 items n_min           48.579000   0.000000  48.579000 ( 
48.599671)
1000/131072 items sort          25.694000   0.188000  25.882000 ( 
25.885787)
1000/131072 items n_min        x.rb:14:in `block in n_min': Interrupt
        from x.rb:14:in `each'
        from x.rb:14:in `map'
        from x.rb:14:in `n_min'
        from x.rb:27:in `block (5 levels) in <main>'
        from x.rb:26:in `times'
        from x.rb:26:in `block (4 levels) in <main>'
        from /usr/lib/ruby/1.9.1/benchmark.rb:280:in `measure'
        from /usr/lib/ruby/1.9.1/benchmark.rb:362:in `item'
        from x.rb:25:in `block (3 levels) in <main>'
        from x.rb:18:in `each'
        from x.rb:18:in `block (2 levels) in <main>'
        from x.rb:17:in `each'
        from x.rb:17:in `block in <main>'
        from /usr/lib/ruby/1.9.1/benchmark.rb:174:in `benchmark'
        from /usr/lib/ruby/1.9.1/benchmark.rb:205:in `bm'
        from x.rb:16:in `<main>'



Kind regards

robert
Posted by Regis d'Aubarede (raubarede)
on 2012-11-29 16:42
> def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end
>
>           array.sort[0, n]
>           n_min array, n


You compare ruby implementation for n_min() with c implementation for
sort...

here result with a qsort from http://rosettacode.org, REP=10

class Array
  def qsort
    return self if length <= 1
    pivot = self[0]
    less, greatereq = self[1..-1].partition { |x| x < pivot }
    less.qsort +
      [pivot] +
      greatereq.qsort
  end
end


Generating data...
done
                                    user     system      total
real

4/1024 items sort              0.031000   0.000000   0.031000 (
0.029001)
4/1024 items n_min              0.000000   0.000000   0.000000 (
0.006001)

4/131072 items sort            6.022000   0.016000   6.038000 (
6.039345)
4/131072 items n_min            0.671000   0.016000   0.687000 (
0.701040)

20/1024 items sort             0.031000   0.000000   0.031000 (
0.028002)
20/1024 items n_min             0.031000   0.000000   0.031000 (
0.027001)

20/131072 items sort           6.069000   0.031000   6.100000 (
6.117349)
20/131072 items n_min           3.510000   0.078000   3.588000 (
3.595206)

100/1024 items sort            0.031000   0.000000   0.031000 (
0.030002)
100/1024 items n_min            0.125000   0.000000   0.125000 (
0.135008)

100/131072 items sort          6.286000   0.000000   6.286000 (
6.300360)
100/131072 items n_min         17.223000   0.265000  17.488000 (
17.496001)
Posted by Robert Klemme (robert_k78)
on 2012-11-29 18:37
(Received via mailing list)
On Thu, Nov 29, 2012 at 4:42 PM, Regis d'Aubarede <lists@ruby-forum.com> 
wrote:
>> def n_min(l,n) (1..n).map {a=l.min ; l=l-[a]; a } end
>>
>>           array.sort[0, n]
>>           n_min array, n
>
>
> You compar ruby implementation for n_min() with c implementation for
> sort...

Is that forbidden?  You are making use of C implementation as well.
With that argumentation of yours you would also need to reimplement
methods you use inside n_min in Ruby (notably l.min and l-[a]).  Fact
remains that your algorithm will visit most of the elements of a 2*n
times.  I find that inelegant but YMMV of course.

Kind regards

robert
Posted by Frank Fischer (Guest)
on 2012-11-29 22:15
(Received via mailing list)
On 2012-11-29, Robert Klemme <shortcutter@googlemail.com> wrote:
> Is that forbidden?  You are making use of C implementation as well.
> With that argumentation of yours you would also need to reimplement
> methods you use inside n_min in Ruby (notably l.min and l-[a]).  Fact
> remains that your algorithm will visit most of the elements of a 2*n
> times.  I find that inelegant but YMMV of course.

I agree with Robert, using #sort is probably the simplest approach and
reasonably fast in most situations. This is exactly what I would do.
But I also like playing around with algorithms ;) So if you're really
keen on both, performance *and* doing as much as possible in ruby, you
can try the following implementation of "min_n", which uses a binary
heap. It has a running-time of O(n*log m) (n=array size, m=number of
small elements) and should by quite fast if m << n. Then it also seems
to be quite competetive with the sort approach.

====
def hdown(a, i)
  x, n = a[i], a.size
  while true do
    l = 2*i + 1
    r = l + 1
    break if l >= n
    nxt = r >= n || a[l] >= a[r] ? l : r
    break if x >= a[nxt]
    a[i], i = a[nxt], nxt
  end
  a[i] = x
end

def hbuild(a)
  (a.size/2).downto(0) do |i| hdown(a, i) end
end

def min_n(a, n)
  if a.size <= n then
    return a.dup
  else
    h = a[0,n]
    hbuild(h)
    n.upto(a.size-1) do |i|
      if h[0] > a[i] then
        h[0] = a[i]
        hdown(h, 0)
      end
    end
    return h
  end
end
====

The following tests are from jruby-1.7.0

1/32768 items sort               2.700000   0.020000   2.720000 ( 
2.685000)
1/32768 items min_n              1.400000   0.050000   1.450000 ( 
1.246000)
1/131072 items sort             17.070000   0.010000  17.080000 ( 
17.054000)
1/131072 items min_n             4.510000   0.010000   4.520000 ( 
4.418000)
10/32768 items sort              2.660000   0.000000   2.660000 ( 
2.655000)
10/32768 items min_n             1.170000   0.010000   1.180000 ( 
1.142000)
10/131072 items sort            17.060000   0.000000  17.060000 ( 
17.036000)
10/131072 items min_n            4.540000   0.010000   4.550000 ( 
4.437000)
100/32768 items sort             2.660000   0.000000   2.660000 ( 
2.656000)
100/32768 items min_n            1.440000   0.010000   1.450000 ( 
1.405000)
100/131072 items sort           17.070000   0.000000  17.070000 ( 
17.039000)
100/131072 items min_n           4.900000   0.010000   4.910000 ( 
4.782000)
1000/32768 items sort            2.650000   0.000000   2.650000 ( 
2.653000)
1000/32768 items min_n           3.850000   0.000000   3.850000 ( 
3.794000)
1000/131072 items sort          17.080000   0.010000  17.090000 ( 
17.042000)
1000/131072 items min_n          8.550000   0.010000   8.560000 ( 
8.320000)


and with 1.9.3

1/32768 items sort               3.420000   0.030000   3.450000 ( 
3.437513)
1/32768 items min_n              2.540000   0.000000   2.540000 ( 
2.544122)
1/131072 items sort             15.840000   0.160000  16.000000 ( 
16.007184)
1/131072 items min_n            10.050000   0.000000  10.050000 ( 
10.037374)
10/32768 items sort              3.420000   0.000000   3.420000 ( 
3.422531)
10/32768 items min_n             2.620000   0.000000   2.620000 ( 
2.616946)
10/131072 items sort            15.910000   0.000000  15.910000 ( 
15.913246)
10/131072 items min_n           10.250000   0.000000  10.250000 ( 
10.246652)
100/32768 items sort             3.410000   0.010000   3.420000 ( 
3.413907)
100/32768 items min_n            3.500000   0.000000   3.500000 ( 
3.502954)
100/131072 items sort           15.910000   0.010000  15.920000 ( 
15.923128)
100/131072 items min_n          11.340000   0.010000  11.350000 ( 
11.353193)
1000/32768 items sort            3.420000   0.000000   3.420000 ( 
3.417772)
1000/32768 items min_n          11.060000   0.000000  11.060000 ( 
11.074263)
1000/131072 items sort          15.880000   0.000000  15.880000 ( 
15.882260)
1000/131072 items min_n         22.170000   0.020000  22.190000 ( 
22.197999)



Best regards,
Frank
Posted by Robert Klemme (robert_k78)
on 2012-11-30 12:54
(Received via mailing list)
On Thu, Nov 29, 2012 at 10:14 PM, Frank Fischer
<frank-fischer@shadow-soft.de> wrote:

> I agree with Robert, using #sort is probably the simplest approach and
> reasonably fast in most situations. This is exactly what I would do.
> But I also like playing around with algorithms ;)

I put a few more variants which traverse the original Enumerable just
once up at github to play around with:
https://gist.github.com/4175335

Kind regards

robert
Please log in before posting. Registration is free and takes only a minute.
Existing account (Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
No account? Register here.