Calculations on lists of numbers

for years i’ve felt that i should be able to pipe numerical output into some
unix command like so

cat list | mean
cat list | sum
cat list | minmax

etc.

and have never found one. right now i’m building a ruby version - before i
continue, does anyone know a standard unix or ruby version of this?

cheers.

-a

I wonder if something like bc could do something along those lines.
The biggest problem is that you have to assume a lot of information
about the format of the list.

When you get something implemented, I’d be interested in seeing it.

-Chris

cat list | mean

cat list | awk '{ s += $1; n += 1 } END { print s / n }'

cat list | sum

cat list | awk '{ s += $1 } END { print s }'

cat list | minmax

Hmmm… need to study some more awk.

On Sat, 2 Dec 2006, Matthew M. wrote:

Hmmm… need to study some more awk.
yup. that’s where i broke into ruby too :wink:

-a

Oh… my samples assume each number is on its own line… dunno
offhand how to support an arbitrary amount of numbers on a line…

On Sat, 2 Dec 2006, Matthew M. wrote:

Oh… my samples assume each number is on its own line… dunno
offhand how to support an arbitrary amount of numbers on a line…

easy in ruby :wink:

-a

[email protected] wrote:

and have never found one. right now i’m building a ruby version - before
i continue, does anyone know a standard unix or ruby version of this?

It is so easy to create in Ruby, a matter of minutes, that it is not
terribly important to do the search you are suggesting.


#!/usr/bin/ruby -w

array = []

STDIN.read.split(/\s+/).each do |item|
  if(v = item.to_f)
    array << v
  end
end

if(array.size > 0)
  sum = 0
  array.each { |v| sum += v }
  mean = sum / array.size
  puts sum.to_s + " " +
    mean.to_s + " " +
    array.min.to_s + " " +
    array.max.to_s
end


$ echo 1 2 3 4 5 | (script_name)

Output: 15.0 3.0 1.0 5.0

Joel VanderWerf wrote:

etc.

and have never found one. right now i’m building a ruby version -
before i continue, does anyone know a standard unix or ruby version of
this?

It is so easy to create in Ruby, a matter of minutes, that it is not
terribly important to do the search you are suggesting.

Disagree.

It’s a bit too late to disagree, in the face of the evidence that I said it,
then I did it.

I would like to know if a unix version exists, since it will
certainly be faster than ruby, run in less memory, and probably exist in
environments where ruby doesn’t. So I think the search is worth while.

Yes, all true, but that isn’t what you disagreed with.

A number of *nix hands will probably put forth solutions that rely on awk or
bc (something I might have done in years past), and they will probably be
faster, and they certainly exist, and there is no need to write any Ruby code.

But writing something quick and serviceable in Ruby was extremely easy.

Joel VanderWerf wrote:

puts "#{sum} #{mean} #{array.min} #{array.max}"

I’m not much into golf, but, since we’ve long since left the clubhouse, and
because I am very lazy:

puts [ sum,mean,array.min,array.max ].map { |v| v.to_s }.join(' ')

This has the sole advantage that particular elements can be added and
removed without a lot of typing. If no one expects to change the program,
then there’s no point in it.

Paul L. wrote:

and have never found one. right now i’m building a ruby version - before
i continue, does anyone know a standard unix or ruby version of this?

It is so easy to create in Ruby, a matter of minutes, that it is not
terribly important to do the search you are suggesting.

Disagree. I would like to know if a unix version exists, since it will
certainly be faster than ruby, run in less memory, and probably exist in
environments where ruby doesn’t. So I think the search is worth while.

Also, it is terribly important to scrutinize code one finds in a
newsgroup, so here we go…

#!/usr/bin/ruby -w

array = []

STDIN.read.split(/\s+/).each do |item|
  if(v = item.to_f)
    array << v
  end
end

  • Use $stdin instead of STDIN, to play well with reassignment of $stdin,
    in case this snippet ever becomes part of a library, and someone wants
    to capture output.

  • The code above can be simplified and improved so that files can be
    named on the command line:

array = ARGF.read.split.map {|s| Float(s)}

(Is #Float better than #to_f? It depends. If you want “3foobar” to be
treated as 3.0 and you want “foobar3” to be treated as 0.0, stick with
#to_f (and keep the nil values out of the array). If you want the
program to die noisily on bad input, use #Float. As a bonus, you don’t
have to deal with nil values.)
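
To make the difference concrete, here is a small sketch of how the two conversions behave on the inputs mentioned above. The helper name parse_strict is made up for illustration and is not part of anyone's script in this thread.

```ruby
# String#to_f is lenient: it parses a leading number and ignores the rest,
# returning 0.0 when the string does not start with a number.
lenient = ["3", "3foobar", "foobar3"].map { |s| s.to_f }
# lenient is [3.0, 3.0, 0.0]

# Kernel#Float is strict: it raises ArgumentError unless the whole string
# is a valid number. parse_strict is a hypothetical helper that turns the
# exception into nil so the caller can decide what to do with bad input.
def parse_strict(token)
  Float(token)
rescue ArgumentError
  nil
end

strict = ["3", "3foobar", "foobar3"].map { |s| parse_strict(s) }
# strict is [3.0, nil, nil]
```

Whether silently accepting "3foobar" or rejecting it is the better default probably depends on how much you trust the input.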

if(array.size > 0)
  sum = 0
  array.each { |v| sum += v }
  mean = sum / array.size
  puts sum.to_s + " " +
    mean.to_s + " " +
    array.min.to_s + " " +
    array.max.to_s
end

More idiomatically ruby, IMO, is the following:

unless array.empty?
  sum = array.inject {|s,x| s+x}
  mean = sum / array.size
  puts "#{sum} #{mean} #{array.min} #{array.max}"
end

Also, you might want an empty array to have a sum of 0, just so that the
nice algebraic properties hold:

[1, 2, 3, 4].sum + [].sum == [1, 2].sum + [3, 4].sum

(And, it’s fairly standard:
http://wiki.r-project.org/rwiki/doku.php?id=tips:surprises:emptysetfuncs)

That only makes sense for the

cat list | sum

invocation, of course.
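
A minimal sketch of why the seed matters, assuming a helper called total (a made-up name for this example): inject without a seed returns nil for an empty list, while inject(0) returns 0, which is what keeps the identity above true.

```ruby
# total is a hypothetical helper: inject with an explicit seed of 0.
def total(numbers)
  numbers.inject(0) { |s, x| s + x }
end

no_seed   = [].inject { |s, x| s + x }  # nil: there is nothing to start from
with_seed = total([])                   # 0: the seed is returned unchanged

# Splitting a list anywhere, even into an empty piece, leaves the sum the same.
holds = total([1, 2, 3, 4]) + total([]) == total([1, 2]) + total([3, 4])
```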

Here’s the implementation so far:

$ cat agr.rb
#!/usr/bin/env ruby

array = ARGF.read.split.map {|s| Float(s)}

sum = array.inject(0) {|s,x| s+x}

print sum
unless array.empty?
  mean = sum / array.size
  print " #{mean} #{array.min} #{array.max}"
end
puts

$ echo "1 2 3" | ./agr.rb
6.0 2.0 1.0 3.0
$ echo "1 2 3foo" | ./agr.rb
./agr.rb:3:in `Float': invalid value for Float(): "3foo" (ArgumentError)
        from ./agr.rb:3
        from ./agr.rb:3
$ echo "1 2 3" >data
$ ./agr.rb data data
12.0 2.0 1.0 3.0
[~/tmp] echo "" >empty_data
[~/tmp] ./agr.rb empty_data
0


$ echo 1 2 3 4 5 | (script_name)

Output: 15.0 3.0 1.0 5.0

And then what do you do if you are piping this output somewhere else?
Use cut to get the mean or whatever it was you wanted? The OP wanted
three separate functions. It might be better to use an argument to the
script to select which aggregate value is to be output.

There’s not much point computing the min and max if only the mean was
requested.
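
One way this could look, as a rough sketch only: a table of lambdas keyed by operation name, so that only the requested statistic is computed. The names AGGREGATES and aggregate, and the command-line shape in the trailing comment, are assumptions for illustration.

```ruby
# AGGREGATES is a hypothetical lookup table: operation name => lambda.
AGGREGATES = {
  "sum"  => lambda { |a| a.inject(0) { |s, x| s + x } },
  "mean" => lambda { |a| a.inject(0) { |s, x| s + x } / a.size.to_f },
  "min"  => lambda { |a| a.min },
  "max"  => lambda { |a| a.max }
}

# Look up the requested operation and apply it to the numbers.
# Unknown operation names raise ArgumentError.
def aggregate(op, numbers)
  fn = AGGREGATES[op] or raise ArgumentError, "unknown operation: #{op}"
  fn.call(numbers)
end

# In a script this might be wired up along these lines (assumed invocation,
# e.g. ./agr.rb mean data):
#   op = ARGV.shift
#   puts aggregate(op, ARGF.read.split.map { |s| Float(s) })
```

Each invocation then prints a single value, which is easier to feed into another pipe than four space-separated numbers.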

[email protected] wrote:

for years i’ve felt that i should be able to pipe numerical output into some
unix command like so

cat list | mean
cat list | sum
cat list | minmax

Why drag in the cat when it’s utterly superfluous?

mean <list
sum <list
minmax <list

etc.

and have never found one. right now i’m building a ruby version - before i
continue, does anyone know a standard unix or ruby version of this?

Matthew M. wrote:

cat list | mean

cat list | awk '{ s += $1; n += 1 } END { print s / n }'

Reading a file isn’t a magical ability; awk can do it.

awk '{s+=$0} END{print s/NR}' list

If there can be more than one number on a line:

awk '{for(i=1;i<=NF;i++)s+=$i; n+=NF} END{print s/n}' file

Ruby:

ruby -nale 'BEGIN{$s=$n=0}; $s+=$F.inject(0){|x,y| x.to_f+y.to_f};
$n+=$F.size; END{puts $s/$n}' file

If there’s memory enough for the whole file:

ruby -e 'a=$<.read.split.map{|x|x.to_f};
puts a.inject{|x,y|x+y}/a.size' file

On Saturday 02 December 2006 05:02, [email protected] wrote:

continue, does anyone know a standard unix or ruby version of this?

cheers.

-a

Hi,

I'm currently coding a ruby program for processing images. One of the classes I
wrote is intended to compute some stats about the luminance of a channel, but
in fact it can be used on any set of numerical data. The stats are:

  • a histogram
  • the mean
  • the variance
  • the deviation
  • the median
  • the skewness
  • the kurtosis

It is very fast, since it uses almost no memory: the values are not stored
internally, only the intermediate results (so a list of 2 values uses the
same amount of memory as a list of a billion values), and also because the
method that adds a value is generated depending on which stats you want to
compute.

It would be really easy to add min and max, but it would need one or two
modifications to get rid of the “image-specific” things. Then, reading stdin
and adding the values would be no problem.

If you want to base your work on my class then just tell me, I’ll be happy to
share it if it can help.

– Olivier
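
As a rough illustration of the "intermediate results only" idea, here is a minimal sketch of a running accumulator. It is not Olivier's class (which was not posted in the thread); it uses Welford's online algorithm for the mean and variance, and the class name RunningStats and its methods are invented for this example.

```ruby
# RunningStats is a hypothetical class: it keeps a handful of numbers no
# matter how many values are fed in, using Welford's online algorithm.
class RunningStats
  attr_reader :count, :mean, :min, :max

  def initialize
    @count = 0
    @mean  = 0.0
    @m2    = 0.0   # running sum of squared deviations from the mean
    @min   = nil
    @max   = nil
  end

  # Fold one value into the running results; the value itself is not stored.
  def <<(x)
    x = x.to_f
    @count += 1
    delta = x - @mean
    @mean += delta / @count
    @m2   += delta * (x - @mean)
    @min = x if @min.nil? || x < @min
    @max = x if @max.nil? || x > @max
    self
  end

  # Sample variance (divides by n - 1); 0.0 when fewer than two values.
  def variance
    @count > 1 ? @m2 / (@count - 1) : 0.0
  end

  def stddev
    Math.sqrt(variance)
  end
end

stats = RunningStats.new
[2, 4, 4, 4, 5, 5, 7, 9].each { |x| stats << x }
# stats.mean is 5.0 and stats.variance is 32/7, up to floating-point rounding
```

An exact median does not fit this pattern as neatly, since it generally needs either the values themselves or something like a histogram of them.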

On 12/2/06, Paul L. [email protected] wrote:

Joel VanderWerf wrote:

puts "#{sum} #{mean} #{array.min} #{array.max}"

I’m not much into golf, but, since we’ve long since left the clubhouse, and
because I am very lazy:

puts [ sum,mean,array.min,array.max ].map { |v| v.to_s }.join(' ')

puts [ sum, mean, array.min, array.max ].join(' ')

just cause join calls .to_s on every element anyway :wink:

On Sat, 2 Dec 2006, Olivier wrote:

  • the skewness
    and add the values would be no problem.

    If you want to base your work on my class then just tell me, I’ll be happy to
    share it if it can help.

– Olivier

heh, i’ve got something similar i use to compute stats on binary data all the
time, here’s the entire code

harp:~> cat a.rb
#! /dmsp/reference/bin/ruby

require 'narray'
require 'yaml'

list = ARGF.readlines.map{|line| line.strip.split(%r/\s+/).map{|f| Float f}}.flatten
na = NArray.to_na list

puts '---'
%w( min max mean median stddev ).each{|stat| puts "#{ stat }: #{ na.send stat }"}

but it’s not as complete as yours.

-a

[email protected] wrote:

harp:~> cat a.rb
Man, how do you keep all your a.rb’s straight?

#! /dmsp/reference/bin/ruby

require 'narray'
require 'yaml'

Am I blind, or do you require 'yaml' and never use it?

list = ARGF.readlines.map{|line| line.strip.split(%r/\s+/).map{|f| Float f}}.flatten

Doesn’t this load the whole list of numbers into memory? (i.e. how does
it fare on “a billion values”?)

na = NArray.to_na list

puts '---'
%w( min max mean median stddev ).each{|stat| puts "#{ stat }: #{ na.send stat }"}

Devin

William J. wrote:

mean <list
sum <list
minmax <list

Yes, true, but in a simple example like this, ‘cat’ is just a stand-in for
some other application that would stream the numbers. In such a case, the
pipe seems more appropriate.

On Sat, 2 Dec 2006, Paul L. wrote:

cat list | minmax
Disagree.

It’s a bit too late to disagree, in the face of the evidence that I said it,
then I did it.

i agree that it’s easy to emulate awk, but shouldn’t we do something better in
ruby? i’m personally always inspired by ruby’s elegance to write something
better and more extensible than something i could easily do in the
shell/awk/perl/c/etc, and i’ve found that, over the long run (say more than 3
days), my productivity increases in an exponential way if i simply embrace
ruby’s power to write clear and re-usable code and code it right ‘the first
time.’ imho it’s a shame to write throw-away scripts in ruby.

here’s what i’ve got so far: the concept is that each line may contain ‘n’
columns of numbers, which is to say the input is not a simple list of numbers,
but a list of rows of numbers: a table. any non-numeric data is ignored,
eliminating the need to grep out crud. also, integer arithmetic is attempted
where possible but the code falls back to floats when needed. all numeric
input must be valid - no use of #to_i or #to_f, preferring Integer() and
Float(). the code abstracts all of the input, computation, and output
functions and is user-extensible via the use of duck-typed filters. it’s also
usable both as a library and from the command-line
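
The "Integer() first, then Float(), otherwise ignore" rule can be sketched on its own like this. The helper names parse_number and numbers_in are made up for the example; the script itself uses a method called extract_numbers.

```ruby
# parse_number is a hypothetical helper: try Integer() first so that whole
# numbers stay integers, fall back to Float(), and return nil for anything
# that is not a valid number.
def parse_number(token)
  Integer(token)
rescue ArgumentError
  begin
    Float(token)
  rescue ArgumentError
    nil
  end
end

# Keep only the numeric fields of a whitespace-separated line.
def numbers_in(line)
  line.split.map { |f| parse_number(f) }.compact
end

numbers_in("foo 1 bar 2")        # [1, 2]
numbers_in("- elapsed : 770.5")  # [770.5]
numbers_in("3foo 4.0")           # [4.0], since "3foo" is rejected, not read as 3
```

One thing to watch for with this approach: Integer() honours radix prefixes, so a field such as 010 is read as octal (8) rather than ten.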

first some examples of usage:

mussel:~/eg/ruby/listc > cat input.a
1
2
3

mussel:~/eg/ruby/listc > ./listc sum < input.a
6

mussel:~/eg/ruby/listc > ./listc mean < input.a
2.0

mussel:~/eg/ruby/listc > cat input.b
1 2
3 4
5 6

mussel:~/eg/ruby/listc > ./listc median < input.b
3.0 4.0

mussel:~/eg/ruby/listc > cat input.c
foo 1 bar 2
a 3 b 4
x 5 y 6

mussel:~/eg/ruby/listc > ./listc minmax < input.c
1:5 2:6

mussel:~/eg/ruby/listc > ./listc min < input.c
1 2

mussel:~/eg/ruby/listc > ./listc max < input.c
5 6

mussel:~/eg/ruby/listc > cat input.d

  • elapsed : 770.1453289
  • elapsed : 620.9993257
  • elapsed : 1440.629573

mussel:~/eg/ruby/listc > ./listc mean < input.d
943.924742533333

now the code (i’m not golfing, for you non-vim users strange markers are
‘folds’: those lines appear as one single line to me):

mussel:~/eg/ruby/listc > cat ./listc
#! /usr/bin/env ruby

class Main
#--{{{
  OPS = %w( sum add mean avg median max min minmax )

  def main
    op = ARGV.shift.to_s.strip.downcase

    klass =
      case op
        when 'sum', 'add'
          SumFilter
        when 'mean', 'avg'
          MeanFilter
        when 'median'
          MedianFilter
        when 'minmax'
          MinMaxFilter
        when 'max'
          MaxFilter
        when 'min'
          MinFilter
        else
          abort "bad op <#{ op }> not in <#{ OPS.join ',' }>"
      end

    filter = klass.new

    $stdin.each{|line| filter << line}

    filter.result >> $stdout
  end
#--}}}
end

def Main(*a, &b) Main.new(*a, &b).main end

module FilterUtils
#--{{{
  def extract_numbers line
    fields = line.strip.split(%r/\s+/)
    fields.map{|f| Integer(f) rescue Float(f) rescue nil}.compact
  end

  class List < Array
    def >> port = STDOUT
      port << join(' ')
      port << "\n"
    end
    def self.from other
      new.instance_eval{ replace other; self }
    end
  end
  def new_list l = nil
    l ? (List === l ? l : List.from(l)) : List.new
  end

  class MultiList < Array
    def >> port = STDOUT
      port << map{|elem| elem.join(':')}.join(' ')
      port << "\n"
    end
    def self.from other
      new.instance_eval{ replace other; self }
    end
  end
  def new_multilist ml = nil
    ml ? (MultiList === ml ? ml : MultiList.from(ml)) : MultiList.new
  end
#--}}}
end

class SumFilter
#--{{{
  include FilterUtils
  attr 'sum'
  def initialize
    @sum = new_list
  end
  def << line
    numbers = extract_numbers line
    numbers.each_with_index do |n,i|
      @sum[i] ||= 0
      @sum[i] += n
    end
  end
  def result
    @sum
  end
#--}}}
end

class MeanFilter
#--{{{
  include FilterUtils
  attr 'sum'
  attr 'count'
  def initialize
    @sum = new_list
    @count = new_list
  end
  def << line
    numbers = extract_numbers line
    numbers.each_with_index do |n,i|
      @sum[i] ||= 0
      @count[i] ||= 0
      @sum[i] += n
      @count[i] += 1
    end
  end
  def result
    mean = new_list
    @sum.zip(@count){|s,c| mean << (s.to_f/c.to_f)}
    mean
  end
#--}}}
end

class MedianFilter
#--{{{
  include FilterUtils
  attr 'min'
  attr 'max'
  def initialize
    @min = new_list
    @max = new_list
  end
  def << line
    numbers = extract_numbers line
    numbers.each_with_index do |n,i|
      @min[i] ||= n
      @min[i] = [ @min[i], n ].min
      @max[i] ||= n
      @max[i] = [ @max[i], n ].max
    end
  end
  def result
    median = new_list
    @min.zip(@max){|mi,ma| median << (mi + ((ma - mi)/2.0))}
    median
  end
#--}}}
end

class MinMaxFilter
#--{{{
  include FilterUtils
  attr 'min'
  attr 'max'
  def initialize
    @minmax = new_multilist
  end
  def << line
    numbers = extract_numbers line
    numbers.each_with_index do |n,i|
      @minmax[i] ||= [n,n]
      @minmax[i][0] = [ @minmax[i][0], n ].min
      @minmax[i][1] = [ @minmax[i][1], n ].max
    end
  end
  def result
    @minmax
  end
#--}}}
end

class MinFilter < MinMaxFilter
#--{{{
  def result
    new_list @minmax.map{|minmax| [minmax.first]}
  end
#--}}}
end

class MaxFilter < MinMaxFilter
#--{{{
  def result
    new_list @minmax.map{|minmax| [minmax.last]}
  end
#--}}}
end

Main() if __FILE__ == $0

of course this code isn’t perfect, but if i’m going to spend time adding a list
of numbers i’m going to put in at least this much effort.

kind regards.

-a
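
One detail in the listing above seems worth pointing out: MedianFilter only tracks the min and max of each column, so its result is the midpoint of the range (the midrange) rather than the median. The two agree on input.b but differ in general. Below is a sketch of a filter that follows the same `<<` / `result` protocol and returns the true median, at the cost of keeping every value in memory. ExactMedianFilter is a hypothetical name, it returns a plain Array rather than the List class, and it parses everything with Float() for brevity.

```ruby
# ExactMedianFilter is a hypothetical duck-typed filter: it responds to
# << (one line of input) and result, like the filters in the listing.
class ExactMedianFilter
  def initialize
    @columns = []   # one array of values per input column
  end

  # Collect the numeric fields of a line, column by column.
  def <<(line)
    numbers = line.split.map { |f| Float(f) rescue nil }.compact
    numbers.each_with_index { |n, i| (@columns[i] ||= []) << n }
    self
  end

  # Median of each column: the middle value of the sorted column, or the
  # average of the two middle values when the count is even.
  def result
    @columns.map do |values|
      sorted = values.sort
      mid = sorted.size / 2
      sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
    end
  end
end

filter = ExactMedianFilter.new
"1 10\n2 20\n100 30\n".each_line { |line| filter << line }
medians = filter.result
# medians is [2.0, 20.0]; the midrange of the first column would be 50.5
```

Unlike the min/max approach this does not run in constant memory, so it trades the streaming property for an exact answer.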

I’ll take a look at this NArray class, it seems pretty powerful! I don’t
understand how the examples on their website work, but the ‘image blur’
sample is exactly the kind of thing I have to do for my project. Too bad, I
don’t have time to restart from scratch… But I’ll keep it in mind :slight_smile:

– olivier

Michael F. wrote:

/ …

puts [ sum,mean,array.min,array.max ].map { |v| v.to_s }.join(' ')

puts [ sum, mean, array.min, array.max ].join(' ')

just cause join calls .to_s on every element anyway :wink:

Nice, thanks!

On Sun, 3 Dec 2006, Devin M. wrote:

[email protected] wrote:

harp:~> cat a.rb
Man, how do you keep all your a.rb’s straight?

#! /dmsp/reference/bin/ruby

require 'narray'
require 'yaml'
Am I blind, or do you require ‘yaml’ and never use it?

nope. you are not blind :wink: it used yaml in the past, but it’s rolled by
hand now to preserve order…

list = ARGF.readlines.map{|line| line.strip.split(%r/\s+/).map{|f| Float
f}}.flatten
Doesn’t this load the whole list of numbers into memory? (i.e. how does it
fare on “a billion values”?

yes. badly. most of our machines have 8gb of ram though - so i can ignore
this for most of our stuff :wink: mostly this is a small hack which leverages
the power of narray.

regards.

-a