Count distinct values in CSV


#1

Hi,

I'm new to Ruby.
How can I read a tab-delimited text file with two columns plus a header row
and count the distinct values in the second column?

My first attempts failed, I think because I have to eliminate the header row:

filename = gets.chomp
text = String.new
File.open(filename) { |f| text = f.read }
values = text.split(/\t/)
freqs = Hash.new(0)
values.each { |value| freqs[value] += 1 }
freqs = freqs.sort_by { |value, freq| freq }
freqs.each { |value, freq| puts value + ' ' + freq.to_s }

Many thanks for a starting point.
regards, christian


#2

Consider using either the built-in CSV parser in Ruby's stdlib or
FasterCSV ( http://fastercsv.rubyforge.org/ ). It'll save you time in
the long run.

V/r
Anthony E.
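
A minimal sketch of the stdlib route suggested above, assuming a tab-delimited file whose first line is a header. The method name count_column_values and its parameters are illustrative, not from the thread:

```ruby
require 'csv'

# Count how often each value occurs in one column of a tab-delimited file.
# `path` and `column` are hypothetical parameters chosen for this sketch;
# `column` is a zero-based index, so 1 means the second column.
def count_column_values(path, column = 1)
  counts = Hash.new(0) # missing keys start at 0
  # headers: true treats the first line as a header row and skips it;
  # col_sep: "\t" switches the separator from comma to tab.
  CSV.foreach(path, headers: true, col_sep: "\t") do |row|
    counts[row[column]] += 1
  end
  counts
end
```

Calling count_column_values("myfile.txt") would return a Hash mapping each distinct value to its count, which can then be sorted or printed as needed.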


#3

Christian Schulz wrote:

Hi,

I'm new to Ruby.
How can I read a tab-delimited text file with two columns plus a header row
and count the distinct values in the second column?

Christian -

Strangely enough, I wrote a blog post about this very topic:

http://drewolson.wordpress.com/2007/03/13/csv-manipulation-w-ruby/

Here’s the code linked to by the blog post:

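require 'faster_csv'

unique_count = {}

# :headers => true treats the first row as a header and skips it.
# For a tab-delimited file, also pass :col_sep => "\t".
FCSV.foreach("myfile.csv", :headers => true) do |row|
  unique_count[row[1]] ||= 0
  unique_count[row[1]] += 1
end

unique_count.each do |val, count|
  puts "#{val} appears #{count} time(s)"
end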
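
For comparison with the first attempt in this thread, here is a rough sketch that needs no CSV library at all. It assumes fields never contain embedded tabs, quotes, or newlines, and the method name count_second_column is made up for illustration. The key difference from the original attempt is that the text is split into lines first, the header line is dropped, and only then is each line split on tabs:

```ruby
# Count values in the second column of tab-delimited text, skipping the header.
# Returns an array of [value, count] pairs, most frequent first.
def count_second_column(text)
  counts = Hash.new(0)
  # drop(1) discards the header line before any counting happens
  text.each_line.drop(1).each do |line|
    fields = line.chomp.split("\t")
    # ignore blank or short lines that have no second field
    counts[fields[1]] += 1 unless fields[1].nil?
  end
  # sort by descending count; ties are ordered by value
  counts.sort_by { |value, count| [-count, value] }
end
```

With a file on disk, something like count_second_column(File.read(filename)).each { |value, count| puts "#{value} #{count}" } would print one line per distinct value.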


#4

Many thanks, Ruby is really great and I have to learn to think "easy"!
christian