Output unique values in CSV columns to a text file

What I want to do is read in a CSV file and produce an output which
lists the unique values from each column in the following format:

Column: ColHeader1
UniqueVal1
UniqueVal2
Column: ColHeader2
UniqueVal1
UniqueVal2

What I’m currently getting is output that looks as follows:

Column: ColHeader1

ColHeader1UniqueVal1
ColHeader1UniqueVal2
Column: ColHeader2

ColHeader2UniqueVal1
ColHeader2UniqueVal2

For some reason, it is appending the column header to each value and
also printing a blank row to start each column. My code is below. Any
help is much appreciated. Essentially I read the CSV into a hash where
the key is the column header and the element is an array of values from
that column. I then run .uniq! on each array in the hash and print the
results to a file.

require 'rubygems'
require 'faster_csv'

infile = "xyz.csv"

uniques = {}

FCSV.open(infile, :headers => true).each do |row|
  row.each_with_index do |element, j|
    uniques[row.headers[j]] ||= []
    uniques[row.headers[j]] << element
  end
end

uniques.each do |key, element|
  element.uniq!
end

File.open("unique_output.txt", "w+") do |out|
  uniques.each_key do |key|
    out.write "Column: #{key}\n"
    uniques[key].each do |element|
      out.write " #{element}\n"
    end
  end
end
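
A likely cause of the header being glued to each value: iterating a FasterCSV row yields [header, field] pairs, so `element` above is a two-element array rather than a bare field, and interpolating an array into a string on Ruby 1.8 concatenates its elements. The sketch below shows the same hash-of-arrays idea with the pair destructured. It assumes Ruby 1.9 or later, where FasterCSV became the standard `csv` library; the method names `unique_column_values` and `format_uniques` are made up for illustration.

```ruby
require "csv"

# Hypothetical helper: map each column header to its unique values,
# in order of first appearance.
def unique_column_values(csv_text)
  uniques = Hash.new { |hash, key| hash[key] = [] }
  CSV.parse(csv_text, headers: true).each do |row|
    # A row yields [header, field] pairs, so take both parts explicitly.
    row.each do |header, field|
      uniques[header] << field unless uniques[header].include?(field)
    end
  end
  uniques
end

# Hypothetical helper: render the hash in the "Column: ..." layout.
def format_uniques(uniques)
  uniques.map { |header, values| ["Column: #{header}", *values].join("\n") }.join("\n")
end

sample = "nums,letters\n1,a\n2,b\n2,b\n3,c\n"
puts format_uniques(unique_column_values(sample))
```

To write to a file instead of the terminal, the string returned by `format_uniques` can be passed to `File.write`.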

On Dec 18, 2006, at 4:04 PM, Drew O. wrote:

What I want to do is read in a CSV file and produce an output which
lists the unique values from each column in the following format:

Column: ColHeader1
UniqueVal1
UniqueVal2
Column: ColHeader2
UniqueVal1
UniqueVal2

Well, if it all fits in memory it’s super easy using FCSV’s Tables:

#!/usr/bin/env ruby -w

require "rubygems"
require "faster_csv"

table = FCSV.parse(DATA.read, :headers => true)
table.by_col!.each do |header, col|
  puts "#{header}:"
  puts " #{col.uniq.join(', ')}"
end

__END__
nums,letters
1,a
2,b
2,b
3,c
3,c
3,c

James Edward G. II
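
For readers on a current Ruby, here is a rough equivalent of the table approach that prints one value per line in the "Column: ..." layout asked for. It is only a sketch: it assumes the standard `csv` library (the successor to FasterCSV), and `unique_report` is a hypothetical name.

```ruby
require "csv"

# Hypothetical helper: returns the report as an array of lines.
def unique_report(csv_text)
  table = CSV.parse(csv_text, headers: true)
  # In column mode, iterating the table yields [header, values] pairs.
  table.by_col.flat_map do |header, column|
    ["Column: #{header}", *column.uniq]
  end
end

puts unique_report("nums,letters\n1,a\n2,b\n2,b\n3,c\n3,c\n3,c\n")
```

`by_col` returns a copy of the table in column mode, so the original table keeps its default access mode; `by_col!` switches the table in place, as in the post above.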




data = DATA.readlines.map{|s| s.chomp.split(",")}
header = data.shift.map{|s| "Column: " + s}

data = data.transpose.map{|ary| ary.uniq.map{|s| " " + s} }

puts header.zip(data)

__END__
It’s,so,simple!
a,b,c
a,b,c
d,e,f
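
One caveat with splitting on commas by hand: a quoted field that itself contains a comma gets cut in two. The sketch below keeps the transpose idea but swaps in the standard `csv` parser for the split. It assumes every row has the same number of fields (`Array#transpose` raises an IndexError otherwise), and `unique_by_transpose` is a hypothetical name.

```ruby
require "csv"

# Hypothetical helper: transpose-based variant that copes with quoted fields.
def unique_by_transpose(csv_text)
  rows = CSV.parse(csv_text)            # array of arrays, header row first
  header = rows.shift
  columns = rows.transpose.map(&:uniq)  # rows become columns, then dedupe
  header.zip(columns).flat_map { |name, values| ["Column: #{name}", *values] }
end

puts unique_by_transpose(%(name,note\n"Smith, J",x\n"Smith, J",y\n))
```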