FasterCSV - varying headers

Hello,

Quick warning: I am very much a ruby newbie and am extremely new to
programming in general.

I’m attempting to build a little program that operates on a large csv
file (potentially 100,000+ lines), but the challenge is that while I
will have a couple required columns, I must provide some naming
flexibility as it is unlikely that the user will be able to match my
headers word for word in every case. As such, my goal is to provide an
interface that asks what each header should represent and then treat the
user’s headers as if they followed my original specifications exactly.

For example, let’s say that I require the following columns: Product
Title, Product Price. If the user were to provide me with the headers
worded as Product Name and Product Pricing, I would want to assign
‘Product Name’ to represent ‘Product Title.’

I suspect that throwing the headers into a hash would be ideal, but
I’m not entirely sure how to approach it. Here is an excerpt from my
attempt thus far…

require "rubygems"
require "fastercsv"

class HeaderProcessing
  attr_accessor :file
  attr_accessor :headers
  attr_accessor :clientid
  attr_accessor :product_title_header, :product_price_header

  def initialize
    puts "What is the client ID?"
    @clientid = gets.chomp
    open_file
  end

  def open_file
    infile = "tobeprocessed/#{@clientid}.csv"
    outfile = "tobeprocessed/#{@clientid}_out.csv"
    csv = FasterCSV.read(infile, {:headers => true, :return_headers => true,
                                  :header_converters => :symbol})
    # Not sure if read is the best approach here, since some files
    # could get quite large.
    puts "The user's headers are "
    puts csv.headers.inspect
    puts "\n \n Please enter the user supplied Product Title header"
    @product_title_header = gets.chomp
    puts "\n \n Please enter the user supplied Product Price"
    @product_price_header = gets.chomp

    # I do this with each required and optional header. Not very DRY for
    # now...
    # I now have each of the user's headers I intend to use in a number of
    # instance variables.

    # placeholder for user product data clean up

    File.open(outfile, "w") { |f| f.puts csv }
  end
end
queued = HeaderProcessing.new

If I understand FasterCSV correctly, by setting :headers to true, the
csv file was read as a table object. Is it possible to turn the table’s
headers into a hash and then set each key/value to the appropriate
variable (as per @product_title_header etc)? If so, how? I’ve been
rummaging through the FasterCSV docs that I believe pertain to the
question, but I’m a bit lost on the actual implementation.

Is it also feasible to save these hash definitions to a separate file so
that I won’t have to go through the same process when/if the user
provides a new file with updated prices? Alternatively, if there’s a
more appropriate way to tackle this, I’m all ears.

Thanks in advance!
Inf

On Thu, Oct 1, 2009 at 10:09 AM, Sean M. wrote:

I wrote a rails plugin which does this type of translation between
user supplied columns and expected columns. It is specific to Rails
but you might be able to get some ideas from it.

Andrew T.
http://ramblingsonrails.com

http://MyMvelope.com - The SIMPLE way to manage your savings

On Thu, Oct 1, 2009 at 3:56 PM, James Edward G. II wrote:

$ cat products.csv
Product Title,Product Price,Product Rating
Agricola,$55.99,4.5
Dominion,$35.99,5
Pandemic,$27.99,4.75

Good choices for the example (the ratings are over 5, right?) :)

Jesus.

On Oct 1, 2009, at 9:27 AM, Jesús Gabriel y Galán wrote:

Good choices for the example (the ratings are over 5, right?) :)

Absolutely. I’m glad someone appreciated the examples. ;)

James Edward G. II

On Oct 1, 2009, at 3:09 AM, Sean M. wrote:

Hello,

Hello.

I’m attempting to build a little program that operates on a large csv
file (potentially 100,000+ lines), but the challenge is that while I
will have a couple required columns, I must provide some naming
flexibility as it is unlikely that the user will be able to match my
headers word for word in every case. As such, my goal is to provide an
interface that asks what each header should represent and then treat the
user’s headers as if they followed my original specifications exactly.

Alternatively, if there’s a more appropriate way to tackle this, I’m
all ears.

I have some ideas.

First, let’s talk about the matching headers problem. Coming up with
everything a user might think of to type in sounds hard to me. What
if we showed the user which headers are available instead and had them
pick from a list? It seems like that would be easier and more accurate.

My other thought is that it looks like you are slurping the whole file
into memory just to write it all back out. Why don’t we just read a
line, fix it, write it out, and move on to the next line? That should
take less memory.
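
Stripped of any menu, the read-a-line, fix-it, write-it-out idea might look something like the sketch below. It assumes a newer Ruby (1.9 or later), where FasterCSV became the standard csv library; stream_copy is a hypothetical helper name, not part of either library, and the "fix" is whatever the caller's block does to each row.

```ruby
require "csv"

# Hypothetical helper: copy in_path to out_path one row at a time,
# letting the caller's block modify each row before it is written.
# Only one row is held in memory at any moment.
def stream_copy(in_path, out_path)
  CSV.open(out_path, "w") do |out|
    header_written = false
    CSV.foreach(in_path, headers: true) do |row|
      # Write the header line once, taken from the first data row.
      # (A file with headers but no data rows produces empty output.)
      unless header_written
        out << row.headers
        header_written = true
      end
      yield row if block_given?  # the "fix it" step
      out << row.fields
    end
  end
end
```

A call such as stream_copy("in.csv", "out.csv") { |row| row["Product Price"] = row["Product Price"].delete("$") } would strip the dollar signs while copying.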

Here’s some example code combining these thoughts:

$ cat products.csv
Product Title,Product Price,Product Rating
Agricola,$55.99,4.5
Dominion,$35.99,5
Pandemic,$27.99,4.75
$ ruby csv_transfer.rb products.csv
1: Product Title
2: Product Price
3: Product Rating
d: Done

Column to include: 1
Added Product Title.
2: Product Price
3: Product Rating
d: Done

Column to include: 2
Added Product Price.
3: Product Rating
d: Done

Column to include: d
$ cat products_new.csv
Product Title,Product Price
Agricola,$55.99
Dominion,$35.99
Pandemic,$27.99
$ cat csv_transfer.rb
#!/usr/bin/env ruby -wKU

require "rubygems"
require "faster_csv"

file    = ARGV.shift or abort "USAGE: #{$PROGRAM_NAME} CSV_FILE"
columns = [ ]
FCSV.open("#{File.basename(file, '.csv')}_new.csv", "w") do |csv|
  FCSV.foreach(file, :headers => true) do |row|
    # The following is a simple menu selection for columns.
    if columns.empty?
      loop do
        choices = { }
        row.headers.each_with_index do |column, i|
          unless columns.include? column
            n = i + 1
            puts "#{n}: #{column}"
            choices[n] = column
          end
        end
        puts "d: Done"
        puts
        print "Column to include: "
        choice = gets or break
        if column = choices[choice.strip.to_i]
          columns << column
          puts "Added #{column}."
        elsif choice =~ /\Ad(?:one)?\Z/i
          break
        else
          puts "Invalid column selection."
        end
      end
      if columns.empty?
        puts "No columns selected."
        exit
      end
      csv << columns
    end

    # Copy only the selected columns.
    csv << columns.map { |column| row[column] }
  end
end

__END__
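
The script above keeps the selected columns under their original names. For the renaming and save-the-mapping parts of the original question, one possible approach is a plain hash from the expected header to the user's header, stored as YAML so it can be reused when an updated file arrives. This is only a sketch under assumptions: it uses the standard csv library (FasterCSV's successor in Ruby 1.9 and later), YAML is my choice of format rather than something suggested in the thread, and load_mapping, save_mapping and rename_columns are hypothetical helper names.

```ruby
require "csv"
require "yaml"

# Hypothetical helper: return the saved mapping hash, or nil if the
# mapping file does not exist yet (meaning the user must be asked).
def load_mapping(path)
  File.exist?(path) ? YAML.load_file(path) : nil
end

# Hypothetical helper: store the mapping hash as YAML for next time.
def save_mapping(mapping, path)
  File.write(path, mapping.to_yaml)
end

# Stream in_path to out_path row by row, keeping only the mapped
# columns and writing them under the expected header names.
# mapping looks like { "Product Title" => "Product Name", ... }
# (expected header => header as the user wrote it).
def rename_columns(in_path, out_path, mapping)
  CSV.open(out_path, "w") do |out|
    out << mapping.keys
    CSV.foreach(in_path, headers: true) do |row|
      out << mapping.values.map { |user_header| row[user_header] }
    end
  end
end
```

The idea would be to call load_mapping first, fall back to a menu like the one above only when it returns nil, and then call save_mapping with the answers so the next file from the same client needs no questions.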

Hope that helps.

James Edward G. II

Andrew: A rails version was definitely in the pipeline on this end, so
you will have saved me quite a bit of time. Thanks for sharing the
plugin!

James: I very much appreciate the assistance. I suspect I’ll learn
quite a bit as I experiment with the example code you’ve posted.
Thanks!

Regards,
S