Hello,
Quick warning: I am very much a Ruby newbie and am extremely new to
programming in general.
I’m attempting to build a little program that operates on a large csv
file (potentially 100,000+ lines), but the challenge is that while I
will have a couple required columns, I must provide some naming
flexibility as it is unlikely that the user will be able to match my
headers word for word in every case. As such, my goal is to provide an
interface that asks what each header should represent and then treat the
user’s headers as if they followed my original specifications exactly.
For example, let’s say that I require the following columns: Product
Title, Product Price. If the user were to provide me with the headers
worded as Product Name and Product Pricing, I would want to assign
‘Product Name’ to represent ‘Product Title.’
I suspect that throwing the headers into a hash would be ideal, but
I’m not entirely sure how to approach it. Here is an excerpt from my
attempt thus far…
require "rubygems"
require "fastercsv"

class HeaderProcessing
  attr_accessor :file
  attr_accessor :headers
  attr_accessor :clientid
  attr_accessor :product_title_header, :product_price_header

  def initialize
    puts "What is the client ID?"
    @clientid = gets.chomp
    open_file
  end

  def open_file
    infile = "tobeprocessed/#{@clientid}.csv"
    outfile = "tobeprocessed/#{@clientid}_out.csv"
    csv = FasterCSV.read(infile, {:headers => true, :return_headers => true,
                                  :header_converters => :symbol})
    # Not sure if read is the best approach here, since some files
    # could get quite large.
    puts "The user's headers are "
    puts csv.headers.inspect
    puts "\n \n Please enter the user supplied Product Title header"
    @product_title_header = gets.chomp
    puts "\n \n Please enter the user supplied Product Price"
    @product_price_header = gets.chomp
    # I do this with each required and optional header. Not very DRY for
    # now…
    # I now have each of the user's headers I intend to use in a number of
    # instance variables.
    # placeholder for user product data clean up
    File.open(outfile, "w") { |f| f.puts csv }
  end
end

queued = HeaderProcessing.new
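On the concern about read and large files: one possible alternative is to process the file a row at a time instead of loading it all at once. The sketch below is only an illustration under assumptions. It uses Ruby's standard csv library (which is what FasterCSV became in Ruby 1.9) rather than the fastercsv gem, and copy_rows is a hypothetical helper name, not something from the code above.

```ruby
require "csv"

# Hypothetical helper: read infile one row at a time and write each row
# to outfile, so the whole file never has to be held in memory.
# Returns the number of data rows copied (the header row is not counted).
def copy_rows(infile, outfile)
  count = 0
  CSV.open(outfile, "w") do |out|
    # :return_headers => true makes the header row come through as the
    # first row, so it is written to the output file as well.
    CSV.foreach(infile, :headers => true, :return_headers => true) do |row|
      # placeholder for per-row clean up (skip it when row.header_row?)
      out << row
      count += 1 unless row.header_row?
    end
  end
  count
end

# Usage, with the paths from open_file:
#   copy_rows("tobeprocessed/#{clientid}.csv",
#             "tobeprocessed/#{clientid}_out.csv")
```

Because each row is written as soon as it is read, memory use should stay roughly constant regardless of how many lines the file has.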
If I understand FasterCSV correctly, by setting :headers to true, the
csv file was read as a table object. Is it possible to turn the table’s
headers into a hash and then set each key/value to the appropriate
variable (as per @product_title_header etc)? If so, how? I’ve been
rummaging through the FasterCSV docs that I believe pertain to the
question, but I’m a bit lost on the actual implementation.
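To make the question concrete, here is a minimal sketch of the kind of header hash described above, assuming Ruby's standard csv library (the successor to FasterCSV) and plain string headers rather than the :symbol converter. HEADER_MAP, canonical_header and parse_with_mapping are hypothetical names used only for illustration; in the real program the hash would be built from the answers typed in at the prompts.

```ruby
require "csv"

# Hypothetical mapping: the user's header => the header the program expects.
HEADER_MAP = {
  "Product Name"    => "Product Title",
  "Product Pricing" => "Product Price"
}

# Return the expected header for a user header; headers that are not
# in the map are passed through unchanged.
def canonical_header(header, map = HEADER_MAP)
  map.fetch(header, header)
end

# Parse CSV text, renaming headers through the map as they are read.
def parse_with_mapping(csv_text, map = HEADER_MAP)
  CSV.parse(csv_text,
            :headers => true,
            :header_converters => lambda { |h| canonical_header(h, map) })
end

sample = "Product Name,Product Pricing\nWidget,9.99\n"
table = parse_with_mapping(sample)
puts table.headers.inspect
puts table[0]["Product Title"]
```

With the headers renamed at read time, the rest of the program can refer to "Product Title" and "Product Price" no matter what the user called those columns.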
Is it also feasible to save these hash definitions to a separate file so
that I won’t have to go through the same process when/if the user
provides a new file with updated prices? Alternatively, if there’s a
more appropriate way to tackle this, I’m all ears.
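For the saving part, one option might be to dump the hash to a YAML file per client and load it back on the next run. This is only a sketch: save_mapping and load_mapping are hypothetical helper names, and the file location in the usage comment is an assumption, not an existing convention in the program.

```ruby
require "yaml"

# Hypothetical helper: write the header mapping hash to a YAML file.
def save_mapping(path, mapping)
  File.open(path, "w") { |f| f.write(mapping.to_yaml) }
end

# Hypothetical helper: return the saved mapping, or nil if no file
# has been saved for this client yet.
def load_mapping(path)
  File.exist?(path) ? YAML.load_file(path) : nil
end

# Usage (the path is an assumption):
#   path    = "tobeprocessed/#{clientid}_headers.yml"
#   mapping = load_mapping(path)
#   if mapping.nil?
#     mapping = { "Product Name" => "Product Title" } # ask the user here
#     save_mapping(path, mapping)
#   end
```

The prompts would then only be needed the first time a client's file is seen, or when the saved mapping no longer matches the file's headers.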
Thanks in advance!
Inf