Flat CSV to Relational DB

Anyone have any suggestions on pushing a flat CSV file into a
relational db? I am currently thinking of a solution working with
FasterCSV and ActiveRecord. I picture something like the following,
although I’m sure someone can make a better suggestion…

FasterCSV.read("data.csv", :headers => true).each do |row|
  Product.create(
    :name              => row["Product Name"],
    :category          => Category.find_or_create_by_name(row["Product Category"]),
    :shipping_category => ShippingCategory.find_or_create_by_weight(row["Weight"]),
    :color             => row["Product Color"],
    :description       => row["Product Description"]
  )
end

My Product.create method call is HUGE; it keeps going and going…

Any suggestions?

TIA,
Michael G.

On Nov 8, 2006, at 12:52 PM, Michael G. wrote:

My Product.create method call is HUGE; it keeps going and going…

Any suggestions?

I have the following example in FasterCSV’s example directory:

#!/usr/local/bin/ruby -w

# csv_rails_import.task
#
# Created by James Edward G. II on 2006-11-05.
# Copyright 2006 Gray Productions. All rights reserved.

namespace :my_app_name do
  desc "Injects purchase.csv into the database."
  task :load_purchase => [:environment] do
    require "#{RAILS_ROOT}/vendor/faster_csv/lib/faster_csv"

    purchase = Purchase.create!

    FCSV.foreach( "#{RAILS_ROOT}/db/questions.csv",
                  :headers           => true,
                  :header_converters => :symbol ) do |line|
      purchase.line_items.create!(line.to_hash)
    end
  end
end

Your needs are slightly more complex, but this makes a decent
starting point:

  1. Use :header_converters where you can
  2. Hash the line
  3. Edit the Hash as needed to create associated objects
  4. Hand off to create
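A minimal standalone sketch of steps 2–4, with made-up sample data standing in for data.csv (the stdlib CSV in Ruby 1.9+ is FasterCSV under a new name, same interface; the find_or_create and create! calls are commented out since they need your Rails models loaded):

```ruby
require "csv"  # in Ruby 1.9+ the stdlib CSV is FasterCSV, same interface

# Made-up sample data standing in for data.csv.
data = <<CSV_DATA
Product Name,Product Category,Weight,Product Color,Product Description
Widget,Tools,2.5,Red,A fine widget
CSV_DATA

CSV.parse(data, :headers => true, :header_converters => :symbol) do |line|
  attrs = line.to_hash                                # step 2: hash the line
  attrs[:category] = attrs.delete(:product_category)  # step 3: edit the Hash…
  # …and swap in associated objects (needs the Rails app loaded):
  # attrs[:category] = Category.find_or_create_by_name(attrs[:category])
  # step 4: hand off to create:
  # Product.create!(attrs)
  p attrs
end
```

Note how :header_converters => :symbol turns "Product Name" into :product_name for you, so the hash keys only need touching where they differ from your column names.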

Hope that helps.

James Edward G. II

On 11/8/06, James Edward G. II [email protected] wrote:


Thanks, I was unaware of :header_converters; those look like they
could be very useful. I also was unaware of the ease with which I
could hash a row, very cool. Thanks for this great library!


Michael G.

On 11/8/06, Jeremy H. [email protected] wrote:

In this particular case I do not have a large quantity of data, though
I do appreciate the suggestions.

Many times I’ve wanted to use ruby to parse and load CSV files into
databases, but for me there was a data volume threshold. Many times
I’ve had to load hundreds of thousands to hundreds of millions of rows
into a db, and these days it is (for me) much easier to just have the
ruby script invoke the appropriate IMPORT or LOAD or whatever SQL
statement from the DB interface and pass it the CSV file to load.

Yeah, my particular liking of AR is that I can easily move pieces of
the data to the correct tables, and set up all the relationships,
validations, and manipulations in Ruby.

Sometimes I’ve had to have ruby reprocess the csv file to get it into
the quoted format that the db import command expects, but in the
overall time, it was actually faster to process the CSV file and have
the db load it natively than to have ruby process the CSV file and do
the inserts via ActiveRecord or the DBI interface.

I do like this suggestion: do all the CSV manipulations, then pass off
to the appropriate DB command. I will have to investigate these
options further.

Thanks,
Michael G.

On Thu, Nov 09, 2006 at 03:52:07AM +0900, Michael G. wrote:

Anyone have any suggestions on pushing a flat CSV file into a
relational db? I am currently thinking of a solution working with
FasterCSV and ActiveRecord. I picture something like the following,
although I’m sure someone can make a better suggestion…

Depending on the quantity of data you may want to pick something other
than ruby to do this. Most databases (at least sqlite, postgresql,
mysql and db2, that I know of) have a generic IMPORT or LOAD statement
to import from a delimited file, the most common format being CSV.
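For instance, a hedged sketch of handing the load off to the database's native bulk loader (the table and file names here are invented, and the exact options vary by database and version):

```ruby
# Invented names for illustration only.
csv_path = "calls.csv"
table    = "call_records"

# PostgreSQL: psql's \copy reads the file client-side,
# so no server filesystem access is needed.
copy_sql = "\\copy #{table} FROM '#{csv_path}' WITH CSV HEADER"

# MySQL: LOAD DATA LOCAL INFILE does the equivalent.
load_sql = "LOAD DATA LOCAL INFILE '#{csv_path}' INTO TABLE #{table} " \
           "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' IGNORE 1 LINES"

# Hand either statement to the command-line client (uncomment to run):
# system("psql", "mydb", "-c", copy_sql)
# system("mysql", "--local-infile=1", "mydb", "-e", load_sql)
```

The ruby script stays in charge of orchestration; only the row-by-row insert loop is replaced by one native statement.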

Many times I’ve wanted to use ruby to parse and load CSV files into
databases, but for me there was a data volume threshold. Many times
I’ve had to load hundreds of thousands to hundreds of millions of rows
into a db, and these days it is (for me) much easier to just have the
ruby script invoke the appropriate IMPORT or LOAD or whatever SQL
statement from the DB interface and pass it the CSV file to load.

Sometimes I’ve had to have ruby reprocess the csv file to get it into
the quoted format that the db import command expects, but in the
overall time, it was actually faster to process the CSV file and have
the db load it natively than to have ruby process the CSV file and do
the inserts via ActiveRecord or the DBI interface.
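That reprocessing step can be tiny. A hedged sketch using the stdlib CSV (the helper name and file names are mine, not from the thread); :force_quotes makes every field quoted, which is the format many db import commands expect:

```ruby
require "csv"

# Hypothetical helper: rewrite src as a fully quoted CSV at dst,
# the shape many native LOAD/IMPORT commands want.
def requote_csv(src, dst)
  CSV.open(dst, "w", :force_quotes => true) do |out|
    CSV.foreach(src) { |row| out << row }
  end
end

# Usage sketch:
# requote_csv("raw.csv", "clean.csv")
```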

Just my opinion though :-).

enjoy,

-jeremy

On Fri, Nov 10, 2006 at 03:08:11AM +0900, Michael G. wrote:

On 11/8/06, Jeremy H. [email protected] wrote:

[…]

Sometimes I’ve had to have ruby reprocess the csv file to get it into
the quoted format that the db import command expects, but in the
overall time, it was actually faster to process the CSV file and have
the db load it natively than to have ruby process the CSV file and do
the inserts via ActiveRecord or the DBI interface.

I do like this suggestion: do all the CSV manipulations, then pass off
to the appropriate DB command. I will have to investigate these
options further.

Glad to help. Let us know how it turns out.

enjoy,

-jeremy

Jeremy H. wrote:


Sometimes I’ve had to have ruby reprocess the csv file to get it into
the quoted format that the db import command expects, but in the
overall time, it was actually faster to process the CSV file and have
the db load it natively than to have ruby process the CSV file and do
the inserts via ActiveRecord or the DBI interface.


Hi Jeremy:
Can you please elaborate on how you natively invoke LOAD DATA INFILE
from a ruby script? I really need good performance and I cannot get
it, as you said, by using ActiveRecord or DBI.
The piece of code that is slow is this:

FCSV.foreach(ARGV[0], :headers => true) do |row|
  table.insert(:call_date   => row[0],
               :caller_Id   => row[1],
               :phone_no    => row[2],
               :destination => row[3],
               :duration    => row[4],
               :call_cost   => row[5])
end
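An intermediate option, short of going fully native: batch rows into multi-row INSERT statements, which removes most of the per-statement overhead. A hedged sketch (the `call_records` table name and the helper are invented for illustration, and the quoting here is naive; a real loader should use the driver's escaping):

```ruby
require "csv"

# Column names taken from the snippet above; table name is invented.
COLUMNS = %w[call_date caller_Id phone_no destination duration call_cost]

# Invented helper: build one multi-row INSERT for a batch of rows.
# Naive single-quote doubling only; prefer driver-level escaping in production.
def batch_insert_sql(rows)
  values = rows.map do |r|
    "(" + r.map { |v| "'" + v.to_s.gsub("'", "''") + "'" }.join(", ") + ")"
  end
  "INSERT INTO call_records (#{COLUMNS.join(', ')}) VALUES #{values.join(', ')}"
end

# Usage sketch: 500 rows per statement instead of one INSERT per row.
# CSV.foreach(ARGV[0], :headers => true).each_slice(500) do |batch|
#   db.run(batch_insert_sql(batch.map(&:fields)))   # db = your connection
# end
```

Wrapping the whole load in a single transaction helps for the same reason; the database's own LOAD command, as discussed above, is still the fastest route.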