Heavy CSV importing

I am building out an app that will allow the user to import a data set
(as a CSV) from another application and view reports that will allow
them to make educated decisions. The most common task users will be
doing is importing CSV files.

An average CSV file will contain 1,000 to 2,500 rows. The system
will need to import approx. 50 CSV files per hour in the beginning,
and that could easily grow to 5,000+ CSV files per hour, since I
intend to make the basic plan (w/ 60% of features) available for free.

The CSV files will all be standardized. I don’t need to worry about
column variations or dirty data for version 1.0.

What I am wondering is whether FasterCSV is the right tool to use for this job.

In addition, there will be lots of data crunching to prepare the
reports. Is RoR even the right solution for this problem? I think it
is, but what do you think?
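For what it's worth, files of 1,000 to 2,500 rows are well within FasterCSV's comfort zone, and its API lives on as Ruby's standard-library CSV (FasterCSV replaced the old csv.rb in Ruby 1.9). A minimal headered parse, with made-up column names, looks like:

```ruby
require "csv" # FasterCSV's API; it became the stdlib CSV as of Ruby 1.9

data = "name,score\nalice,10\nbob,7\n"

# :headers => true yields CSV::Row objects keyed by column name,
# which is convenient when every file is standardized like yours.
rows  = CSV.parse(data, :headers => true)
names = rows.map { |row| row["name"] }
```

Note that all values come back as strings, so numeric columns need an explicit `to_i`/`to_f` before any crunching.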

The ar-extensions project is worth a look, as it can drastically speed
up data loads in Rails. One very happy user wrote about it here:


The developer’s blog articles about it are here:


Beware, though, that some links, like those to the RDocs, aren't working
at the moment. I've written to the developer about that.


Thanks Craig. I’ll check out AR-Extensions. Anyone who has used AR-E
wanna comment with their thoughts?
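The main win from ar-extensions is that it collapses thousands of per-row INSERTs into a handful of multi-row statements (it adds an `import` class method to your ActiveRecord models for this). The helper below is purely illustrative, not ar-extensions' code, but it shows the underlying trick in plain Ruby:

```ruby
# Illustrative only: build one multi-row INSERT for a batch of rows --
# the trick ar-extensions uses to avoid an INSERT per record. Real code
# must use the adapter's quoting, not naive string interpolation.
def multi_row_insert_sql(table, columns, rows)
  values = rows.map { |r| "(" + r.map { |v| "'#{v}'" }.join(", ") + ")" }
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
end

sql = multi_row_insert_sql("imports", %w[name score],
                           [["alice", 10], ["bob", 7]])
```

One round trip per batch instead of one per row is where most of the speedup comes from at 50+ files an hour.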

I would think you’d use something like “spawn”


to push the import into the background, and use Ajax (or, even better,
Comet) to notify the user that their import is complete. If you go
with a fully asynchronous model, you don't even really need spawn.
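The core trick behind spawn, stripped down to bare Ruby: fork a child process to do the slow work so the web process can answer the request immediately. The pipe below just stands in for whatever "import finished" signal (a status row in the DB, say) the Ajax poll would actually check; it is not spawn's API.

```ruby
# Fork a child to do the long-running work; the parent is free at once.
reader, writer = IO.pipe

pid = fork do
  reader.close
  # ... long-running CSV import would happen here ...
  writer.write("import complete")
  writer.close
end

writer.close
Process.detach(pid)       # reap the child so it doesn't become a zombie
result = reader.read      # the "done" signal the browser would poll for
```

In a real app the parent would of course not block on the read; the browser polls (or a Comet push fires) once the status flips.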

Under this or a similar architecture you could do the import with
native loaders, C, Ruby, or Rails; whatever suits your fancy. You
could even (and probably should) move the processing onto a different
box. Web servers should be fulfilling HTTP requests, not messing with
potentially long-running batch jobs.


Definitely consider a tool like ActiveWarehouse,
and ActiveWarehouse ETL: http://activewarehouse.rubyforge.org/etl/
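ActiveWarehouse ETL drives each job from a "control file", which is a small Ruby DSL. Going from its docs, a control file looks roughly like the sketch below; the field names and file paths are made up, and the exact option keys should be checked against the project's documentation:

```ruby
# Rough sketch of an ETL control file (treat details as approximate).
source :in, {
  :file   => "import.csv",   # hypothetical input file
  :parser => :csv
}, [:name, :score]

transform :score, :type, :type => :integer

destination :out, {
  :file => "import.txt"      # hypothetical output file
}, { :order => [:name, :score] }
```

For straight CSV-to-reports pipelines at your volume, an ETL tool like this buys you logging, error rows, and restartability that hand-rolled loops don't.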