Generate 50,000+ line CSV

Hi

I need to generate CSV dumps of lists of data from my db (MySQL), some of
which run to more than 50,000 rows. I have tried all of the ways I can
think of to a) get the data and b) generate the CSV.

My best effort so far is to use Model.connection.select_all, supplying
the appropriate SQL to get the data into a hash, and then iterating through
the hash building a CSV string by appending each hash item onto a result
string with <<. This is still taking over 15 seconds.

I’d like to get this down below 2 seconds if possible. Is there
something I can do to achieve this with Rails, or is Rails not an
appropriate choice in this instance?

Thanks in advance…

Graham

You need to get away from ActiveRecord and access the database directly.
I do believe that most DB engines support direct dumps to CSV. Check out
the ActiveRecord Extensions plugin (http://rubyforge.org/projects/arext/);
I believe it will help you here. Otherwise, you can always just send
straight SQL and deal with the result as necessary.

Jason

On May 21, 2007, at 12:16 PM, Graham Pengelly wrote:

I need to generate CSV dumps of lists of data from my db (MySQL),
some of which run to more than 50,000 rows. I have tried all of
the ways I can think of to a) get the data and b) generate the CSV.

Paginate. Get one of the more efficient pagination plugins, traverse
the pagination yourself and load N rows at a time, while writing the
CSV either to disk or out through send_file().

50,000 ActiveRecord objects in memory might not be the best of things.
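
A minimal sketch of that batching idea (hypothetical model and column
names, plain LIMIT/OFFSET instead of any particular pagination plugin):

batch_size = 1000
offset = 0
File.open("/tmp/export.csv", "w") do |file|
  loop do
    # load the next slice of rows instead of everything at once
    rows = Order.find(:all, :order => "id",
                      :limit => batch_size, :offset => offset)
    break if rows.empty?
    rows.each { |r| file.puts [r.id, r.name, r.total].join(",") }
    offset += batch_size
  end
end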


Jordi

javier ramirez wrote:

I don’t know how fast it will be, but for sure better than before.

Just out of curiosity, I tried making 50,000 inserts in my db (well,
they were actually 70K), then selecting them and doing a .join("\n") to
get the string you’d need to put in your file, and it takes a bit over 2
seconds (and we are talking Windows here). So it should be below the 2
seconds you wanted on Linux or on a faster server (I’m running
MySQL on the same box as everything else, so performance could definitely
be better).

If you want to go even faster, you can dump directly from MySQL to a file
with "SELECT ... INTO OUTFILE", but that would limit you to the server where
the db runs.
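
As a hedged sketch of running that from Rails (table, columns and path
are made up; the path has to be writable by the MySQL server process and
the file must not already exist):

sql = <<-'SQL'
  SELECT first_field, second_field
  INTO OUTFILE '/tmp/export.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM your_table
SQL
ActiveRecord::Base.connection.execute(sql)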

regards,

javier ramirez

Hi,

My best effort so far is to use Model.connection.select_all supplying
the appropriate sql to get the data into a hash and then iterate through
the hash building a csv string by appending each hash item onto a result
string with << This is still taking over 15 seconds.

Well… you can use the old approach of relying on your db when you need
to deal with large amounts of data, since DBMSs are pretty good at that.

You could do it like this:

ActiveRecord::Base.connection.select_all("select concat(first_field,
',', second_field, ',', you_get_the_idea) as csv_row from your_table
where you_tell_me")

That way you will get an array with a CSV string in each row directly.
You just need to write it to your file and that’s it.
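
A rough sketch of using that result (reusing the field names from the
query above; the output path is made up):

# select_all returns an array of hashes keyed by the column alias
rows = ActiveRecord::Base.connection.select_all(
  "SELECT CONCAT(first_field, ',', second_field) AS csv_row FROM your_table")
File.open("/tmp/export.csv", "w") do |f|
  f.write(rows.map { |r| r["csv_row"] }.join("\n"))
end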

I don’t know how fast it will be, but for sure better than before.

Regards,

javier ramirez

Thanks everyone for your suggestions. I’ll run through them and post
back anything that comes to light.

Thanks again

Graham

On May 21, 2007, at 2:19 PM, javier ramirez wrote:

You could do it like this:

ActiveRecord::Base.connection.select_all("select concat(first_field,
',', second_field, ',', you_get_the_idea) as csv_row from your_table
where you_tell_me")

That way you will get an array with a CSV string in each row directly.
You just need to write it to your file and that’s it.

Don’t forget CSV escaping of " characters in the data.
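
One way to handle that quoting on the Ruby side, as a sketch rather than
anything from the thread, is a small helper that doubles embedded quotes
and wraps any field that needs it:

# quote the field and double any embedded " if it contains a quote,
# a comma or a newline (hypothetical helper)
def csv_escape(field)
  s = field.to_s
  s =~ /[",\n]/ ? '"' + s.gsub('"', '""') + '"' : s
end

# row here is one hash from the select_all result
csv_row = [row["first_field"], row["second_field"]].map { |f| csv_escape(f) }.join(",")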


Jordi

As you would probably expect, the fastest way from the suggestions so far
was to use SELECT … INTO OUTFILE, with the query concatenating the
fields and commas, and returning the file with send_file.

I did get it reasonably fast by executing the SQL containing CONCAT(
etc. and joining the resultant array with "\n", but it was about twice as
fast to let the db do everything and then return the file via Rails
(with the obvious problem that this can only take place on the db
server).
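
Roughly what that ends up looking like in a controller action (a sketch
with made-up names; it only works when the Rails process and MySQL share
a filesystem, and INTO OUTFILE refuses to overwrite an existing file,
hence the timestamped path):

def export
  path = "/tmp/export_#{Time.now.to_i}.csv"
  ActiveRecord::Base.connection.execute(
    "SELECT CONCAT(first_field, ',', second_field) " +
    "INTO OUTFILE '#{path}' FROM your_table")
  send_file path, :type => "text/csv", :filename => "export.csv"
end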

There is some mention in the MySQL docs that using the command mysql -e
"SELECT …" > file_name would allow you to generate the file on the client
machine, but I couldn’t seem to get that to work from Rails.

Thanks to everyone for your suggestions.

Nice one.

Graham

You could try and do something like
mysql -u user -pPassword -e "select concat(a,',',b) from table" db_name
--skip-column-names -s > output_file

Thanks Javier. That is the kind of thing I was trying to do, but I have
settled for having the file generated on the db server, which is
currently the same machine anyway, and I’ll add the extra
complexity when this becomes a problem.

Thanks again for taking the time to help.

Graham

Hi,

There is some mention in the MySQL docs that using the command mysql -e
"SELECT …" > file_name would allow you to generate the file on the client
machine, but I couldn’t seem to get that to work from Rails.

Yeah, I didn’t tell you about the -e option because it’s a bit harder to
implement and maintain.

-e just tells mysql to execute the code you want, so you could do a
system call from Ruby to path/mysql -e "whatever". The problem here is that
you were outputting into a file, so with that syntax you would still be
writing on the server.

You could try and do something like
mysql -u user -pPassword -e "select concat(a,',',b) from table" db_name
--skip-column-names -s > output_file

which tells mysql to connect to the db with the given user and password,
execute the select, output no column names, and run in silent mode
(meaning no characters to format the output as an ASCII table), and then
you just redirect that output to a file.

That leaves you with a bunch of things to pass as parameters if you want
to do it right (path to mysql, path to the output file, db name, user and
password, which can be taken from the config loaded from database.yml),
and it is also sensitive to permissions on the server for the output.
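
As a sketch, the shell-out could pull those values from the Rails config
roughly like this (the output path is an assumption, and putting the
password on the command line has its own drawbacks):

# connection settings for the current environment, from database.yml
db   = ActiveRecord::Base.configurations[RAILS_ENV]
sql  = "select concat(first_field, ',', second_field) from your_table"
path = "/tmp/export.csv"
cmd  = "mysql -u #{db['username']} -p#{db['password']} -h #{db['host'] || 'localhost'} " +
       "--skip-column-names -s -e \"#{sql}\" #{db['database']} > #{path}"
system(cmd)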

If writing the file directly on the server is something you prefer not
to do (you could always make an NFS mount so you can link to your machine
from your db server), and if you really need to optimize so you cannot go
with the select-and-concat-and-write approach, I’d say this is the option
to go with. But if you really don’t need it, I’d say it’s the more complex one.

regards,

javier ramírez