Handling Large Data Imports


#1

Greetings.

Have any of you guys dealt with importing large amounts of data on
deploy? I have two distinct situations I need to deal with and I was
hoping for some advice.

  1. I have a fairly large CSV file of postal codes that seems to
    import very well with the MySQL import utility (mysqlimport) on
    development machines but very poorly when used remotely with
    Capistrano (i.e. during initial migration). How are you guys handling
    such data import on initial migration?

  2. I’m going to have to move over a rather large amount of data from
    an older version of the site I’m updating (the old site is written in ASP
    using an MS SQL database – it’s a mess). I know I can write scripts
    to massage data, but something I’m unsure of is how to actually move
    the data from the old servers to the new servers – neither of which
    I’ll have physical access to. Recommendations?

Thanks!

James H


#2

Can you give some performance numbers on the remote side? Is the remote
database under load during the import?

Jesse P.
Blue Box Group, LLC



#3

Hi James,

Different situations require different techniques.

If you are looking for a Ruby-based import, use the FasterCSV library.
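A minimal sketch of what a Ruby-based import could look like, not a tested production script. It uses Ruby's standard `csv` library (FasterCSV became the standard `csv` library in Ruby 1.9) and groups rows into multi-row INSERT statements so the database sees a few large statements instead of one per row. The table name, the column names, and the helpers `sql_quote` and `each_insert_statement` are made up for illustration, and the quoting is deliberately simplified.

```ruby
require "csv"

# Hypothetical table and column names; adjust to the real schema.
TABLE   = "postal_codes"
COLUMNS = %w[code city region]

# Quote a value as a SQL string literal. Simplified for illustration;
# a real import should rely on the database driver's own escaping.
def sql_quote(value)
  return "NULL" if value.nil?
  "'" + value.to_s.gsub("\\") { "\\\\" }.gsub("'") { "''" } + "'"
end

# Read CSV data (a String or an IO) with a header row and yield one
# multi-row INSERT statement per batch of rows.
def each_insert_statement(csv_io, batch_size: 1000)
  CSV.new(csv_io, headers: true).each_slice(batch_size) do |rows|
    values = rows.map do |row|
      "(" + COLUMNS.map { |col| sql_quote(row[col]) }.join(", ") + ")"
    end
    yield "INSERT INTO #{TABLE} (#{COLUMNS.join(', ')}) VALUES #{values.join(', ')};"
  end
end
```

In a Rails app each yielded statement could be passed to `ActiveRecord::Base.connection.execute`; passing an open `File` instead of a String keeps memory use down to one batch at a time.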

Or, more simply, you can run mysqldump on the table or the whole database
on the development side, upload the dump file, and load it into the
production server.
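The dump, upload, and load steps could be scripted roughly as below. This is only a sketch: the helper `dump_and_load_commands`, the database names, the table name, and the host are all hypothetical, and it assumes MySQL credentials are already configured (for example in `~/.my.cnf`). It only builds the command strings; running them is left to you.

```ruby
require "shellwords"

# Build the shell commands for a dump / copy / load cycle.
# Every name passed in here is a placeholder, not a real server or schema.
def dump_and_load_commands(dev_db:, table:, prod_db:, remote:)
  dump_file = "#{table}.sql.gz"
  [
    # 1. On the development machine: dump one table, compressed.
    #    --single-transaction takes a consistent snapshot of InnoDB tables.
    "mysqldump --single-transaction #{dev_db.shellescape} #{table.shellescape} | gzip > #{dump_file.shellescape}",
    # 2. Copy the dump to /tmp on the production host.
    "scp #{dump_file.shellescape} #{remote.shellescape}:/tmp/",
    # 3. On the production host: load the dump into the production database.
    "gunzip < #{"/tmp/#{dump_file}".shellescape} | mysql #{prod_db.shellescape}"
  ]
end

if __FILE__ == $PROGRAM_NAME
  puts dump_and_load_commands(
    dev_db: "app_development", table: "postal_codes",
    prod_db: "app_production", remote: "deploy@db.example.com"
  )
end
```

The first two commands run on the development machine; the third is meant to be run by hand in an SSH session on the production host.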

I have done it a few times myself with a million+ records.

These scripts take a significant amount of time and we run them only
once in a while, so I wouldn't suggest Capistrano, as the script may time out.

regards
Senthil