Hey Folks, Im looking for some suggestions here. Each day I need to process 200 GZ compressed files each day that contain comma delimited information in the following format: HostName, FileName, DirName, Modified_On Each file contains upwards of 200K rows and works out to be about 20MB when uncompressed. I need to compare this information to the information recieved the day before to look for files that have changed. My current plan was to: ## do read file and uncompress line do import line into Mysql using activerecord end end run several comparison queries to find changes save changes to a table automatically review changes based on rules, and those that are leftover are unauthorized changes and require research by the network admins. ## I've written everything, and its working. However, the MYSQL Import is SLOOOOW. Its taking 10 minutes per file. Extrapolating this, it will take 2000 minutes, or 33 hours each day to do just the import. Unfortunately, earth days only have 24 hours. So, I need some way to compare todays file, to yesterdays and see changes. I really want akk the data to end up in MySQL or SQL server, so I'm hesitant to use any solution that requires direct comparios on text files. Ideally, I'd like a way to speed up the import. I saw another post talking about how ActiveRecord defaults to using transactions for insert in a block, so the only thing ive tried at this point to speed things up is to begin the transaction outside of the block and end it upon compleetion. This didn't seem to help much... Thanks :) Brian
on 2006-03-23 15:00
on 2006-03-23 15:57
Brian C. wrote: > Hey Folks, > > Im looking for some suggestions here. Each day I need to process 200 GZ > compressed files each day that contain comma delimited information in > the following format: > HostName, FileName, DirName, Modified_On <snip> Check out LOAD DATA INFILE for MySQL. It's likely to be a lot faster than any other route, except possibly pre-parsing your CSV into SQL statements.
on 2007-08-24 10:01
> Brian C. wrote: >> Hey Folks, >> >> Im looking for some suggestions here. Each day I need to process 200 GZ >> compressed files each day that contain comma delimited information in >> the following format: >> HostName, FileName, DirName, Modified_On > <snip> Hello, hey I am having the same problem here, how did you work that out?