Updating database via cron?

bongoman · February 23, 2009, 3:46am

Hi there

I have a Rails app running that needs to have its database
periodically updated from an upload of a text file.

The data file is in UIEE format, and I’m working on a Ruby script that
parses the file and inserts the data into MySQL. That part should be OK
at this stage, I think.

However, I need to automate this process: the client uploads the UIEE
file to a prescribed directory on the remote server. I then need either
to detect whether there has been a fresh upload, OR to rely on cron to
look in the directory and parse the fresh file (if present).

Am I on the right track looking at cron for this? Maybe I’d be better
off building an admin page where the user can manually trigger the
database update script? Or is there another Unix command that can
detect a change in a directory and trigger the script?

Any clues on how best to approach this situation would be
appreciated…

bongoman · February 23, 2009, 7:03am

Cron is a fine approach if you want the action to be triggered by time.
If you want it triggered by a user action instead, just redirect_to the
processing action after the file is uploaded.

If you’re using cron, it’s probably best to use a rake task. That’s quite
easy, even if you haven’t done it before: it’s much like writing a little
controller. It gives you access to the Rails stack without going through
your web server.

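
A minimal sketch of such a rake task follows. The file name lib/tasks/uiee.rake and the task name uiee:import are hypothetical, and the stub :environment task exists only so the sketch runs outside Rails (inside an app, Rails already defines :environment, which loads your models and database connection).

```ruby
# lib/tasks/uiee.rake (hypothetical file and task names)
require "rake"

# Stand-in for the :environment task Rails provides; delete it inside a
# Rails app, where :environment boots the app before your task runs.
task :environment do
  # Rails would load config/environment.rb here.
end

namespace :uiee do
  desc "Import freshly uploaded UIEE files"
  # Depending on :environment gives the task access to the Rails stack.
  task :import => :environment do
    # Inside Rails this would call your model code, e.g. Foo.process_data.
    puts "importing new UIEE files..."
  end
end
```

From the app root it would run as rake uiee:import, which is the name you would put in place of rake_task_name in a crontab line like the one below.
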
You can call the rake task from your crontab file. Don’t forget to
use absolute paths and a subshell. I do it like this (after the time
declaration):

system_user_name (cd /path/to/rails/app; /usr/bin/rake rake_task_name)

I’d look at the File class if you’re having trouble with your
particular file encoding. Worst case, you’ll have to use “system
some_unix_utility_that_will_convert_your_file some_arguments” to
convert the file before you open it in Ruby.

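
As a sketch of the File-class route: on Ruby 1.9 or later (not the 1.8 of this thread’s era, whose strings carry no encoding), Ruby can transcode while reading, which avoids the external utility. The method name read_as_utf8 is hypothetical, and ISO-8859-1 is only an assumed source encoding; check what the client’s export really uses.

```ruby
# Read a data file whose bytes are in +source_encoding+ and return its
# contents transcoded to UTF-8. ISO-8859-1 is only an assumed default.
# Needs Ruby 1.9 or later.
def read_as_utf8(path, source_encoding = "ISO-8859-1")
  # "r:EXT:INT" opens the file for reading, treating its bytes as EXT
  # and transcoding them to INT.
  File.read(path, mode: "r:#{source_encoding}:UTF-8")
end
```
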
The rest should just be string manipulation and normal record creation
the Rails way.
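
For the string-manipulation part, here is a hypothetical parser. It assumes the common UIEE layout of one CODE|value pair per line with a blank line between records (verify this against the client’s actual files) and collects repeated codes within a record into an array.

```ruby
# Parse UIEE-style text into an array of hashes, one per record.
# Assumed layout: each line is "CODE|value"; a blank line ends a record.
# Repeated codes in one record are collected into an array.
def parse_uiee(text)
  records = []
  current = {}
  text.each_line do |line|
    line = line.chomp
    if line.strip.empty?
      # Blank line: close the current record, if it has any fields.
      records << current unless current.empty?
      current = {}
      next
    end
    code, value = line.split("|", 2)
    next if value.nil? # skip lines without a "|" separator
    code = code.strip
    value = value.strip
    if current.key?(code)
      current[code] = Array(current[code]) << value
    else
      current[code] = value
    end
  end
  records << current unless current.empty? # last record may lack a blank line
  records
end
```

Each resulting hash could then go through normal record creation in your model; the mapping from codes to columns is up to your app.
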

bongoman · February 24, 2009, 7:43am

One of the things I’ve loved about Rails is the ease with which you
can leverage your existing app code for shell/CLI/scripting purposes
using ./script/console and ./script/runner. As such, I highly
recommend implementing your parsing/processing code within your
Rails app, rather than having separate Ruby code that parses/processes
the data and inserts it into the db directly.

The major benefits of doing this are flexibility and DRY-ness of your
code: you leverage all of your existing code/rules/etc. for persisting
such data in the db; you can easily test all of the pieces that make up
the processing just like any other part of your Rails app; you can call
the processing from within your Rails app via a controller, or via
console or runner; you can easily run it against the test, dev, or prod
db; …

So, say the model you need to process data for is Foo, your client
uploads new data files to ./private/newdata under your project root,
each successfully processed file gets mv’d to ./private/processeddata,
and you log all processing attempts in ./log/foo_processor.log, …:

in ./app/models/foo.rb

# …
PROJ_DIR = File.expand_path("#{File.dirname(__FILE__)}/../..")
NEWDATA_DIR = "#{PROJ_DIR}/private/newdata"
PROCESSEDDATA_DIR = "#{PROJ_DIR}/private/processeddata"
PROCESSOR_LOG = "#{PROJ_DIR}/log/foo_processor.log"
# …

def Foo.process_data(somefile=nil, is_debug=false)
  # if not somefile, grab list of un-processed NEWDATA_DIR
  # files, … and process data.
  # …
end
# …
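
One way the body of Foo.process_data might be filled in, sketched as plain Ruby so it runs outside Rails. The module name FooProcessor, the explicit directory arguments, and handing each file to a block are all assumptions; in the model you would pass NEWDATA_DIR, PROCESSEDDATA_DIR and PROCESSOR_LOG and create Foo records inside the block. The is_debug flag is left out.

```ruby
require "fileutils"
require "logger"

# Hypothetical stand-in for Foo.process_data (names are illustrative).
module FooProcessor
  # Processes +somefile+ (a name inside +new_dir+), or every visible file
  # in +new_dir+ when +somefile+ is nil. Each file's path is yielded to the
  # block, which does the parsing/record creation. A file is moved to
  # +done_dir+ only if the block finishes without raising; failed files
  # stay put so the next run retries them. Returns the names processed.
  def self.process_data(new_dir, done_dir, log_path, somefile = nil)
    raise ArgumentError, "block required" unless block_given?

    logger = Logger.new(log_path)
    # Skip hidden files, e.g. partial uploads written under a dot-name.
    names = somefile ? [somefile] : Dir.children(new_dir).reject { |n| n.start_with?(".") }.sort
    processed = []

    names.each do |name|
      path = File.join(new_dir, name)
      next unless File.file?(path)

      begin
        yield path
        FileUtils.mkdir_p(done_dir)
        FileUtils.mv(path, File.join(done_dir, name))
        logger.info("processed #{name}")
        processed << name
      rescue StandardError => e
        # Log every failed attempt; the file is left in new_dir.
        logger.error("failed #{name}: #{e.class}: #{e.message}")
      end
    end

    processed
  ensure
    logger&.close
  end
end
```
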

Then you could call that class method in some controller for
uploading/processing new data via your app:

in ./app/controllers/some_such_controller.rb

# …
def upload_newdata
  # …
  # after saving successfully uploaded datafile in NEWDATA_DIR …
  Foo.process_data(datafile_name)
  # …
end
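
For the “after saving successfully uploaded datafile in NEWDATA_DIR” step, a hypothetical save_upload helper could look like this. It assumes the upload object responds to #read (as Rails’ uploaded-file objects do) and keeps only the base name of the client-supplied filename, so an upload can’t be written outside the target directory. Writing to a hidden temporary name and then renaming means a processing run that skips dot-files never sees a half-written file.

```ruby
require "fileutils"

# Hypothetical helper: saves an uploaded file into +dir+ (e.g. NEWDATA_DIR)
# and returns the stored name. +upload+ is assumed to respond to #read.
def save_upload(upload, original_filename, dir)
  # Keep only the base name, so "../../x" cannot escape +dir+.
  name = File.basename(original_filename.to_s)
  # Reject empty names and dot-names ("." / ".." / hidden files).
  raise ArgumentError, "bad filename: #{original_filename.inspect}" if name.empty? || name.start_with?(".")

  FileUtils.mkdir_p(dir)
  tmp = File.join(dir, ".#{name}.part")
  final = File.join(dir, name)
  # Write under a hidden temporary name, then rename into place. A rename
  # within one filesystem is atomic, so readers see all or nothing.
  File.open(tmp, "wb") { |f| f.write(upload.read) }
  File.rename(tmp, final)
  name
end
```

In the controller that might be something like save_upload(params[:datafile], params[:datafile].original_filename, Foo::NEWDATA_DIR), where the :datafile parameter name is made up.
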

or call it in some dev env console session:

$ ./script/console development
…
Foo.process_data('some_datafile.txt', true)
…

or call it from shell/cli via runner:

$ ./script/runner -e development 'Foo.process_data("some_other_datafile.txt", true)'
…

or call it via cron:

in appropriate crontab …

…

at 2:03am every night, process all new datafiles in production env:

3 2 * * * appuser /path/to/proj/script/runner -e production 'Foo.process_data' 2>&1
…
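
One thing to watch with the cron approach: however the client uploads, the 2:03am run could catch a file that is still being written. A hedged sketch of one guard, assuming a file untouched for a few minutes is complete (the settled_files name and the 300-second threshold are both made up), is to pick only files whose modification time is old enough.

```ruby
# Hypothetical helper: names of files in +dir+ that look fully uploaded,
# i.e. regular files, not hidden, and not modified in the last +min_age+
# seconds. The 300-second default is an arbitrary assumption.
def settled_files(dir, min_age = 300, now = Time.now)
  Dir.children(dir).sort.select do |name|
    path = File.join(dir, name)
    !name.start_with?(".") &&
      File.file?(path) &&
      now - File.mtime(path) >= min_age
  end
end
```

Foo.process_data could then loop over settled_files(NEWDATA_DIR) instead of every file in the directory.
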

Jeff

bongoman · February 24, 2009, 7:52am

(removed/reposted)
