One of the things I’ve loved about rails is the ease with which you
can leverage your existing app code for shell/cli/scripting purposes
using ./script/console and ./script/runner. As such, I highly
recommend implementing your parsing/processing code from within your
rails app, versus having some separate ruby code that parses/processes
data and persists it directly in the db.
The major benefits of doing this are flexibility and d-r-y-ness of your
code: you leverage all of your existing code/rules/etc for persisting
such data in the db; you can easily test all of the pieces that
make up that processing just like any other part of your rails app;
you can call such processing from within your rails app via a
controller, or via console or runner; you can easily run such processing
against test, dev, or prod dbs; …
So, say the model you need to process data for is Foo, the dir
that your client is uploading new data files to is found under your
proj root in ./private/newdata, a file is mv’d to
./private/processeddata once it has been successfully processed, and
you log processing attempts in ./log/foo_processor.log, …:
in ./app/models/foo.rb
  …
  PROJ_DIR = File.expand_path("#{File.dirname(__FILE__)}/../..")
  NEWDATA_DIR = "#{PROJ_DIR}/private/newdata"
  PROCESSEDDATA_DIR = "#{PROJ_DIR}/private/processeddata"
  PROCESSOR_LOG = "#{PROJ_DIR}/log/foo_processor.log"
  …
  def Foo.process_data(somefile=nil, is_debug=false)
    # if not somefile, grab list of un-processed NEWDATA_DIR
    # files, … and process data.
    …
  end
  …
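To give a rough idea of what the body of process_data might look like, here is a minimal plain-ruby sketch. It is only an illustration under some assumptions: FooProcessor and the persist block are hypothetical names (in your app this would live in Foo and the block would be whatever model code saves a record), data files are assumed to be one record per line, and the dirs and log path are passed in rather than read from the constants above so that it runs without rails.

```ruby
require "fileutils"
require "logger"

# Hypothetical stand-in for the Foo model logic, so it runs outside rails.
class FooProcessor
  def initialize(newdata_dir, processeddata_dir, log_path, &persist)
    @newdata_dir = newdata_dir
    @processeddata_dir = processeddata_dir
    @logger = Logger.new(log_path)
    # Assumption: in the real app this block would save a Foo record.
    @persist = persist || proc { |_line| }
  end

  # Processes one named file, or every file in the new-data dir if none
  # is given. Returns the basenames of the files processed successfully.
  def process_data(somefile = nil, is_debug = false)
    @logger.level = is_debug ? Logger::DEBUG : Logger::INFO
    paths = if somefile
              [File.join(@newdata_dir, somefile)]
            else
              Dir.glob(File.join(@newdata_dir, "*")).select { |p| File.file?(p) }.sort
            end

    paths.each_with_object([]) do |path, done|
      name = File.basename(path)
      begin
        lines = File.readlines(path, chomp: true).reject(&:empty?)
        lines.each do |line|
          @logger.debug("#{name}: #{line}")
          @persist.call(line)
        end
        # Only move the file once every line went through.
        FileUtils.mv(path, File.join(@processeddata_dir, name))
        @logger.info("processed #{name} (#{lines.size} lines)")
        done << name
      rescue StandardError => e
        # Leave the file where it is so a later run can retry it.
        @logger.error("failed #{name}: #{e.message}")
      end
    end
  end
end
```

Failed files stay in the new-data dir and only get an error line in the log, so the nightly run simply picks them up again.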
You could then call that class method in some controller for uploading/
processing new data via your app:
in ./app/controllers/some_such_controller.rb
  …
  def upload_newdata
    …
    # after saving successfully uploaded datafile in NEWDATA_DIR …
    Foo.process_data(datafile_name)
    …
  end
or call it in some console session:
  $ ./script/console development
  …
  Foo.process_data('some_datafile.txt', true)
  …
or call it from shell/cli via runner:
  $ ./script/runner -e development 'Foo.process_data("some_other_datafile.txt", true)'
  …
or call it via cron:
in appropriate crontab …
  …
  # at 2:03am every night, process all new datafiles in production env:
  3 2 * * * appuser /path/to/proj/script/runner -e production 'Foo.process_data' 2>&1
  …
Jeff