On Fri, 27 Oct 2006, Paul L. wrote:
> This problem is easily solved, and in a portable way. You create a list of
> the files and their modification times, then sleep for some interval, then
> wake up and compare the stored modification times with the new ones, also
> test for any new files. Take action on any new or modified files. Maybe 25
> lines of Ruby code.
you'd think - until your script stops, restarts, and you re-fire actions for
all previously seen files. if your action happens to have been something like

  system "something_which_should_only_happen_for_new_files #{ file }"

you're screwed. that approach is simply not that much more durable than
cron'ing a script to process every file every minute since, logically, the
system can degrade to that.
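for concreteness, the quoted approach boils down to roughly this - just a
sketch, the names and glob are made up and it's not dirwatch code:

  # naive poller: all state lives in memory, so a restart forgets
  # which files have already been handled
  dir      = ARGV.shift || '.'
  interval = 60
  seen     = {}   # path => mtime; lost the moment the process dies

  loop do
    Dir.glob(File.join(dir, '*')).each do |path|
      next unless File.file? path
      mtime = File.mtime path
      if seen[path].nil? or seen[path] < mtime
        # after a restart this fires again for every file already present
        system "something_which_should_only_happen_for_new_files #{ path }"
        seen[path] = mtime
      end
    end
    sleep interval
  end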
i think a transactional db is an absolute requirement of such a system.
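for example, something along these lines with the sqlite3 gem - a sketch
only, not dirwatch's actual schema; the table, column, and command names are
made up:

  require 'sqlite3'

  db = SQLite3::Database.new 'watchstate.db'
  db.execute <<-SQL
    create table if not exists processed (
      path  text primary key,
      mtime integer not null    -- epoch seconds
    )
  SQL

  # a file needs processing if we have never recorded it, or its
  # mtime has moved forward since we last recorded it
  def needs_processing? db, path, mtime
    row = db.get_first_row 'select mtime from processed where path = ?', [path]
    row.nil? or row[0] < mtime
  end

  def process db, path, mtime
    db.transaction do
      db.execute 'insert or replace into processed (path, mtime) values (?, ?)',
                 [path, mtime]
      # raising here rolls the insert back, so the file is retried on the
      # next scan instead of being marked done
      raise 'action failed' unless system('process_one_file', path)
    end
  end

this way a restart picks up exactly where it left off instead of re-firing
actions for everything in the directory.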
another absolute must for such a system is the ability to deal with batches
of updated files. the reason is that this:
  while true
    get_new_files
    process_new_files
  end
is terrifically flawed if 100,000 new files arrive at once, since it requires
you to spawn 100,000 new processes. ideally the processing can be batched in
chunks. dirwatch allows this by providing a config option to pass all the
files to be processed to the script on stdin.
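roughly, instead of one system() per file, the watcher side does something
like this - 'handle_batch' is a made-up name and this isn't dirwatch's code,
just the shape of the idea:

  # feed the whole batch to a single handler process on stdin rather
  # than spawning one process per file
  def process_batch paths
    return if paths.empty?
    IO.popen('handle_batch', 'w') do |pipe|
      paths.each { |path| pipe.puts path }
    end
    raise 'batch handler failed' unless $?.success?
  end

100,000 new files then means one spawned process reading 100,000 lines,
not 100,000 processes.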
> Be sure to sleep for an interval between tests, otherwise your script will
> hog the processor.
also a flaw. if you simply sleep, say, 200s between loops, you waste time
whenever the actions you just took required more than that. basically you
want to ensure at least n seconds elapse between scans of the directory, but
if the system is very busy you will not need to sleep at all, since the
processing itself may already take that long.
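in code the loop ends up looking more like this (sketch, illustrative names):

  interval = 200   # minimum seconds between the starts of two scans

  loop do
    started = Time.now
    scan_and_process_directory   # whatever your scan-and-process step is
    elapsed = Time.now - started
    # only sleep off whatever of the interval is left; if processing
    # already took longer than the interval, scan again immediately
    sleep(interval - elapsed) if elapsed < interval
  end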
in summary, having written three or four such systems and ultimately arriving
at the code for dirwatch, which we've used in production for 24x7 satellite
ingest systems for several years, i can assure you the task isn't quite that
trivial.
regards.
-a