Ruby fs watcher?

Hi,

I wrote a script that processes files in a directory: each time a new file
appears in a given directory, I do something with it. So my problem is
knowing when a new file has arrived in the directory.
What I do now is try to open each file in exclusive mode and skip it if
that fails, but the polling just kills the CPU.

I am wondering if somebody could explain an elegant notification
solution to me?

Thank you,
Jean-Etienne

On 10/26/06, Jean-Etienne D. wrote:

This is fairly operating-system specific. Which one are you using?

On 10/26/06, Jean-Etienne D. wrote:

Take a look at Ara’s dirwatch solution. Does exactly what you want.

http://raa.ruby-lang.org/project/dirwatch/

Blessings,
TwP

Yes, unfortunately I am on Windows.

On Fri, Oct 27, 2006 at 03:10:42AM +0900, Jean-Etienne D. wrote:

If you're using Linux, you can use dnotify (or the newer inotify, where
available). I have no idea how Windows would handle such a thing, if at
all.

– Thomas A.

On Fri, 27 Oct 2006, Jean-Etienne D. wrote:

http://codeforpeople.com/lib/ruby/dirwatch/
http://codeforpeople.com/lib/ruby/dirwatch/dirwatch-0.9.0/README

-a

Thomas A. wrote:

You'd use WMI on Windows, potentially through the WIN32OLE library, to
monitor for file events. The FileSystemWatcher class in .NET offers similar
notifications, although it is built on the Win32 ReadDirectoryChangesW API
rather than on WMI.

Jean-Etienne D. wrote:

This problem is easily solved, and in a portable way. You create a list of
the files and their modification times, then sleep for some interval, then
wake up and compare the stored modification times with the new ones, also
test for any new files. Take action on any new or modified files. Maybe 25
lines of Ruby code.

Be sure to sleep for an interval between tests, otherwise your script will
hog the processor.
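The snapshot-and-compare approach described above might look roughly like this in Ruby. It is only a sketch under simple assumptions: the directory is flat, only regular files matter, and `snapshot`, `changed_files` and `poll` are hypothetical helper names, not part of any library.

```ruby
# Take a snapshot of the regular files in +dir+ as { path => mtime }.
def snapshot(dir)
  Dir.glob(File.join(dir, "*")).each_with_object({}) do |path, snap|
    begin
      snap[path] = File.mtime(path) if File.file?(path)
    rescue Errno::ENOENT
      # the file vanished between the glob and the stat; skip it
    end
  end
end

# Return the paths in +new_snap+ that are new or whose mtime changed.
def changed_files(old_snap, new_snap)
  new_snap.reject { |path, mtime| old_snap[path] == mtime }.keys
end

# Poll +dir+ forever, sleeping +interval+ seconds between scans so the
# loop does not hog the CPU, and yield each new or modified path.
def poll(dir, interval = 5)
  previous = snapshot(dir)
  loop do
    sleep interval
    current = snapshot(dir)
    changed_files(previous, current).each { |path| yield path }
    previous = current
  end
end
```

Usage would be something like `poll("incoming") { |path| puts "new or modified: #{path}" }`, where `incoming` is a hypothetical directory. Because the baseline lives only in memory, files that arrive while the script is not running are silently treated as already seen, which is one of the durability problems discussed further down the thread.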

On Fri, 27 Oct 2006, Paul L. wrote:

> This problem is easily solved, and in a portable way. You create a list of
> the files and their modification times, then sleep for some interval, then
> wake up and compare the stored modification times with the new ones, also
> test for any new files. Take action on any new or modified files. Maybe 25
> lines of Ruby code.

you'd think - until your script stops, restarts, and you re-fire all of the
previous actions. if your action happens to have been something like

    system "something_which_should_only_happen_for_new_files #{ file }"

you're screwed. that approach is simply not that much more durable than
cron'ing a script to process every file every minute since, logically, the
system can degrade to that.

i think a transactional db is an absolute requirement of such a system.
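As a lightweight stand-in for a transactional database, Ruby's standard PStore library can record which files have already been handled, so that a restarted watcher skips them instead of re-firing their actions. This is a minimal sketch, not what dirwatch does: `process_once` and the store path are hypothetical, and a busy production system would more likely want a real database.

```ruby
require "pstore"

# Yield each path in +paths+ that is not yet recorded in the PStore file at
# +db_path+, then record it. Each path gets its own transaction: if the
# block raises (or the process dies) mid-action, nothing is committed for
# that path, so it is retried on the next run (at-least-once semantics).
def process_once(db_path, paths)
  store = PStore.new(db_path)
  paths.each do |path|
    store.transaction do
      store[:done] ||= {}
      next if store[:done].key?(path)   # handled before a restart; skip
      yield path
      store[:done][path] = Time.now     # committed when the block ends
    end
  end
end
```

Recording the path after the action means a crash can repeat one action; recording it before the action would instead risk skipping one. Which failure is acceptable depends on what the action does.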

another absolute must for such a system is the ability to deal with
batches of updated files. the reason is that this:

    while(true)
      get_new_files
      process_new_files
    end

is terrifically flawed if 100,000 new files arrive at once - since it
requires you to spawn 100,000 new processes. ideally the processing can be
batched in chunks. dirwatch allows this by providing a config option to pass
all files to be processed to the script on stdin.

> Be sure to sleep for an interval between tests, otherwise your script will
> hog the processor.

also a flaw. if you simply sleep, say, 200s between loops, you waste time
whenever the actions you just took already required more than that.
basically you want to ensure that at least n seconds elapse between scans of
the directory, but if the system is very busy you will not need to sleep at
all, since the processing alone may take that long.
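Sleeping only for whatever remains of the interval, rather than for a fixed time after each round, can be sketched like this. The helper names are hypothetical; the monotonic clock is used so that wall-clock adjustments do not distort the measurement.

```ruby
# Seconds still to wait so that scans start at least +interval+ seconds
# apart; zero when the last round of processing already took that long.
def remaining_sleep(interval, elapsed)
  [interval - elapsed, 0].max
end

# Run the block repeatedly, starting a new round at most once every
# +interval+ seconds, without sleeping when a round ran long.
def run_every(interval)
  loop do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    sleep remaining_sleep(interval, elapsed)
  end
end
```

With a 200 s interval, a round that took 50 s is followed by a 150 s sleep, while a round that took 350 s is followed immediately by the next scan. A call would look like `run_every(200) { scan_and_process }`, where `scan_and_process` stands in for your own code.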

in summary, having written three or four such systems and ultimately arriving
at the code for dirwatch, which we've used in production for 24x7 satellite
ingest systems for several years, i can assure you the task isn't quite that
trivial.

regards.

-a

Paul L. wrote:

> This problem is easily solved, and in a portable way. You create a list of
> the files and their modification times, then sleep for some interval, then
> wake up and compare the stored modification times with the new ones, also
> test for any new files. Take action on any new or modified files. Maybe 25
> lines of Ruby code.
>
> Be sure to sleep for an interval between tests, otherwise your script will
> hog the processor.

What you really need is the mtime of the directory, and only that, at least
if you are interested in /new/ files only:

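    require 'set'

    last = nil      # directory mtime seen at the previous scan
    set = Set.new   # directory entries seen at the previous scan

    loop do
      # adding an entry to the directory updates the directory's own mtime
      current = File.mtime "."
      if last.nil? || last < current
        s = Dir["*"].to_set
        p s - set   # print only the entries that are new since the last scan
        set = s
        last = current
      end
      sleep 1
    end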

Test run:

    $ ruby /cygdrive/c/Temp/watch.rb &
    [1] 1432
    #<Set: {"xx", "x", "bin", "a.1234"}>
    $ touch foo
    #<Set: {"foo"}>
    $ touch bar
    #<Set: {"bar"}>
    $ touch bar
    $ touch baz
    #<Set: {"baz"}>

Kind regards

robert

On Fri, 27 Oct 2006, Paul L. wrote:

> version. He may want to learn the programming aspects on his own, make his
> own mistakes. From the content of his post, he didn't bother to yield any
> time during execution, therefore his script ate up the CPU and consequently
> failed.
>
> At that level, a simple solution really is simple.

i guess you're right. i get defensive at the mere suggestion of simple
event- or cron-based processing systems without mutual exclusion - i've
debugged way too many of them!
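One common way to get that mutual exclusion, so that overlapping cron runs or two copies of a watcher never process the same files at once, is an advisory lock file. A minimal sketch using Ruby's `File#flock`; `with_exclusive_lock` and the lock-file path are hypothetical, and on POSIX systems the lock only excludes other processes that also call `flock` on the same file.

```ruby
# Run the block only if no other process holds an exclusive lock on
# +lock_path+; return false immediately, without waiting, when another
# instance (e.g. an overlapping cron run) already holds it.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    return false unless f.flock(File::LOCK_EX | File::LOCK_NB)
    yield
    true
  end # closing the file releases the lock
end
```

A cron job could then wrap its whole scan in `with_exclusive_lock("/tmp/watcher.lock") { ... }` and simply exit when the call returns false.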

cheers.

-a

Ara wrote:

> …
>
> in summary, having written three or four such systems and ultimately
> arriving at the code for dirwatch, which we've used in production for 24x7
> satellite ingest systems for several years, i can assure you the task isn't
> quite that trivial.

Yes, but the OP wants to know how to do it, not produce a mature, robust
version. He may want to learn the programming aspects on his own, make his
own mistakes. From the content of his post, he didn't bother to yield any
time during execution, therefore his script ate up the CPU and consequently
failed.

At that level, a simple solution really is simple.
