Making sure a file isn't being written/copied before moving

So, I’m (still) working on some scripts to rename and reorganize my
image files. Right now, I have a script on my laptop that pulls images
off the CF card and stores them locally. I have another script that
rsyncs the images to my server at home whenever there’s a connection
available. These scripts don’t step on each other because they look at
the process list for instances of rsync.

But I need to write two more scripts that take the image files from an
incoming directory, rename them, and drop them into a directory for me
to work on. Once I’ve done whatever I’m going to do (cull, keyword,
etc.), I put them into a directory for archiving. The images are then
pulled from that directory, put in an archive directory, and have the
immutable extended attribute (xattr) set.

My problem is that I don’t want to do anything with a file that is in
the process of being moved into one of these directories. I have to be
sure that the file is not still being moved or copied. Now, these
directories are all on the same filesystem, so a move should be an
atomic change by the filesystem (technically a rename of the file from
one directory/file name to another). But I want to be sure, and I may
also be dropping files into the incoming directory from elsewhere from
time to time. (I plan on going through my backlog of old untagged files
eventually.)

Can someone suggest an easy (or at least reliable) way to make sure that
any file I’m about to modify isn’t being touched by another program?

Paul

Paul A. wrote:

Can someone suggest an easy (or at least reliable) way to make sure that
any file I’m about to modify isn’t being touched by another program?

As you said yourself: rely on the atomic semantics of the filesystem.
Rename it to an extension which the other program will not recognise, or
into another directory which the other program won’t be looking in.

This is how Maildir works, so maybe reading up on the semantics of
Maildir will help you.

http://www.qmail.org/qmail-manual-html/man5/maildir.html
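
As a rough illustration of the rename-to-claim idea described above, here is a minimal Ruby sketch. The method name `claim` and the `incoming`/`work` directory names are made up for this example and are not from the thread. It assumes both directories live on the same filesystem, which is what makes the rename atomic. Note that this only keeps two consumers from picking up the same file; by itself it does not tell you whether a writer is still copying data into it.

```ruby
require 'fileutils'
require 'tmpdir'

# Try to claim +src+ by renaming it into +work_dir+ (hypothetical names).
# Returns the new path, or nil if another process already took the file.
def claim(src, work_dir)
  dest = File.join(work_dir, File.basename(src))
  File.rename(src, dest) # atomic when both paths are on one filesystem
  dest
rescue Errno::ENOENT
  nil # the source vanished: someone else claimed it first
end

# Small self-contained demo in a throwaway directory.
Dir.mktmpdir do |root|
  incoming = File.join(root, 'incoming')
  work     = File.join(root, 'work')
  FileUtils.mkdir_p([incoming, work])
  File.write(File.join(incoming, 'img_0001.jpg'), 'fake image data')

  Dir.glob(File.join(incoming, '*')).each do |f|
    claimed = claim(f, work)
    puts "claimed #{File.basename(claimed)}" if claimed
  end
end
```

Only the process whose rename succeeds goes on to work on the file; every other process gets nil and skips it.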

Tomorrow, Brian C. wrote:

Paul A. wrote:

Can someone suggest an easy (or at least reliable) way to make sure that
any file I’m about to modify isn’t being touched by another program?

As you said yourself: rely on the atomic semantics of the filesystem.
Rename it to an extension which the other program will not recognise, or
into another directory which the other program won’t be looking in.

Perhaps you missed it in my original post (or perhaps I simply wasn’t
clear), but I may be operating on files that are added arbitrarily. My
concern is that the script starts acting on a file that is still being
copied. Renaming it won’t help there.

Paul

On Jul 16, 2009, at 4:48 PM, Paul A. wrote:

into another directory which the other program won’t be looking in.

Perhaps you missed it in my original post (or perhaps I simply
wasn’t clear), but I may be operating on files that are added
arbitrarily. My concern is that the script starts acting on a file
that is still being copied. Renaming it won’t help there.

Just use the renaming semantics in the first program (the one doing
the copying) also. Basically you are using the filename appearance as
a synchronization mechanism between the multiple processing steps.

If you can’t control the name of the file itself, then control the
directory in which it appears.

Gary W.

I think I solved my problem. I was looking at inotify in order to avoid
having the script check the directories on a regular basis. It turns
out that it can report when a file is closed for writing, and return
the path and basename of the file.
The only downside is that ruby-inotify doesn’t (as far as I can tell) do
recursive checks of the directory, so I’m using Open3 to call
inotifywait and parsing its output.

Here’s my test program:

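#!/usr/bin/ruby -w

require 'open3'
require 'fileutils' # ftools was removed in Ruby 1.9; FileUtils replaces it

# Yield every inotifywait output line that reports a CLOSE_WRITE event.
def inwait(path)
  Open3.popen3("inotifywait", "-m", "-r", path) do |_stdin, stdout, _stderr|
    while (line = stdout.gets)
      next unless line.include?("CLOSE_WRITE")
      yield line
    end
  end
end

inwait("/tmp") do |line|
  # Default output is "dir/ EVENTS name"; split breaks on names with spaces.
  path, action, file = line.split
  puts "path: \t #{path}"
  puts "action: \t #{action}"
  puts "file: \t #{file}"
  src = File.join(path, file)
  # FileUtils.mv raises if the file is already directly in /tmp, so skip those.
  FileUtils.mv(src, "/tmp") unless File.dirname(src) == "/tmp"
end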

Paul

Tomorrow, Gary W. wrote:

Perhaps you missed it in my original post (or perhaps I simply wasn’t

That still leaves me with the same problem: I have to read out of a
directory. Plus, there isn’t just going to be a script putting files in
my incoming directory. I’ll be doing that myself as I clean up all my
old files.

Paul

Paul A. wrote:

Paul A. wrote:

Can someone suggest an easy (or at least reliable) way to make sure that
any file I’m about to modify isn’t being touched by another program?

As you said yourself: rely on the atomic semantics of the filesystem.
Rename it to an extension which the other program will not recognise, or
into another directory which the other program won’t be looking in.

Perhaps you missed it in my original post (or perhaps I simply wasn’t
clear), but I may be operating on files that are added arbitrarily. My
concern is that the script starts acting on a file that is still being
copied. Renaming it won’t help there.

The program which drops files into the directory has to work the same
way:

  • open temporary file
  • write to it
  • fsync if you want to be sure it’s on disk even if power is pulled
  • close it
  • rename it to final location

That’s why I said to look at Maildir semantics - adding new E-mails to a
maildir works like this. (They are written into the tmp/ directory, and
then renamed into the new/ directory)
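
The write-then-rename steps above can be sketched in Ruby roughly as follows. This is only an illustration under some assumptions: the method name `publish` and the `.tmp` subdirectory are invented for the example, the temporary location must be on the same filesystem as the final one, and fsync is called while the file is still open.

```ruby
require 'fileutils'
require 'tmpdir'

# Copy +src+ into +dest_dir+ so that readers never see a partial file.
# The ".tmp" subdirectory name is an assumption made for this sketch.
def publish(src, dest_dir)
  tmp_dir = File.join(dest_dir, '.tmp')
  FileUtils.mkdir_p(tmp_dir)
  tmp   = File.join(tmp_dir, "#{Process.pid}.#{File.basename(src)}")
  final = File.join(dest_dir, File.basename(src))

  File.open(tmp, 'wb') do |out|
    File.open(src, 'rb') { |inp| IO.copy_stream(inp, out) }
    out.fsync # push the data to disk before the file becomes visible
  end
  File.rename(tmp, final) # atomic on the same filesystem
  final
end
```

A consumer that only looks at regular files directly in the destination directory (and ignores `.tmp`) would then see each file appear all at once, fully written.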

5:28pm, Brian C. wrote:

  • write to it
  • fsync if you want to be sure it’s on disk even if power is pulled
  • close it
  • rename it to final location

That’s why I said to look at Maildir semantics - adding new E-mails to a
maildir works like this. (They are written into the tmp/ directory, and
then renamed into the new/ directory)

I see what you’re saying. My issue was that I will be moving files into
this directory by hand as I go through my old, unmanaged digital images
and put them in this directory to be renamed and start the DAM (digital
asset management) process.

Of course, this is all moot now that I’ve found inotify will solve the
problem for me. Actually, it solves three problems:

  1. It blocks, so I don’t have to poll the directory.
  2. It lets me know when a file has been moved to or written in the
    directory (even if it’s in a subdirectory).
  3. It tells me the name of the file, so I don’t have to go out and find
    it.

Paul
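
For readers who want to watch for both cases mentioned above (a file closed after writing, and a file moved into the tree), here is a hedged Ruby sketch built on inotifywait. It assumes the inotify-tools version of inotifywait, with its `-e` and `--format` options and the `%e`, `%w` and `%f` conversions as described in its man page. The names `parse_event`, `watch_ready` and `READY_EVENTS` are invented for this example. A tab-separated format is used so that file names containing spaces survive parsing.

```ruby
require 'open3'

# Events of interest: a writer closed the file, or a file was renamed in.
READY_EVENTS = %w[CLOSE_WRITE MOVED_TO].freeze

# Parse one line produced by --format "%e\t%w%f".
# Returns [events, path], or nil for lines that are not ready-file events.
def parse_event(line)
  events, path = line.chomp.split("\t", 2)
  return nil if path.nil?

  events = events.split(',')
  return nil if events.include?('ISDIR') || (events & READY_EVENTS).empty?

  [events, path]
end

# Block indefinitely, yielding the full path of each file that is ready.
def watch_ready(dir)
  cmd = ['inotifywait', '-q', '-m', '-r',
         '-e', 'close_write', '-e', 'moved_to',
         '--format', "%e\t%w%f", dir]
  Open3.popen2(*cmd) do |_stdin, stdout, _wait_thr|
    stdout.each_line do |line|
      parsed = parse_event(line)
      yield parsed[1] if parsed
    end
  end
end

# Usage (hypothetical path):
#   watch_ready('/tmp/incoming') { |path| puts "ready: #{path}" }
```

Keeping the parsing in its own method makes it possible to check the line handling without inotifywait installed.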
