Hello,
I see that FileUtils.rm_rf cannot handle a tree containing a
relative name longer than PATH_MAX.
These commands create a hierarchy a/0...0/0...0/...
where the name specifying the deepest directory has length 4097.
That is usually greater than PATH_MAX.
( mkdir a && cd a &&
for i in $(seq 16); do d=$(printf %0255d 0); mkdir $d && cd $d;
done )
This shows that rm_rf doesn't remove "a":
ruby -r fileutils -e 'FileUtils.rm_rf("a")'
test -d a && echo failed to remove a
It prints this:
failed to remove a
It is not at all trivial to fix this "properly".
By "properly," I mean in a way that rm_rf can remove an arbitrarily
deep hierarchy securely while remaining efficient and thread safe.
Modulo hard-coded diagnostics, the C implementation in the GNU coreutils
package (src/remove.c) should be appropriate.
Jim
on 2006-10-04 18:26
on 2006-10-05 11:20
Hi,

At Thu, 5 Oct 2006 01:25:00 +0900,
Jim Meyering wrote in [ruby-core:08999]:
> It is not at all trivial to fix this "properly".
> By "properly," I mean in a way that rm_rf can remove an arbitrarily
> deep hierarchy securely while remaining efficient and thread safe.
> Modulo hard-coded diagnostics, the C implementation in the GNU coreutils
> package (src/remove.c) should be appropriate.

It doesn't feel appropriate to chdir inside a library since it affects
the whole process.
on 2006-10-05 11:40
On Thu, 5 Oct 2006, Nobuyoshi Nakada wrote:

> Hi,
>
> At Thu, 5 Oct 2006 01:25:00 +0900,
> Jim Meyering wrote in [ruby-core:08999]:
> > It is not at all trivial to fix this "properly".
> > By "properly," I mean in a way that rm_rf can remove an arbitrarily
> > deep hierarchy securely while remaining efficient and thread safe.
> > Modulo hard-coded diagnostics, the C implementation in the GNU coreutils
> > package (src/remove.c) should be appropriate.

After a quick look, I couldn't figure that out, however:

>
> It doesn't feel appropriate to chdir inside a library since it affects
> the whole process.

isn't it possible to pass a block to chdir, so that after executing
the block one is back where one was?

>
> --
> Nobu Nakada
>

Hugh
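For reference, a minimal sketch of the block form Hugh is asking about. It only shows that Dir.chdir restores the previous directory when the block returns. While the block runs, the change still applies to the whole process, which is the concern raised above. The temporary directory and the variable names are illustrative only.

```ruby
require "tmpdir"

original = Dir.pwd
inside = nil

Dir.mktmpdir do |tmp|
  # Block form: change directory, run the block, then change back,
  # even if the block raises.
  Dir.chdir(tmp) do
    inside = Dir.pwd
  end
end

# Back where we started once the block has finished.
restored = (Dir.pwd == original)
puts "inside the block:    #{inside}"
puts "restored afterwards: #{restored}"
```

So the block form addresses "getting back", but not the fact that other threads see the changed directory while the block is executing.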
on 2006-10-05 15:26
"Nobuyoshi Nakada" <nobu@ruby-lang.org> wrote:
> At Thu, 5 Oct 2006 01:25:00 +0900,
> Jim Meyering wrote in [ruby-core:08999]:
>> It is not at all trivial to fix this "properly".
>> By "properly," I mean in a way that rm_rf can remove an arbitrarily
>> deep hierarchy securely while remaining efficient and thread safe.
>> Modulo hard-coded diagnostics, the C implementation in the GNU coreutils
>> package (src/remove.c) should be appropriate.
>
> It doesn't feel appropriate to chdir inside a library since it affects
> the whole process.

Hello,

You are right that calling chdir (or fchdir) is not appropriate in a
library: it would render the caller thread-*un*safe.

However, given sufficient O/S support, the implementation in
coreutils/src/remove.c is indeed robust and thread-safe. As of
coreutils-6.0 (the latest is coreutils-6.3), "rm -r" can remove an
arbitrarily deep hierarchy in a thread-safe manner on a system with
support for openat-like functions (Linux-2.6.16 and newer and Solaris 10).

I have taken great pains to ensure that the code degrades gracefully,
so that it works as well (but sacrifices thread safety) on systems that
have neither openat nor sufficient /proc support.

If you require thread safety even without openat support, then currently
you must compromise on robustness: i.e., the code must once again be
subject to the PATH_MAX limitation.

However a robust, efficient, *and* always-thread-safe implementation
is possible: if the PATH_MAX limitation is encountered, incur the cost
of a single fork and then perform the remaining operations (including
f/chdir calls) from a separate process.

If you are interested, a viable alternative may involve using the fts
implementation from the coreutils (the same one that's in gnulib).
Then, once that version of fts has the proposed additional feature,
ruby's rm_rf will "just work".

FYI, fts is the file system traversing tool that is used by chmod,
chgrp, chown, and du. It too takes advantage of openat, when possible,
and degrades gracefully. However, it also has an option to make it use
the existing approach of accessing each operand via its full, relative
file name. The version in coreutils/gnulib was initially based on the
one from *BSD and glibc, but I have changed its ABI slightly in order to
make it work for arbitrarily deep hierarchies. For example, those
programs can now process hierarchies a million levels deep or more.

Jim
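A rough Ruby sketch of the fork fallback described above, under stated assumptions: rm_rf_deep and remove_relative are hypothetical names, not part of FileUtils. The ordinary FileUtils.rm_rf is tried first, and only if the tree survives does a child process chdir into it and remove entries by their short relative names. It needs a platform with Kernel#fork, and it assumes the top-level path itself fits within PATH_MAX. It also uses plain recursion and chdir(".."), so it has none of the race protection or depth tolerance of coreutils' remove.c.

```ruby
require "fileutils"

# Hypothetical helper: remove +name+ (relative to the current directory)
# by descending with chdir, so only short relative names reach the OS.
def remove_relative(name)
  if File.lstat(name).directory?
    Dir.chdir(name)
    Dir.children(".").each { |child| remove_relative(child) }
    Dir.chdir("..")
    Dir.rmdir(name)
  else
    File.unlink(name) # regular files and symlinks
  end
end

# Hypothetical wrapper: try the normal path-based removal first; if the
# tree is still there (e.g. ENAMETOOLONG), finish the job in a child
# process so the caller's working directory is never changed.
def rm_rf_deep(path)
  begin
    FileUtils.rm_rf(path)
  rescue SystemCallError
    # Fall through to the chdir-based removal below.
  end
  return true unless File.exist?(path) || File.symlink?(path)

  pid = fork do
    begin
      Dir.chdir(File.dirname(path))
      remove_relative(File.basename(path))
      exit!(0)
    rescue SystemCallError
      exit!(1)
    end
  end
  _, status = Process.wait2(pid)
  status.success?
end
```

With the hierarchy created by the shell commands at the top of the thread, `rm_rf_deep("a")` would be the call to try. The parent's working directory is untouched because every chdir happens in the child.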
on 2006-10-05 16:18
Jim Meyering wrote:
> to the PATH_MAX limitation.

I'm not sure how much weight this will carry, but since we ship Ruby's
libraries with JRuby we're hoping the same logic described above will be
implementable in Java. Also ...

>
> However a robust, efficient, *and* always-thread-safe implementation
> is possible: if the PATH_MAX limitation is encountered, incur the cost
> of a single fork and then perform the remaining operations (including
> f/chdir calls) from a separate process.
>

We would strongly prefer to avoid any implementation that requires fork,
since we can't really support fork in JRuby. Also, wouldn't a fork
preclude this method from working on Windows?

Wouldn't it perhaps be better to support chdir at a per-thread level?
on 2006-10-05 17:51
Charles Oliver Nutter wrote:
...
> Wouldn't it perhaps be better to support chdir at a per-thread level?
Then ruby's thread scheduler would have to chdir for each context
switch. Is there any other reason not to do this? It's hard to see how
existing code could usefully depend on the working dir being global
rather than per-thread.
on 2006-10-05 17:53
On 10/5/06, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:
> Charles Oliver Nutter wrote:
> ...
> > Wouldn't it perhaps be better to support chdir at a per-thread level?
>
> Then ruby's thread scheduler would have to chdir for each context
> switch. Is there any other reason not to do this? It's hard to see how
> existing code could usefully depend on the working dir being global
> rather than per-thread.

I agree that it makes more sense for the current dir to be
thread-specific, but I can't speak to the complexity of supporting this
behavior in C Ruby. For JRuby, it would be a trivial change, since
current directory is only emulated with a per-JRuby-runtime variable. We
would simply move that variable into a per-thread context, and chdir
would then be thread-safe.
on 2006-10-05 18:13
Hi,

At Thu, 5 Oct 2006 22:26:24 +0900,
Jim Meyering wrote in [ruby-core:09008]:
> However, given sufficient O/S support, the implementation in
> coreutils/src/remove.c is indeed robust and thread-safe. As of
> coreutils-6.0 (the latest is coreutils-6.3), "rm -r" can remove an
> arbitrarily deep hierarchy in a thread-safe manner on a system with
> support for openat-like functions (Linux-2.6.16 and newer and Solaris 10).

Thank you, I'll consider it later.

> However a robust, efficient, *and* always-thread-safe implementation
> is possible: if the PATH_MAX limitation is encountered, incur the cost
> of a single fork and then perform the remaining operations (including
> f/chdir calls) from a separate process.

I thought about it too. Another idea, suggested by akr, is to rename
overly long path names to shorter ones before traversing.
on 2006-10-06 14:43
[resend]
Charles Oliver Nutter <Charles.O.Nutter@Sun.COM> wrote:
> Jim Meyering wrote:
...
>> However a robust, efficient, *and* always-thread-safe implementation
>> is possible: if the PATH_MAX limitation is encountered, incur the cost
>> of a single fork and then perform the remaining operations (including
>> f/chdir calls) from a separate process.
>
> We would strongly prefer to avoid any implementation that requires fork,
> since we can't really support fork in JRuby. Also, wouldn't a fork
> preclude this method from working on Windows?

With WOE, it wouldn't perform a "fork" per se.
There, rm_rf could use "spawnvp" to execute a new command
to handle the unusual event that it encounters the PATH_MAX limit.
The gnulib execute module provides a portable way to do that:
http://cvs.savannah.gnu.org/viewcvs/gnulib/lib/execute.c?root=gnulib&view=markup
But I'm no Windows expert, so take this with a big grain of salt.

> Wouldn't it perhaps be better to support chdir at a per-thread level?

Do you know how to do that portably, so that it affects only
rm_rf? What if some other concurrently-running code requires
the process-wide semantics of chdir? So imagine that there is
a new function with the thread-local semantics. Maybe...
But is it available now?
on 2006-10-06 20:25
Jim Meyering wrote:
> With WOE, it wouldn't perform a "fork" per se.
> There, rm_rf could use "spawnvp" to execute a new command
> to handle the unusual event that it encounters the PATH_MAX limit.
> The gnulib execute module provides a portable way to do that:
> http://cvs.savannah.gnu.org/viewcvs/gnulib/lib/execute.c?root=gnulib&view=markup
> But I'm no Windows expert, so take this with a big grain of salt.

Either way we wouldn't really be able to support it, and we'd have to
hack our own version of FileUtils that doesn't spawn or fork anything :(

>
>> Wouldn't it perhaps be better to support chdir at a per-thread level?
>
> Do you know how to do that portably, so that it affects only
> rm_rf? What if some other concurrently-running code requires
> the process-wide semantics of chdir? So imagine that there is
> a new function with the thread-local semantics. Maybe...
> But is it available now?
>

I do not know of any such feature in the C domain, but that's not my
area. We emulate chdir support in JRuby by keeping a separate variable
for cwd. When operations that are directory-sensitive are called, we
provide the cwd for them, normalizing paths manually as necessary. So
far it has worked fairly well for us, and a thread-specific approach
would be a logical next step. In Java, we don't really even have the
ability to chdir, so this emulation was the only safe way to support it.
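To make the emulation idea concrete, here is a small sketch in plain Ruby. It is not JRuby's actual code: the EmulatedCwd module and its methods are hypothetical, and the /tmp paths are only examples. Each thread keeps its own "current directory" in a thread variable, and relative paths are resolved against it with File.expand_path instead of calling the real chdir.

```ruby
# Hypothetical module illustrating a per-thread emulated working directory.
module EmulatedCwd
  KEY = :emulated_cwd

  # The calling thread's emulated directory, or the real one if unset.
  def self.pwd
    Thread.current.thread_variable_get(KEY) || Dir.pwd
  end

  # Record a new directory for this thread only; the process cwd is untouched.
  def self.chdir(dir)
    Thread.current.thread_variable_set(KEY, resolve(dir))
  end

  # Turn a possibly relative path into an absolute, normalized one.
  def self.resolve(path)
    File.expand_path(path, pwd)
  end
end

# Two threads "chdir" to different places without affecting each other.
results = %w[/tmp/one /tmp/two].map do |dir|
  Thread.new do
    EmulatedCwd.chdir(dir)
    EmulatedCwd.resolve("file.txt")
  end
end.map(&:value)

puts results
```

Note that the resolved names are full paths, so any system call made with them is still subject to PATH_MAX. The sketch addresses the thread-safety half of the discussion, not the depth limit.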