FileUtils.chdir thread safety

#1

Hi,

I’m getting this error: “warning: conflicting chdir during another
chdir block” when running the following code:

require 'fileutils'

def blah(d)
  FileUtils.cd d do
    puts FileUtils.pwd
    sleep 2
    puts FileUtils.pwd
  end
end

threads = []
threads << Thread.new { blah('/home/motoct/dev/test1') }
threads << Thread.new { blah('/home/motoct/dev/test2') }
threads.each { |t| t.join }

The second pwd of the first thread prints the current script directory
(where this code is located), so it is not just a warning!

I guess FileUtils.chdir is not thread safe. Is there any way you can
make it thread safe?

Thanks,
Tiberiu

#2

Mr_Tibs wrote:

I guess FileUtils.chdir is not thread safe. Is there any way you can
make it thread safe?

No, it calls the POSIX function that changes the working directory of
the current process, so it can never be made safe across threads.
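A small demonstration of this point, assuming a Ruby where Dir.chdir maps directly to the process-wide chdir(2) call (as in MRI): after another thread changes directory, the main thread sees the new directory too. The helper name is hypothetical.

```ruby
require 'tmpdir'

# Hypothetical helper: returns what the *calling* thread sees as its cwd
# after a different thread has called Dir.chdir (non-block form).
# The original cwd is restored before returning.
def cwd_seen_after_other_thread_chdir(target)
  original = Dir.pwd
  Thread.new { Dir.chdir(target) }.join  # the change happens in another thread
  Dir.pwd                                # ...but is observed here
ensure
  Dir.chdir(original)
end

Dir.mktmpdir do |dir|
  seen = cwd_seen_after_other_thread_chdir(dir)
  # File.realpath guards against symlinked temp dirs (e.g. /tmp on macOS)
  puts File.realpath(seen) == File.realpath(dir) # prints "true"
end
```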

JRuby emulates this behavior on a per-instance basis, so separate JRuby
instances in a given JVM can each have their own current dir. A per-thread
chdir seems like it would be a good idea, and easy to add to JRuby. I’d
support it…you ought to propose it on ruby-core.

- Charlie

#3

Darn it! It would seem like a good idea.

Thanks Charlie.

#4

2008/9/9 Mr_Tibs:

Darn it! It would seem like a good idea.

I am not convinced yet. Think about typical scenarios where one
thread creates tasks and hands them off to another thread. These will
break if every thread has its own idea of current dir. If you use
absolute pathnames you do not need a local current dir. This change
might even break existing code. I believe we should at least
carefully evaluate consequences.
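A minimal sketch of the absolute-path approach Robert describes, assuming a simple producer/consumer handoff through a Queue (the helper name read_via_worker is hypothetical): the producer expands the relative name against the directory it has in mind before handing it off, so the worker never consults the cwd at all.

```ruby
require 'tmpdir'

# Hypothetical helper: hand a file name to a worker thread as an
# absolute path, so the worker's result does not depend on whatever
# the process-wide cwd is when it runs.
def read_via_worker(base_dir, relative_name)
  queue = Queue.new
  worker = Thread.new { File.read(queue.pop) }  # blocks until a path arrives
  queue << File.expand_path(relative_name, base_dir)
  worker.value  # joins the thread and returns the file contents
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'task.txt'), 'hello')
  puts read_via_worker(dir, 'task.txt') # prints "hello"
end
```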

Kind regards

robert

#5

Robert K. wrote:

2008/9/9 Mr_Tibs:

Darn it! It would seem like a good idea.

I am not convinced yet. Think about typical scenarios where one
thread creates tasks and hands them off to another thread. These will
break if every thread has its own idea of current dir. If you use
absolute pathnames you do not need a local current dir. This change
might even break existing code. I believe we should at least
carefully evaluate consequences.

There’s no reason threads couldn’t inherit a reference to their parent
thread’s cwd, only using a specific one if provided (reverting back to
the parent’s cwd afterward).

It could even be made explicit:

Thread.chdir('foo') {}
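Ruby has no Thread.chdir; what follows is a hypothetical sketch of the semantics described here, under the assumption that a "virtual" cwd stored per thread is enough: new threads inherit their creator's value, and the block form reverts afterwards. The module name ThreadCwd and its methods are invented for illustration, and only paths resolved through ThreadCwd.expand honour the per-thread value; Dir.pwd, File.open on relative names, system and backticks still use the process cwd.

```ruby
# Hypothetical per-thread "virtual cwd" (not a real Ruby API).
module ThreadCwd
  KEY = :virtual_cwd

  # This thread's virtual cwd, falling back to the process cwd.
  def self.pwd
    Thread.current.thread_variable_get(KEY) || Dir.pwd
  end

  # Block form only: use +dir+ (relative to the current virtual cwd)
  # for the duration of the block, then revert to the previous value.
  def self.chdir(dir)
    previous = Thread.current.thread_variable_get(KEY)
    Thread.current.thread_variable_set(KEY, File.expand_path(dir, pwd))
    yield
  ensure
    Thread.current.thread_variable_set(KEY, previous)
  end

  # Resolve a path against this thread's virtual cwd.
  def self.expand(path)
    File.expand_path(path, pwd)
  end

  # Start a thread that inherits the creating thread's virtual cwd.
  def self.thread(&block)
    inherited = pwd
    Thread.new do
      Thread.current.thread_variable_set(KEY, inherited)
      block.call
    end
  end
end

a = ThreadCwd.thread { ThreadCwd.chdir('/srv/one') { sleep 0.1; ThreadCwd.expand('f') } }
b = ThreadCwd.thread { ThreadCwd.chdir('/srv/two') { sleep 0.1; ThreadCwd.expand('f') } }
p a.value # => "/srv/one/f"
p b.value # => "/srv/two/f"
```

Because nothing here redefines Dir.chdir, existing code keeps its process-wide semantics; the cost is that every filesystem call has to opt in by going through expand.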

- Charlie

#6

On Sep 9, 2008, at 1:46 PM, Joel VanderWerf wrote:

So in the case of

Thread.chdir('foo') { puts `ls` }

the subprocess would be executed in foo ?

This would be very nice, but it seems like a major modification to
ruby… potentially every system call would need to be aware of the
thread cwd state.

this is already true. if you do

chdir('foo') { puts `ls` }

the command will enter foo, run ls, and exit. the real issue is with

Thread.new do
  chdir 'foo'
  puts `ls`
end

Thread.new do
  chdir 'foo'
  puts `ls`
end

it’s a race condition currently, though i assume that’s what you meant
joel.
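Until something like a per-thread cwd exists, one way to remove this race (and the warning from the original post) is to serialize every block-form chdir behind a single process-wide lock. This is a minimal sketch, assuming all chdir calls in the program go through the hypothetical locked_cd helper; it trades away concurrency inside the blocks for correctness, and any code that calls Dir.chdir directly can still interfere.

```ruby
require 'fileutils'
require 'tmpdir'

# One lock for the whole process, because the cwd is process-wide.
CHDIR_LOCK = Mutex.new

# Hypothetical helper: run the block inside +dir+ while holding the lock,
# so no other locked_cd caller can change the cwd underneath it.
def locked_cd(dir, &block)
  CHDIR_LOCK.synchronize { FileUtils.cd(dir, &block) }
end

# The original poster's blah, collecting both pwd readings.
def blah(dir)
  readings = []
  locked_cd(dir) do
    readings << FileUtils.pwd
    sleep 0.1
    readings << FileUtils.pwd
  end
  readings
end

Dir.mktmpdir do |root|
  dirs = %w[test1 test2].map { |name| File.join(root, name) }
  dirs.each { |d| Dir.mkdir(d) }
  threads = dirs.map { |d| Thread.new { blah(d) } }
  threads.each { |t| p t.value } # each pair of readings is identical
end
```

Ruby's Mutex is not reentrant, so a locked_cd nested inside another locked_cd in the same thread raises a ThreadError instead of proceeding.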

a thread-local ‘chdir’ would need to affect backticks, system, exec,
Dir methods, FileUtils methods, etc. i guess whenever a thread is
scheduled, the first thing it would need to do is chdir to its own cwd
before running any code, to be safe. this seems very expensive,
doesn’t it?

it seems like an easier
a @ http://codeforpeople.com/

#7

Charles Oliver N. wrote:

There’s no reason threads couldn’t inherit a reference to their parent
thread’s cwd, only using a specific one if provided (reverting back to
the parent’s cwd afterward).

It could even be made explicit:

Thread.chdir('foo') {}

So in the case of

Thread.chdir('foo') { puts `ls` }

the subprocess would be executed in foo ?

This would be very nice, but it seems like a major modification to
ruby… potentially every system call would need to be aware of the
thread cwd state.

#8

ara.t.howard wrote:

ruby… potentially every system call would need to be aware of the
thread cwd state.

this is already true. if you do

chdir('foo') { puts `ls` }

the command will enter foo, run ls, and exit. the real issue is with

I meant that system calls would need to be aware of the current thread’s
own cwd state, as opposed to the process cwd.

a thread-local ‘chdir’ would need to affect backticks, system, exec, Dir
methods, FileUtils methods, etc. i guess whenever a thread is scheduled,
the first thing it would need to do is chdir to its own cwd before
running any code, to be safe. this seems very expensive, doesn’t it?

Yeah.

it seems like an easier

?

#9

On Sep 9, 2008, at 8:09 PM, Charles Oliver N. wrote:

We could ignore implementation details for the moment, couldn’t we?
It would be nice to have.

people seem to think so. i always use expand_path, but it would be a
cool feature.

what are your thoughts on adding things to jruby that diverge from the
mri?

a @ http://codeforpeople.com/

#10

ara.t.howard wrote:

a thread-local ‘chdir’ would need to affect backticks, system, exec, Dir
methods, FileUtils methods, etc. i guess whenever a thread is scheduled,
the first thing it would need to do is chdir to its own cwd before
running any code, to be safe. this seems very expensive, doesn’t it?

JRuby already supports multiple cwds in the same process by doing exactly
that, making all filesystem-aware operations use our cwd rather than
the process cwd. This is, for example, how you can run multiple Rails
apps in the same process, concurrently, without them stepping on each
other; they each get their own virtual cwd.

We could ignore implementation details for the moment, couldn’t we? It
would be nice to have.

- Charlie

#11

ara.t.howard wrote:

what are your thoughts on adding things to jruby that diverge from the mri?

There are already such things in JRuby, like the ‘jruby’ library you can
require to get access to the parser, the current JRuby instance, and the
“real” Java object representing any Ruby object (like the
org.jruby.RubyString that backs a String). And there are a few items we’ve
added for compatibility with Rubinius libraries as well. In general, if
we add anything that’s not standard, you have to require it, so it’s
more like an extension. That seems in line with MRI extensions we can’t
easily emulate or port yet.

As small as this seems, it could be a JRuby extension…

require 'thread/chdir'

But I suppose there’s a slippery slope adding too many such features.

- Charlie

#12

On Sep 9, 2008, at 9:34 PM, Charles Oliver N. wrote:

require 'thread/chdir'

cool.

But I suppose there’s a slippery slope adding too many such features.

fortunately that’s your business and not mine ;-)

a @ http://codeforpeople.com/

#13

2008/9/9 Charles Oliver N.:

might even break existing code. I believe we should at least
carefully evaluate consequences.

There’s no reason threads couldn’t inherit a reference to their parent
thread’s cwd, only using a specific one if provided (reverting back to the
parent’s cwd afterward).

What does inheriting change about my argument? Even with inheriting
you can send relative path names from one thread to another thread that
does not have the same cwd. My point is that, since safe programming
requires either transferring absolute paths or relying on having a
single cwd per process (which is traditionally the case), there is
little need for this.

Granted, for the Dir.chdir idiom it is desirable to be thread safe.
But how often is this used in a multithreaded environment and does the
cost (implementation, complexity, risk) outweigh the benefits?

Your situation with JRuby is a bit different, since all the JRuby
instances in a JVM share nothing (apart from classes, maybe). They are
more like separate processes, and having them all inside the same JVM
is more of an optimization.

Kind regards

robert

#14

On 10/09/2008, Robert K. wrote:

break if every thread has its own idea of current dir. If you use
you can send relative path names from one thread to another thread that
does not have the same cwd. My point is that, since safe programming
requires either transferring absolute paths or relying on having a
single cwd per process (which is traditionally the case), there is
little need for this.

Yes, thread-local chdir sort of breaks some traditional semantics.
However, with a multithreaded application you either do not chdir and
keep the same cwd everywhere, or you do chdir and then you must know
what you are doing, which includes reading the docs on your chdir.

If you do not know what you are doing you will get into trouble one
way or another because it is not safe to assume that chdir will read
your mind and do exactly the right thing. As mentioned earlier there
are already problems with the current behaviour anyway.

Also this could possibly be an option - load thread/chdir and now you
can control if your thread cwds are bound together or not.

Thanks

Michal

#15

2008/9/10 Michal S.:

Also this could possibly be an option - load thread/chdir and now you
can control if your thread cwds are bound together or not.

This is a good option. It also carries potential for subtle bugs if
it /redefines/ the behavior of Dir.chdir. Maybe it is better to have
Thread.chdir instead, which delegates to Thread.current.chdir, to keep
both separate.

Cheers

robert

#16

Robert K. wrote:

2008/9/10 Michal S.:

Also this could possibly be an option - load thread/chdir and now you
can control if your thread cwds are bound together or not.

This is a good option. It also carries potential for subtle bugs if
it /redefines/ the behavior of Dir.chdir. Maybe it is better to have
Thread.chdir instead, which delegates to Thread.current.chdir, to keep
both separate.

I figured Thread.chdir would only operate on the current thread either
way, like Thread.stop. And I’d rather not make it possible to change the
dir for another thread…that seems like a dangerous idea.

- Charlie

#17

On 10.09.2008 21:10, Charles Oliver N. wrote:

dir for another thread…that seems like a dangerous idea.

Yep, good point!

robert

#18

On 10/09/2008, Charles Oliver N. wrote:

instead which delegates to Thread.current.chdir to keep both separate.

I figured Thread.chdir would only operate on the current thread either way,
like Thread.stop. And I’d rather not make it possible to change the dir for
another thread…that seems like a dangerous idea.

If you allow global chdir, that’s technically changing the cwd for other
threads.

I had a vague idea that there could be an option that controls whether
the cwd should be emulated (saved/restored on thread switch, or restored
for syscalls that need it) or the global cwd should be used.

Thanks

Michal

#19

Michal S. wrote:

If you allow global chdir, that’s technically changing the cwd for other threads.

I had a vague idea that there could be an option that controls whether
the cwd should be emulated (saved/restored on thread switch, or restored
for syscalls that need it) or the global cwd should be used.

Changing it on thread switch would probably be necessary for MRI, but
not really necessary or possible on JRuby since threads run in parallel.
But by the same token, I’m not certain any JRuby support for per-thread
CWD would be able to handle calls out to ‘ls’ and friends, since even
the best fakery we can do to make File APIs have a per-thread CWD would
still not affect the system calls used to launch external processes…

- Charlie

#20

Charles Oliver N. wrote:

But by the same token, I’m not certain any JRuby support for per-thread
CWD would be able to handle calls out to ‘ls’ and friends, since even
the best fakery we can do to make File APIs have a per-thread CWD would
still not affect the system calls used to launch external processes…

Actually, I suppose there is a way: have every process launch acquire a
lock, so it can atomically chdir the process to the thread’s directory,
launch the child, and then quickly chdir back while the child runs. Kinda
hacky, though, and I’m not sure what effect a process-level chdir would
have on the JVM.
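For the external-process case specifically, MRI 1.9 and later already offer a route that needs no lock: Process.spawn, system and the Open3 helpers accept a chdir: option that sets the working directory of the child only, leaving the parent's cwd alone. A minimal sketch, assuming a Unix-like system with a pwd command on the PATH; the helper name pwd_in is hypothetical, and whether a given JRuby version implements chdir: the same way is an assumption to check.

```ruby
require 'open3'
require 'tmpdir'

# Hypothetical helper: run `pwd` in a child process whose working
# directory is +dir+. Open3.capture2 passes the chdir: option through
# to Process.spawn, which applies it in the child only, so the
# parent's cwd never changes.
def pwd_in(dir)
  out, status = Open3.capture2('pwd', chdir: dir)
  raise "pwd failed: #{status}" unless status.success?
  out.chomp
end

Dir.mktmpdir do |root|
  dirs = %w[test1 test2].map { |name| File.join(root, name) }
  dirs.each { |d| Dir.mkdir(d) }
  # Both threads launch children concurrently without touching Dir.pwd.
  threads = dirs.map { |d| Thread.new { pwd_in(d) } }
  threads.each { |t| puts t.value }
end
```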

- Charlie