Merb uses autoload rather extensively. We have lately observed some disturbing behavior around concurrency. Effectively, because autoload removes the flag before loading the file, if two threads concurrently attempt to access an autoloaded constant, one of the threads can get a NameError. It's easy to reproduce this by adding a sleep to the file being loaded before the constant is defined, and then spinning up two threads that both try to use the constant. This is reproducible in MRI; it does not only happen with JRuby's true parallel execution. The worst part is that no user action can solve this problem: because autoloading is magic, it's impossible for the user to lock the resulting require. In effect, this makes autoloading completely useless for threaded environments. Is this intentional? Is there something we can do at the language level to ameliorate this problem?
on 2008-12-03 05:29
on 2008-12-03 05:33
This seems like a strong argument in favor of Ruby-core:20225.
From: Yehuda Katz [mailto:wycats@gmail.com]
Sent: Tuesday, December 02, 2008 8:23 PM
To: ruby-core@ruby-lang.org
Subject: [ruby-core:20235] autoload and concurrency
on 2008-12-03 05:52
Yehuda Katz wrote:
> only happen with JRuby's true parallel execution.
>
> The worst part is that no user action can solve this problem: because
> autoloading is magic, it's impossible for the user to lock the resulting
> require. In effect, this makes autoloading completely useless for
> threaded environments. Is this intentional?
>
> Is there something we can do at the language level to ameliorate this
> problem?

I've spent some time looking into this on Yehuda's behalf, and I believe there's no way to make this work without a behavioral change to autoload.

Autoload is part of the normal constant lookup scheme. When defining an autoload, a special value is inserted into the target constant table with autoload file information. As constants are looked up, if one of these special values is encountered an autoload is triggered.

Currently, the basic logic of autoload is like this:

1. The special value is removed from the constant table
2. The associated file is required, and presumably defines the constant
3. The constant is re-looked-up after the require completes

The problem lies in step 1 here. There can be a gap between the time the special value is removed from the constant table and the time the required file redefines it. During this period, another thread may try to request the constant value. Since the autoload value has been removed and the new value has not yet replaced it, the constant search continues (and eventually fails with a NameError).

The primary behavioral change needed would be to not remove the autoload value, or to replace it with a new marker indicating "autoload in progress". This would require changes to anything that looks up constants, so it would know not to initiate a new autoload require nor continue up the constant scope chain but to *block* until the autoload is complete. I believe this is the only way to make autoload threadsafe.
I will also agree with Yehuda that a thread-unsafe autoload is broken, since it means the primary use case of autoload (delayed loading a file and definition of a constant) can't be used in the presence of threads. - Charlie
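The three steps and the gap after step 1 can be modelled in a few lines. This is only a toy, assuming a Hash in place of the constant table and a lambda in place of the required file; `lookup` and `AUTOLOAD_MARKER` are made-up names and none of this is MRI's actual implementation.

```ruby
# Toy model of the remove / require / re-lookup sequence.
# A Hash stands in for the constant table (assumption, not MRI internals).
AUTOLOAD_MARKER = Object.new

def lookup(table, name, loader)
  value = table.fetch(name) { raise NameError, "uninitialized constant #{name}" }
  return value unless value.equal?(AUTOLOAD_MARKER)

  table.delete(name)   # step 1: the special value is removed
  loader.call(table)   # step 2: the "file" is required and defines the constant
  table.fetch(name)    # step 3: the constant is looked up again
end

table   = { Foo: AUTOLOAD_MARKER }
started = Queue.new
loader  = lambda do |t|
  started << true      # signal: the marker is gone and the "file" is loading
  sleep 0.2            # a slow file, like the sleep used to reproduce the bug
  t[:Foo] = 1
end

first = Thread.new { lookup(table, :Foo, loader) }
started.pop            # wait until the first thread is inside step 2

# A second lookup during the gap finds neither marker nor value.
second_result = begin
  lookup(table, :Foo, loader)
rescue NameError => e
  e.class
end
first_result = first.value

p [first_result, second_result]   # => [1, NameError]
```

The second lookup fails only because it runs between step 1 and the end of step 2, which is the window described above.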
on 2008-12-03 05:58
Jim Deville wrote:
> This seems like a strong argument in favor of Ruby-core:20225.
Fixing autoload to call require doesn't solve anything because there's
still the exact same gap between the time it deletes the special value
and the time when the required file redefines it.
- Charlie
on 2008-12-03 06:04
Charles Oliver Nutter wrote:
> Jim Deville wrote:
>> This seems like a strong argument in favor of Ruby-core:20225.
>
> Fixing autoload to call require doesn't solve anything because there's
> still the exact same gap between the time it deletes the special value
> and the time when the required file redefines it.

A trivial example

autoloaded.rb:
    sleep 1
    Bar::Foo = 1

autoloader.rb:
    module Bar
      autoload :Foo, 'autoloaded.rb'
    end
    t1 = Thread.new { Bar::Foo }
    t2 = Thread.new { Bar::Foo }
    t1.join; t2.join

Fails every time. Of course "sleep 1" is a longer delay than you'd typically see from requiring in a file, but any delay means there's a chance another thread will schedule and see the missing constant before it's defined.

Now some of you may be saying "don't do that". But of course, the developer accessing the constant *doesn't know* not to do it, and the developer writing the library containing the autoload just wants to defer a require. Who is breaking the rules?
- Charlie
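For anyone who wants to try the example without creating two files by hand, here is a single-script variant. It is a sketch under a few assumptions: the autoloaded file is written to a temp directory, the sleep is shortened to 0.2 seconds, and each thread records what it saw instead of dying on the exception. `OUTCOMES` is a name introduced here for illustration.

```ruby
require "tmpdir"

module Bar; end

OUTCOMES = Queue.new   # collects what each thread observed

Dir.mktmpdir do |dir|
  # Same content as autoloaded.rb above, with a shorter delay.
  path = File.join(dir, "autoloaded.rb")
  File.write(path, "sleep 0.2\nBar::Foo = 1\n")
  Bar.autoload(:Foo, path)

  threads = Array.new(2) do
    Thread.new do
      begin
        OUTCOMES << Bar::Foo
      rescue NameError => e
        OUTCOMES << e.class   # this thread hit the gap
      end
    end
  end
  threads.each(&:join)
end

seen = Array.new(2) { OUTCOMES.pop }
p seen
```

On an interpreter that has the race, one of the two entries would be NameError; on one that serializes autoload, both would be 1.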
on 2008-12-03 06:16
Also, this just illustrates that it's possible. In the case of Merb, we aren't doing any hanky panky, and still get the bad behavior from time to time under high concurrency on JRuby. -- Yehuda
on 2008-12-03 06:55
I think it has already been concluded that autoload and require are
inherently broken in presence of threads. See ruby-core:19860.
Or am I missing something?
Tomas
From: Yehuda Katz [mailto:wycats@gmail.com]
Sent: Tuesday, December 02, 2008 9:10 PM
To: ruby-core@ruby-lang.org
Subject: [ruby-core:20245] Re: autoload and concurrency
on 2008-12-03 07:01
I was keying off of this:
> The worst part is that no user action can solve this problem: because
> autoloading is magic, it's impossible for the user to lock the resulting
> require. In effect, this makes autoloading completely useless for
> threaded environments. Is this intentional?

However, I think the idea of a require plugin makes more sense. I'm assuming something like the callbacks that extend and include provide, although I'm not sure whether that could provide a lock without a shared data structure.
JD
on 2008-12-03 07:16
On Dec 2, 2008, at 11:48 PM, Tomas Matousek wrote:
> I think it has already been concluded that autoload and require are
> inherently broken in presence of threads. See ruby-core:19860.
> Or am I missing something?

I don't think they're _inherently_. I just think the current implementation is. Some judicious use of monitors/mutexes would fix everything.
Dave
on 2008-12-03 07:16
Require could be made safe if only one thread were allowed to execute requires at a time, globally. Whether that's a reasonable tradeoff, I'm not sure. All things are possible...but many require behavioral changes to Ruby. - Charlie
on 2008-12-03 09:06
I meant that even if autoload is "fixed" by adding some kind of "in-progress" flag it wouldn't make it thread-safe since "require" itself is still broken as you described. Tomas
on 2008-12-03 16:27
Except that my proposed fix for autoload would have all secondary threads either seeing the final loaded constant value or waiting on the in-progress load; there would be no concurrent call to require whatsoever.
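A rough model of that behavior, assuming a small lookup table guarded by a Mutex and a ConditionVariable. `LazyTable` and its methods are invented for illustration and say nothing about how an interpreter would actually implement it; a loader that raises is not handled here.

```ruby
# Toy table with an "autoload in progress" state (illustrative names only).
class LazyTable
  def initialize
    @lock    = Mutex.new
    @ready   = ConditionVariable.new
    @entries = {}   # name => [:pending, loader] | [:loading, thread] | [:done, value]
  end

  def autoload(name, &loader)
    @lock.synchronize { @entries[name] = [:pending, loader] }
  end

  def get(name)
    loader = nil
    @lock.synchronize do
      loop do
        state, payload = @entries.fetch(name) do
          raise NameError, "uninitialized constant #{name}"
        end
        case state
        when :done
          return payload
        when :pending
          @entries[name] = [:loading, Thread.current]   # marker stays in place
          loader = payload
          break
        when :loading
          # A re-entrant lookup from the loading thread would wait on itself,
          # so it is reported as undefined instead.
          if payload.equal?(Thread.current)
            raise NameError, "uninitialized constant #{name}"
          end
          @ready.wait(@lock)   # secondary threads block here
        end
      end
    end

    value = loader.call        # the slow load runs outside the lock
    @lock.synchronize do
      @entries[name] = [:done, value]
      @ready.broadcast         # wake every waiting thread
    end
    value
  end
end

table = LazyTable.new
loads = Queue.new
table.autoload(:Foo) { loads << :foo; sleep 0.2; 1 }

values = Array.new(2) { Thread.new { table.get(:Foo) } }.map(&:value)
p values, loads.size
```

Both threads end up with the loaded value and the loader body runs once, because the second thread waits on the in-progress entry rather than starting its own load.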
on 2008-12-03 18:58
Charles Oliver Nutter wrote:
>> threads that both try to use the constant. This is reproducible in
> I've spent some time looking into this on Yehuda's behalf, and I
> 1. The special value is removed from the constant table
> The primary behavioral change needed would be to not remove the
> autoload value, or to replace it with a new marker indicating
> "autoload in progress". This would require changes to anything that
> looks up constants, so it would know not to initiate a new autoload
> require nor continue up the constant scope chain but to *block* until
> the autoload is complete. I believe this is the only way to make
> autoload threadsafe.

The other wrinkle in this is that the autoload itself may make a check against that constant (perhaps this is uncommon). So the autoload thread and any child threads of the autoload thread should actually see it as undefined.

> I will also agree with Yehuda that a thread-unsafe autoload is broken,
> since it means the primary use case of autoload (delayed loading a
> file and definition of a constant) can't be used in the presence of
> threads.

Dare I say that merb and Rails 2.3 should reconsider autoload as a reasonable method for lazy loading. There has to be an explicit thread-safe way of lazy loading which does not look bad.
-Tom
on 2008-12-03 19:23
+1

Why can't that "special value" be a kind of Mutex? Each thread tries to acquire it. The first succeeds, later threads block. As the later threads unblock, each redundantly "requires" the files needed for the constant, which have just been loaded by the first thread, so all the later threads quickly give up the Mutex and continue.
- brent
on 2008-12-03 20:03
On Dec 3, 2008, at 1:17 PM, Brent Roman wrote:
> quickly give
> up the Mutex and continue.

How do you prevent deadlocks?
on 2008-12-03 20:11
Unless the file that’s being auto-loaded requires another file via regular 'require' call. Then you might have two different constants in the system being accessed concurrently by 2 threads each auto-loading a file that requires other files. Tomas
on 2008-12-03 20:50
Because of this problem, we *will* be removing the use of autoload in 1.1. However, this is still a pretty big bug in Ruby itself (since it makes autoload broken). -- Yehuda
on 2008-12-03 23:08
Yes, I'd forgotten for a moment about the deadlock issue. Thanks for reminding me.

One way to avoid the deadlocks might be to dedicate a thread to servicing all requests to load new code. Require and constant resolution with "autoload" enabled would be redefined to queue code load requests. The code "loader" thread would remove such requests from the queue, then load the required ruby files (including other files required by the requested file) while the requesting thread remained blocked. The loader thread would release the requesting thread only after its require request had been satisfied.

This single code loader thread thus serializes the otherwise ill-behaved concurrent Ruby code loading.

There are some details I'm (knowingly) omitting. Does anyone see any truck-sized holes in this general idea?
- brent
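To make the proposal concrete, here is one possible shape for such a loader thread. Everything in it is hypothetical: `LoaderThread` and `request` are invented names, a block stands in for loading a file, and nested requests issued while the worker is loading are run inline, which is one way to fill in the omitted detail of files that require other files.

```ruby
# Hypothetical dedicated loader thread; not an existing Ruby facility.
class LoaderThread
  def initialize
    @requests = Queue.new
    @worker = Thread.new do
      while (job = @requests.pop)
        work, reply = job
        reply << begin
          [:ok, work.call]
        rescue StandardError => e
          [:error, e]            # hand the failure back to the requester
        end
      end
    end
  end

  def request(&work)
    # A request made from the worker itself (a nested load) runs inline;
    # queueing it would leave the worker waiting on itself.
    return work.call if Thread.current == @worker

    reply = Queue.new
    @requests << [work, reply]
    status, payload = reply.pop  # the requesting thread blocks here
    raise payload if status == :error
    payload
  end
end

loader = LoaderThread.new
order  = []                      # only ever touched by the worker thread

threads = %w[a b].map do |name|
  Thread.new do
    loader.request do
      order << "#{name}:start"
      loader.request { order << "#{name}:nested" }   # nested load
      sleep 0.05
      order << "#{name}:end"
      name
    end
  end
end

results = threads.map(&:value)
p results, order
```

Each requester gets its own result back, and the three events of one load are never interleaved with those of the other, since only the worker thread executes load bodies.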
on 2008-12-04 00:50
On Tue, Dec 2, 2008 at 11:10 PM, Charles Oliver Nutter <charles.nutter@sun.com> wrote:
> Require could be made safe if only one thread were allowed to execute
> requires at a time, globally. Whether that's a reasonable tradeoff, I'm not
> sure.
>
> All things are possible...but many require behavioral changes to Ruby.
>
> - Charlie

That might be a great idea. The only drawback being if you have

file2.rb:
    sleep

run this:
    Thread.new { sleep 1; require 'file1'}
    require 'file2'

So if one thread is "happily running" within a require file [rails does this?] then other threads would be prohibited from requiring.

Or I guess it could be mutex-ified per autoload [like require is currently for 1.9]--i.e. when a second thread loads the same constant, have it wait till the first's auto-load returns [somewhat of a band-aid, but...hmm this is hard].
-=R
on 2008-12-04 01:38
On Thu, 04 Dec 2008 06:58:54 +0900, Brent Roman wrote:
> after
> its require request had been satisfied.
>
> This single code loader thread thus serializes the otherwise ill behaved
> concurrent Ruby code loading.
>
> There are some details I'm (knowingly) omitting. Does anyone see any
> truck sized holes in this general idea?

Loading code right now is a depth-first search of the dependency tree for the required file. Your suggestion would change it to a breadth first search. As a result, the following example would try to inherit from Bar before Bar was defined, something that is not the case now. If you extend that to Kernel#require and not just Kernel#autoload, as you say, then most currently existing Ruby code would break because of this.

$ cat a.rb
    Kernel.autoload(:Foo,'b')
    Foo.something

$ cat b.rb
    Kernel.autoload(:Bar,'c')
    class Foo < Bar
      def self.something
        puts "FizzBang"
      end
    end

$ cat c.rb
    class Bar
    end
on 2008-12-04 02:03
>> All things are possible...but many require behavioral changes to Ruby.
>>
>> - Charlie

Looks like this has been discussed before [on the todo list since 2000 [1]].

Another "band aid" style option might perhaps be to allow the 'load' file to know which autoload constant triggered it, then it could do its own concurrency.

ex:
    autoload_new :ConstantName, 'thread_safe_require'

then when autoload kicks in, it sets "something" [$something, whatever] to 'ConstantName'
then 'thread_safe_require' would be able to hopefully load it sanely, if so desired.

Perhaps create new functions load_thread_safe and/or maybe require_thread_safe :)
Thoughts?
-=R
[1] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
on 2008-12-04 02:19
A stupid idea that just came to mind (maybe someone already mentioned it, I did not read all the messages): wouldn't it be a nice idea to be able to have autoload managed by a callback? It would allow the frameworks to implement it their own way... Just my 2 cents
on 2008-12-04 03:47
If the thread autoloading the file required another file, it's still the same thread.
on 2008-12-04 03:47
On Thu, Dec 4, 2008 at 1:14 AM, Yehuda Katz <wycats@gmail.com> wrote:
> Because of this problem, we *will* be removing the use of autoload in 1.1.
> However, this is still a pretty big bug in Ruby itself (since it makes
> autoload broken).

Why not use const_missing for lazy loading and protect "require" with a mutex? (There could be a deadlock if requires are nested, but in practice I was assuming it can be avoided.)
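A minimal sketch of that approach, with some assumptions: `LazyConstants` and `lazy` are invented names, a block stands in for the `require` call, and the lock is a `Monitor` from the standard library rather than a plain Mutex so that a nested load in the same thread can re-enter it.

```ruby
require "monitor"

# Hypothetical mixin: lazy loading through const_missing, guarded by one lock.
module LazyConstants
  LOCK    = Monitor.new
  LOADERS = {}

  # Register a block that defines the constant when first needed.
  def lazy(name, &loader)
    LOCK.synchronize { LOADERS[[self, name]] = loader }
  end

  def const_missing(name)
    LOCK.synchronize do
      # Another thread may have finished the load while this one waited.
      return const_get(name, false) if const_defined?(name, false)

      loader = LOADERS[[self, name]]
      return super unless loader   # not ours: fall back to the usual NameError

      loader.call                  # stands in for require "..."
      const_get(name, false)
    end
  end
end

LOADS = Queue.new                  # counts how often the loader body runs

module Plugins
  extend LazyConstants
  lazy(:Foo) do
    LOADS << :foo
    sleep 0.2                      # slow "file"
    const_set(:Foo, 1)
  end
end

values = Array.new(2) { Thread.new { Plugins::Foo } }.map(&:value)
p values, LOADS.size
```

Unlike autoload, nothing is removed from the constant table up front, so a second thread either finds the constant already defined or waits on the lock inside `const_missing`.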
on 2008-12-04 04:01
On Dec 3, 2008, at 8:40 PM, hemant wrote:
> Why not use const_missing for lazy loading and protect "require" by a
> mutex (there could be a deadlock, if requires are nested, but in
> practice I was assuming it can be avoided)

I don't see how there could be a deadlock: once you've claimed the mutex, your thread is the one that will do requires. Any requires nested inside the original one will be executed in the same thread, so simply rescue and ignore the recursive lock exception.

Honestly, folks, I really don't see what all the fuss is about. One mutex, claimed by autoload, require, and load, will do the trick. If autoload claims it, then it gets to do the requires if it needs to, because they'll operate in-thread. If require claims it, then only requires and autoloads in that same thread will run until the original require finishes. Etc etc... No deadlock, and no contention.

The current MRI implementation has a bug. Fix it with a mutex, and it all gets better. Then we find the next global thing that's not thread safe... :)
Dave
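The recursive-lock point can be checked directly. The sketch below first shows that a plain `Mutex` raises `ThreadError` when the owning thread locks it again, then wraps one shared lock in a helper that lets nested loads from the same thread pass through. `LOAD_LOCK` and `with_load_lock` are hypothetical names, and `Mutex#owned?` is used in place of rescuing the exception; a `Monitor` would be another way to get the same re-entrancy.

```ruby
# 1. A plain Mutex refuses to be locked twice by the same thread.
plain = Mutex.new
nested_error = nil
plain.synchronize do
  begin
    plain.synchronize { :never_reached }
  rescue ThreadError => e
    nested_error = e.class      # the "recursive lock exception"
  end
end

# 2. One shared lock for every kind of load (hypothetical helper).
LOAD_LOCK = Mutex.new

def with_load_lock
  if LOAD_LOCK.owned?
    yield                       # nested load in the owning thread: just run
  else
    LOAD_LOCK.synchronize { yield }
  end
end

events = []                     # only appended to while the lock is held
threads = %w[a b].map do |name|
  Thread.new do
    with_load_lock do           # outer "require"
      events << "#{name}:outer"
      with_load_lock do         # nested "require", same thread
        sleep 0.05
        events << "#{name}:inner"
      end
    end
  end
end
threads.each(&:join)

p nested_error, events
```

Whichever thread claims the lock first finishes both its outer and nested load before the other starts, so the events come out in pairs.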
on 2008-12-04 05:02
On Dec 3, 2008, at 8:55 PM, Dave Thomas wrote:
> On Dec 3, 2008, at 8:40 PM, hemant wrote:
>
>> Why not use const_missing for lazy loading and protect "require" by a
>> mutex (there could be a deadlock, if requires are nested, but in
>> practice I was assuming it can be avoided)
>
> I don't see how there could be a deadlock: once you've claimed the
> mutex, your thread is the one that will do requires. Any requires
> nested inside the original one will be executed in the same thread,
> so simply rescue and ignore the recursive lock exception.

I loved your let's-move-on post so much it actually pains me to write this, but can we be totally sure of that?

    Thread.new do
      require "i_so_hope_im_wrong_about_this"
    end

James Edward Gray II
on 2008-12-04 05:07
Brent Roman wrote:
> There are some details I'm (knowingly) omitting.
> Does anyone see any truck sized holes in this general idea?

I don't think this is really any different than having a single mutex for all requires. The requiring thread would acquire it for the duration of the require. Any additional requires it runs into it would still have the lock for. Any other threads doing requires would block until that thread had come all the way back out.

This implies a few things you should not do:

* Don't do long or blocking operations during script loads
* Don't launch threads that will do additional requires during script loads (or at least don't depend on them doing their requires until the launching thread has finished its own)

This obviously means requiring becomes a single-threaded process, which is probably the only way to make it completely safe.
- Charlie
on 2008-12-04 05:15
Dave Thomas wrote:
> rescue and ignore the recursive lock exception.
>
>
> Honestly, folks, I really don't see what all the fuss is about. One
> mutex, claimed by autoload, require, and load, will do the trick. If
> autoload claims it, then it gets to do the requires if it needs to,
> because they'll operate in-thread. If require claims it, then only
> requires and autoloads in that same thread will run until the original
> require finishes. Etc etc... No deadlock, and no contention.

autoload is a little trickier than that since it needs some intermediate state that isn't totally undefined but doesn't show up as yet-to-be-autoloaded either. But yeah, if we're willing to expect that require and autoload are one-thread-at-a-time, a lot of problems go away.

I'd say load could potentially be unsynchronized, since it's more explicit and doesn't make any guarantees about whether it will fire twice, etc. But perhaps it should just be done across the board.
- Charlie
on 2008-12-04 05:15
On Dec 3, 2008, at 9:55 PM, James Gray wrote:
> Thread.new do
>   require "i_so_hope_im_wrong_about_this"
> end

That's a new thread, so the mutex.lock in its require will hang until any outer level one is done.
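A small experiment with the spawned-thread case, assuming a `Monitor` as a stand-in for the single global require lock; `LOAD_MONITOR` and `guarded_load` are made-up names. It covers two situations: the outer load starts a thread and carries on, and the outer load waits for that thread before finishing. A join timeout is used in the second case so the script reports the stall rather than hanging.

```ruby
require "monitor"

LOAD_MONITOR = Monitor.new   # stand-in for one global require lock

def guarded_load
  LOAD_MONITOR.synchronize { yield }
end

# Case 1: the outer load spawns a thread and moves on.
# The inner load simply waits its turn.
log   = Queue.new
child = nil
guarded_load do
  child = Thread.new { guarded_load { log << :inner_done } }
  sleep 0.05                 # the child is blocked on the lock meanwhile
  log << :outer_done
end
child.join
order = [log.pop, log.pop]

# Case 2: the outer load waits for the spawned thread while holding the lock.
stalled = nil
waiter  = nil
guarded_load do
  waiter  = Thread.new { guarded_load { :inner } }
  stalled = waiter.join(0.2).nil?   # nil means the join timed out
end
waiter.join                  # finishes once the outer load has let go

p order, stalled
```

In the first case the inner load runs after the outer one completes. In the second the join can only time out, because the thread being waited on needs the lock that the waiting thread still holds.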
on 2008-12-04 05:15
Roger Pack wrote:
> autoload_new :ConstantName, 'thread_safe_require'
> then when autoload kicks in, it sets "something" [$something,
> whatever] to 'ConstantName'
> then
> 'thread_safe_require' would be able to hopefully load it sanely, if so desired.
>
> Perhaps create new functions load_thread_safe and/or maybe
> require_thread_safe :)

It would still require changes to autoload so that before autoload had started but after the autoload constant had been encountered another thread would not have any chance to see a totally empty constant. autoload either needs to leave the autoload marker there or put some other marker there.

The ability to affect concurrency from within the autoloaded file hinges on the same requirement, as does a callback.
- Charlie
on 2008-12-04 06:09
Charlie, Yes. I agree. There's really no significant difference between a single code loader thread and a single global (recursive) Mutex. The mutex, after all, contains the queue of waiting threads. It also occurs to me that if code loaded via "require" has absolutely no restrictions on what it can do (spawn threads, block indefinitely, side effect), then this problem is insoluble. Ruby isn't Erlang. It only achieves some degree of thread safety when used very carefully. I think the best we can hope is to create a mechanism that allows require and auto-load to work reliably in "well behaved" multi-threaded apps and to explain the restrictions placed on code being loaded via require for thread safety. - brent
on 2011-05-02 06:49
I know it has been more than a year since the last post, but I was checking whether the use of autoload is still an issue with threads. Are there any plans to fix this?