Are all built-in objects thread safe? For example, if I have an array
and one thread is constant appending to it while another thread is
shifting
elements off of it and there's no synchronization going on, can the
array
object ever get corrupted? What about a similar scenario for hashes?
These
are surely complicated objects with internal state that must be
maintained.
Are they implemented to be thread safe?
Thank you...
on 2008-12-23 10:30
on 2008-12-23 16:05
Hi,
In message "Re: Are all Ruby built-in objects thread safe?"
on Tue, 23 Dec 2008 18:21:27 +0900, "Just Another Victim of the
Ambient Morality" <ihatespam@hotmail.com> writes:
| Are all built-in objects thread safe?
For MRI (1.8.x) and YARV (1.9.x), every C implemented methods are
protected by GIL (Global Interpreter Lock), so that you don't have to
worry about. But it might depend on each implementation.
matz.
on 2008-12-23 16:25
On 23.12.2008 10:24, Just Another Victim of the Ambient Morality wrote: > Are all built-in objects thread safe? For example, if I have an array > and one thread is constant appending to it while another thread is shifting > elements off of it and there's no synchronization going on, can the array > object ever get corrupted? What about a similar scenario for hashes? These > are surely complicated objects with internal state that must be maintained. > Are they implemented to be thread safe? The answer will sound a bit odd: they are not built to be thread safe but it may well be that they are. In fact, it may well depend whether they are thread safe in practical terms depending on the Ruby version you are using. Since the classic interpreter uses green threads, i.e. no preemption is happening, it may be that some or all operations are atomic and thus thread safe. The bottom line is: you should not rely on this and assume they are not thread safe! In your example, you should be using class Queue which is thread safe and can be imported by requiring "thread" $ irb -r thread irb(main):001:0> q = Queue.new => #<Queue:0x7ff6afc0> irb(main):002:0> t = Thread.new(q) {|qq| while (o = qq.deq) != qq; p o; end} => #<Thread:0x7ff61c68 sleep> irb(main):003:0> 5.times {|i| q.enq i}; q.enq q; t.join 0 1 2 3 4 => #<Thread:0x7ff61c68 dead> Then there are also classes Mutex and Monitor plus module MonitorMixin. The difference is reentrance: irb(main):009:0> require 'monitor' => true irb(main):010:0> m=Mutex.new => #<Mutex:0x7ff3e100> irb(main):011:0> m.synchronize { m.synchronize { 1 } } ThreadError: stopping only thread note: use sleep to stop forever from (irb):11:in `synchronize' from (irb):11 from (irb):11:in `synchronize' from (irb):11 from (null):0 irb(main):012:0> m=Monitor.new => #<Monitor:0x7ff329f4 @mon_count=0, @mon_owner=nil, @mon_waiting_queue=[], @mon_entering_queue=[]> irb(main):013:0> m.synchronize { m.synchronize { 1 } } => 1 irb(main):014:0> Kind regards robert
on 2008-12-23 22:46
Just Another Victim of the Ambient Morality wrote: > Are all built-in objects thread safe? For example, if I have an array > and one thread is constant appending to it while another thread is shifting > elements off of it and there's no synchronization going on, can the array > object ever get corrupted? What about a similar scenario for hashes? These > are surely complicated objects with internal state that must be maintained. > Are they implemented to be thread safe? This is a *very* interesting question! And it is a question that can ultimately *only* be answered by a formal Ruby Specification or more specifically a formal Ruby Memory Model. Until we have such a specification, the C source code of MRI or YARV is considered to be the "specfication". However, there is a problem: that source code can actually be interpreted several different ways. If you look at the implementations of Hash, Array and friends, you will see that they are not thread-safe. Ergo: the specification says that the user is responsible for locking Arrays and Hashes. If, however, you look at the implementation of threads, you will see that both MRI and YARV are actually incapable of running more than one thread at a time -- even on a 1000-core machine MRI and YARV will only ever use one core. So, since two threads can never access an Array at the same time, there is no need for locking. Ergo: the specification says that the user is *not* responsible for locking Arrays and Hashes. There is a conflict here -- on the one hand, Arrays aren't thread-safe, on the other hand, MRI's broken threading implementation accidentally *makes* them thread-safe. Which do you depend on? As it turns out, different people interpret this differently. A couple of months ago, this actually became an issue. Originally, the JRuby developers had implemented Arrays to be not safe. One of the big selling points of JRuby was and still is the promise of true concurrency and better scalability. So, naturally, people wanted to take advantage of this feature and started running their concurrent programs on JRuby. And those programs crashed left and right, because they didn't lock their Arrays properly. So, the JRuby team decided to implement thread-safe data structures on their end, so that code that didn't crash on MRI could be run unmodified on JRuby. However, they didn't *have* to do that. They could just as well have concluded that those programs were broken and *they* needed to become thread-safe. That would have been perfectly acceptable. And there is no guarantee that *all* Ruby Implementations will do it that way (and there's lots of them, something around 14 or so at the moment). Well, *unless* of course, there is a specification which tells them to. So, in short: when in doubt, lock. jwm
on 2008-12-23 23:37
Yukihiro Matsumoto wrote: > Hi, > > In message "Re: Are all Ruby built-in objects thread safe?" > on Tue, 23 Dec 2008 18:21:27 +0900, "Just Another Victim of the Ambient Morality" <ihatespam@hotmail.com> writes: > > | Are all built-in objects thread safe? > > For MRI (1.8.x) and YARV (1.9.x), every C implemented methods are > protected by GIL (Global Interpreter Lock), so that you don't have to > worry about. But it might depend on each implementation. In JRuby we've made a best attempt to keep the core data structures like Array, Hash, and String as thread-safe as possible without introducing locks (which would slow down single-threaded cases). There are cases where when using Array or Hash you may get ConcurrencyErrors raised. This is the trade-off for having fast lock-free collections. I would very much welcome a set of guaranteed-thread-safe wrapper collections folks could use if they're concerned about concurrency, since it would be unreasonable to penalize all code with locking. For now, learn and love Mutex, or don't share collections across threads. - Charlie
on 2008-12-23 23:44
Jörg W Mittag wrote: > A couple of months ago, this actually became an issue. Originally, the > JRuby developers had implemented Arrays to be not safe. One of the big > selling points of JRuby was and still is the promise of true > concurrency and better scalability. So, naturally, people wanted to > take advantage of this feature and started running their concurrent > programs on JRuby. And those programs crashed left and right, because > they didn't lock their Arrays properly. So, the JRuby team decided to > implement thread-safe data structures on their end, so that code that > didn't crash on MRI could be run unmodified on JRuby. Actually, we made a minimal attempt to ensure that concurrent operations against Array were usually safe across threads but also would raise a Ruby-land "ConcurrencyError" when concurrent changes could not be reconciled. It's a trade-off; adding locks to all the core collections would severely penalize performance for what's generally the rare case of concurrent access. And really there should be a separate set of classes with guaranteed concurrency that people can use if the performance considerations of locking are acceptabe for safe concurrent access. So in short, you're absolutely right; nobody should ever rely on the core collections to be thread-safe, even if they happen to be thread-safe by accident in the C implementations right now. That won't be the case on all implementations, and may not even be the case on future versions of the C impls. The safe answer is to ensure you're watching your own back and properly synchronizing access to shared data structures. - Charlie
on 2008-12-24 08:05
"Jörg W Mittag" <JoergWMittag+Usenet@GoogleMail.Com> wrote in message news:a4cg2zmfjkh9$.dlg@jwmittag.my-fqdn.de... >> Are they implemented to be thread safe? > will see that they are not thread-safe. Ergo: the specification says > that the user is responsible for locking Arrays and Hashes. > > If, however, you look at the implementation of threads, you will see > that both MRI and YARV are actually incapable of running more than one > thread at a time -- even on a 1000-core machine MRI and YARV will only > ever use one core. So, since two threads can never access an Array at > the same time, there is no need for locking. Ergo: the specification > says that the user is *not* responsible for locking Arrays and Hashes. I don't think this is relevant. Concurrency isn't about how many processors you use. Multitasking systems existed long before SMP hardware existed. Concurrency is about doing tasks concurrently. If you have one method running and it may be preempted by another method then they are running concurrently. If the two methods share data then they may corrupt that data for each other. This is true regardless of how these concurrencies, or threads, are implemented. It doesn't matter if they're hardware supported system threads or if they're Ruby green threads... > So, in short: when in doubt, lock. This is the popular wisdom so this is what I will do. Better safe than not thread-safe!
on 2008-12-24 10:37
Just Another Victim wrote: > I don't think this is relevant. Concurrency isn't about how many > processors you use. Multitasking systems existed long before SMP > hardware > existed. Concurrency is about doing tasks concurrently. If you have > one > method running and it may be preempted by another method then they are > running concurrently. If the two methods share data then they may > corrupt > that data for each other. This is true regardless of how these > concurrencies, or threads, are implemented. It doesn't matter if > they're > hardware supported system threads or if they're Ruby green threads... Not exactly. Ruby green threads aren't fully pre-emptive; they will only pre-empt at the boundaries between execution steps, not within an execution step. Hence the single operations @array.pop and @array.push(...) are safe against preemption. You could consider each method call to be wrapped implicitly by Thread.critical { ... }
on 2008-12-24 15:53
On Dec 24, 2008, at 1:56 AM, Just Another Victim of the Ambient Morality wrote: > they're > hardware supported system threads or if they're Ruby green threads... I think the key here is the granularity of Ruby's atomicity. You're assuming that preemption can occur on the granularity of machine instructions. Were that the case, two simultaneous threads on a single core could, potentially, cause problems. I think what Matz was saying is that, because of the GIL, simultaneous threads will only preempt at a much higher granularity. So I have a question for Matz and Charles: Would it be reasonable to specify that YARV instructions should be atomic? Charles, how does this work with JVM ops? Last I heard, JRuby was still skipping YARV and going straight to Java bytecodes, which could make this a difficult proposition. My completely uneducated guess, though, is that unless we specify that certain implementation provided data structures must be thread safe (at the very least Mutex), then there would have to be a minimum level at which everything is atomic to be able to write implementation independent thread-safe libraries. - Josh
on 2008-12-25 00:44
Joshua Ballanco wrote: > I think the key here is the granularity of Ruby's atomicity. You're > assuming that preemption can occur on the granularity of machine > instructions. Were that the case, two simultaneous threads on a single > core could, potentially, cause problems. I think what Matz was saying is > that, because of the GIL, simultaneous threads will only preempt at a > much higher granularity. And rarely within C-based code, unless that code explicitly yields control to the thread scheduler. This also means that calls out to C libraries have to be written to use asynchronous calls or they just plain block all threads. In JRuby, threads may preempt at any time, and indeed can and will run "really" in parallel at any given time if the hardware supports it. > So I have a question for Matz and Charles: Would it be reasonable to > specify that YARV instructions should be atomic? Charles, how does this > work with JVM ops? Last I heard, JRuby was still skipping YARV and going > straight to Java bytecodes, which could make this a difficult > proposition. My completely uneducated guess, though, is that unless we > specify that certain implementation provided data structures must be > thread safe (at the very least Mutex), then there would have to be a > minimum level at which everything is atomic to be able to write > implementation independent thread-safe libraries. Everything in the thread(.rb) library obviously has thread-safety as part of its contract, so you don't have to worry about that. The core collections (Array, String, Hash) do not have such guarantees explicitly in their contract, and I believe they should stay that way since the vast majority of uses are single-threaded. We (JRuby) have additionally added locking (as smartly as possible) to ensure that method and instance variable tables are thread-safe, since they're crucial to Ruby's operation. I don't think YARV instructions are good atomic units. In JRuby we can't even (and won't even) guarantee individual Java bytecodes are atomic. Nothing can be atomic unless you lock around it, and in most cases you can't have atomicity and still allow code to execute in parallel. Plus YARV instructions cover a wide range of things, some of which are obviously not atomic like arbitrary method calls or test-and-set (||=, &&=) logic. Atomicity and thread-safety should be specified on a per-mutator basis for all the mutable structures in Ruby, rather than as a blanket assertion. - Charlie
on 2008-12-26 10:43
I would very much welcome a set of guaranteed-thread-safe wrapper > collections folks could use if they're concerned about concurrency, since it > would be unreasonable to penalize all code with locking. For now, learn and > love Mutex, or don't share collections across threads. That sounds like a fun job. The issue goes well with a question which burns on the top of my tongue: How to test thread issues? I do not have any experience in this field but consider it somehow critical before "promising" functional thread safe wrappers. Cheers R. -- Il computer non è una macchina intelligente che aiuta le persone stupide, anzi, è una macchina stupida che funziona solo nelle mani delle persone intelligenti. Computers are not smart to help stupid people, rather they are stupid and will work only if taken care of by smart people. Umberto Eco
on 2008-12-26 11:27
On 26.12.2008 10:43, Robert Dober wrote: > I would very much welcome a set of guaranteed-thread-safe wrapper >> collections folks could use if they're concerned about concurrency, since it >> would be unreasonable to penalize all code with locking. For now, learn and >> love Mutex, or don't share collections across threads. > > That sounds like a fun job. The issue goes well with a question which > burns on the top of my tongue: > How to test thread issues? I do not have any experience in this field > but consider it somehow critical before "promising" functional thread > safe wrappers. This is tricky. There are few things I'd like to say to this: first, there is an easy way to provide _basic_ thread safety by wrapping all method calls in a synchronized block - for this even a SynchronizedDelegator would be sufficient which could be used for _all_ classes - not just collections. In this case, because of the way the wrapping is done there is no need to test thread safety because - as long as the delegator ensures that all methods are wrapped in this way there is no chance of corrupting internal state. But, in the general case thread safety cannot be achieved on the class level. My typical example is this if hash.contains_key? k dat = hash[k] else dat = create_dat(k) hash[k] = dat end The basic property of this bit of code which makes it impossible to ensure thread safety on level of class Hash is that there are two methods invoked on the same instance and there must not be any state change between them because then you either end up with garbage (nil) in "dat" or you invoke create_dat(k) more than once per key value and this leads to different threads having a different idea of what hash[k] is. So in this case you need a lock around the _complete_ block of code. (Note, the method level lock would work if using a Hash with a default block, but this solves only a small portion of the cases.) This is also the reason why a general per method locking is only of limited use. It only ensures consistency of the internal state of an object but can never ensure overall concurrency correctness of an application. Testing thread safety is difficult to impossible for several reasons. One of the reasons is that for proper control of the test case you would need to ensure exact timing of thread execution. While you could do that I have never seen people actually doing this - maybe because this will make test suites run much slower. Another reason is that you vastly depend on the underlying machine (i.e. hardware, OS and VM). I have seen Java programs break as soon as they were executed on a multi core machine with more than n cores. Things aren't made easier by the fact that - concluding from postings I see on comp.lang.java.* newsgroups for example - few people seem to have a thorough understanding of concurrency and all the issues involved. Another item on the "makes it hard" list is the fact that most OO and procedural languages only have a very basic toolkit for concurrency control; while Java started out pretty good with built in "synchronized" it took until version 5 that they incorporated Doug Lea's concurrency utilities into the language's standard library and also into the EJB spec. It's different in other programming languages: functional languages are by definition thread safe because they are free of side effects. (At least in theory. :-)) Also, other languages built for concurrent applications which have a different programming model (e.g. Occam) of course have much better support for concurrency. Kind regards robert
on 2008-12-26 17:24
Just Another Victim of the Ambient Morality wrote: > I don't think this is relevant. Concurrency isn't about how many > processors you use. Multitasking systems existed long before SMP hardware > existed. Concurrency is about doing tasks concurrently. If you have one > method running and it may be preempted by another method then they are > running concurrently. If the two methods share data then they may corrupt > that data for each other. This is true regardless of how these > concurrencies, or threads, are implemented. It doesn't matter if they're > hardware supported system threads or if they're Ruby green threads... Keep in mind though that on smp systems, most instructions are not atomic. On a single processor, they are.
on 2008-12-26 19:17
On Fri, Dec 26, 2008 at 11:24 AM, Robert Klemme <shortcutter@googlemail.com> wrote: >> That sounds like a fun job. The issue goes well with a question which > thread safety because - as long as the delegator ensures that all methods > end > this solves only a small portion of the cases.) This is also the reason why > break as soon as they were executed on a multi core machine with more than n > > > -- > remember.guy do |as, often| as.you_can - without end > > Robert, IIUC we do not want either the one, nor the other. For what I am concerned one would need Read/Write Locks. The built in methods like [] and []= would obtain read and write locks, while the read lock automatically obtains a write lock and only releases it when its count is to zero, the write lock would inhibit the read lock to be obtained. I do not know if one should allow user access to these locks? My first thought would be no, all we want to do is to have thread-safe Hash, Array, String but not to provide advanced synchronization mechanisms. OTOH one could expose the write_lock and read_lock of each object which would allow for the following hash = TS_Hash::new ... dat = hash.read_synchronize do if hash.has_key? key then hash[ key ] else hash.write_synchronize do hash[key] = compute_data( key ) end end end N.B. We have to solve the interlock problem here as normally the write_synchronize does not obtain the write_lock as the read_synchronize did get a read_lock. We would need to grant the write lock if the read_lock is obtained by the current thread *only*, but imagine two threads waiting for the write_lock while containing the read_lock, the good old philosophers-forks-spoon-spaghetti interlock. [ That is why we eat spaghetti with a fork only ;) ] Therefore I guess the RW-locking in a ThreadSafe Container class shall rather not be exposed as we avoiding interlocking is not that complicated IIRC And if we expose read_synchronize &blk, and write_synchronize &blk we should probably raise something like an IllegalMonitorState exception if a thread tries to call the one inside the other. More thoughts please. Cheers Robert -- Il computer non è una macchina intelligente che aiuta le persone stupide, anzi, è una macchina stupida che funziona solo nelle mani delle persone intelligenti. Computers are not smart to help stupid people, rather they are stupid and will work only if taken care of by smart people. Umberto Eco
on 2008-12-27 10:05
* Robert Klemme <shortcutter@googlemail.com> (11:23) schrieb: > if hash.contains_key? k > dat = hash[k] > else > dat = create_dat(k) > hash[k] = dat > end > [...] > So in this case you need a lock around the _complete_ block of code. > (Note, the method level lock would work if using a Hash with a default > block, but this solves only a small portion of the cases.) If Hash had a method get_with_default that took a default value or a block like initialize you could solve these cases. mfg, simon .... l
on 2008-12-27 11:09
On Sat, Dec 27, 2008 at 10:04 AM, Simon Krahnke <overlord@gmx.li> wrote: > >> So in this case you need a lock around the _complete_ block of code. >> (Note, the method level lock would work if using a Hash with a default >> block, but this solves only a small portion of the cases.) > > If Hash had a method get_with_default that took a default value or a I believe you mean "get and assign with default" because IIRC there is a get_with_default, it is simply called Hash#fetch R.
on 2008-12-27 11:55
On 27.12.2008 09:55, Simon Krahnke wrote: > >> So in this case you need a lock around the _complete_ block of code. >> (Note, the method level lock would work if using a Hash with a default >> block, but this solves only a small portion of the cases.) > > If Hash had a method get_with_default that took a default value or a > block like initialize you could solve these cases. Please read my note: this was just an example for the class of synchronization problems that cannot be solved within the class. You can invent arbitrary more complex scenarios that would not make any sense to be covered by a standard method in class Hash. My point is totally independent of the functionality of class Hash. This is about sequences of method invocations which do not tolerate any state changes by other threads between method calls. Put it differently: in concurrent applications there are many scenarios where lock granularity "method" is insufficient to guarantee correct code. Instead you need explicit locking for larger portions of _client_ code. Kind regards robert
on 2008-12-27 13:00
On 26.12.2008 19:17, Robert Dober wrote: > On Fri, Dec 26, 2008 at 11:24 AM, Robert Klemme > <shortcutter@googlemail.com> wrote: <snip>full quote</snip> Please invest a bit more time in quoting and trimming. I may speak for me only but I find it unnecessary hard to follow a discussion when there is a full quote and no clear indication of references in your reply. > Robert, IIUC we do not want either the one, nor the other. What are you referring to with "one" and "other"? > For what I am concerned one would need Read/Write Locks. Why? > The built in methods like [] and []= would obtain read and write > locks, while the read lock automatically obtains a write lock and only > releases it when its count is to zero, the write lock would inhibit > the read lock to be obtained. Let's first talk about semantics not jump into implementation. How the read write locking is implemented is rather unimportant for the question what kind of additional locking we want to propose for default collections (if any). As far as I can see three variants are lying on the table with the lock free collection always be present as well: 1. none at all, locking needs to be explicitly done in client code 2. exclusive locking, no two methods can execute concurrently on the same instance 3. read write locking with the usual semantics, allowing multiple methods with read semantic to be executed concurrently or at most one "write" method at a time. "write" lock excludes any "read" locks. I'll follow up with discussion at the end. > hash[ key ] > else > hash.write_synchronize do > hash[key] = compute_data( key ) > end > end > end This is a bad idiom because it is prone to deadlocking. When having multiple levels of locks you must only go from stronger to weaker locks, not the other way round. Otherwise this can happen t1: read lock t2: read lock t1: write lock (deadlock, blocked by t2's read lock) See also http://java.sun.com/j2se/1.5.0/docs/api/java/util/... > N.B. We have to solve the interlock problem here as normally the > write_synchronize does not obtain the write_lock > as the read_synchronize did get a read_lock. We would need to grant > the write lock if the read_lock is obtained by the current thread > *only*, With this definition you force code to obtain a read lock before the write lock can be held. I assume you mean that the rule should rather be "grant the write lock only if no other thread holds the read lock" - which is what rw locking usually does. > but imagine two threads waiting for the write_lock while > containing the read_lock, the good old > philosophers-forks-spoon-spaghetti interlock. [ That is why we eat > spaghetti with a fork only ;) ] Exactly, as show above. > Therefore I guess the RW-locking in a ThreadSafe Container class shall > rather not be exposed as we avoiding interlocking is not that > complicated IIRC > > And if we expose read_synchronize &blk, and write_synchronize &blk we > should probably raise something like an IllegalMonitorState exception > if a thread tries to call the one inside the other. Not necessarily: read_synchronize inside write_synchronize is perfectly ok although it has no noticeable effect. But since it can occur in another method it is reasonable to allow it. Some more remarks about the complexity of read write locking can be found here http://java.sun.com/j2se/1.5.0/docs/api/java/util/... For the interested Wikipedia also has a number of pages on the matter "locking": http://en.wikipedia.org/wiki/Lock_(computer_science) Now, what would be reasonable to provide in a thread safe standard collection? IMHO the discussion above and the complexity and wide variance in implementation choices for read write locking makes it next to impossible to come up with an optimal read write lock implementation in standard collections for most application scenarios. This yields a bad cost benefit ratio for option 3 above. This leaves us with options 1 and 2. I lean toward option 1 because even the dead simple exclusive lock approach is of limited use only. Even in light of default values or the default_proc of a Hash there are some things to note. Consider # assuming Synchronize() wraps an instance with a delegator # which synchronizes all method calls on a single lock $all_values = Synchronize(Hash.new {|h,k| h[k] = Synchronize([])}) ... # in multiple threads: $all_values[some_key] << item While our basic locking ensures internal state of all collections is consistent the code has the disadvantage that every thread needs to obtain two locks and the number of locks is not limited (1 + no of keys). A single lock would be sufficient here. Another disadvantage of built in locking is that it might trick some people in believing that this is all they need to use to get thread safe code. If there would be no such thing as a default thread safe Array and Hash people are forced to think about how they want to go about locking and I believe that way they will get better (i.e. more correct and more efficient) code. As a compromise I suggest to provide something like the Synchronize() method I showed above which does two things: 1. it extends the object passed with MonitorMixin and 2. it wraps the instance with another instance which synchronizes all methods like this: class SynchWrapper def initialize(o) @obj = o.extend(::MonitorMixin) end # remove all Object methods def method_missing(*a,&b) @obj.synchronize { @obj.__send__(*a,&b) } end end Advantage is that there is a single mechanism that uniformly works for _all_ classes not just collections. And, the mechanism is made explicit while not requiring too much manual intervention or additional typing. For all cases where the logic requires other locking schemes explicit locking needs to be employed anyway. Kind regards robert PS: I just recognize we should probably move this discussion over to ruby-core...
on 2008-12-27 15:23
On Sat, Dec 27, 2008 at 12:59 PM, Robert Klemme <shortcutter@googlemail.com> wrote: > On 26.12.2008 19:17, Robert Dober wrote: >> >> On Fri, Dec 26, 2008 at 11:24 AM, Robert Klemme >> <shortcutter@googlemail.com> wrote: > > <snip>full quote</snip> > > Please invest a bit more time in quoting and trimming. I may speak for me > only but I find it unnecessary hard to follow a discussion when there is a > full quote and no clear indication of references in your reply. Oh this was bad work, I messed up, my apologies. > >> Robert, IIUC we do not want either the one, nor the other. > > What are you referring to with "one" and "other"? IIRC, because I indeed lost context, sorry again, I meant that your observations, however useful and learned, did not have a direct impact to what Charles wants here. Look e.g. at the synchronized containers in Java. They are thread safe and yet all your concerns are still valid. In other words you are already talking about problems of the next level of abstraction while the level below still has problems. So all I wanted to say is that we still should tackle the low level thread-safety. > >> For what I am concerned one would need Read/Write Locks. > > Why? Because we want simultaneous read access. And that was the other point I referred to in my reply, we do not want to do exclusive reads. > lock free collection always be present as well: > > > 3. read write locking with the usual semantics, allowing multiple methods > with read semantic to be executed concurrently or at most one "write" method > at a time. "write" lock excludes any "read" locks. 3 by all means >> if hash.has_key? key then >> hash[ key ] >> else >> hash.write_synchronize do >> hash[key] = compute_data( key ) >> end >> end >> end > > This is a bad idiom because it is prone to deadlocking. That was my point to show why I am against exposure of the internal RW lock. I guess I did not make that clear either :( > When having > multiple levels of locks you must only go from stronger to weaker locks, not > the other way round. Otherwise this can happen > > t1: read lock > t2: read lock > t1: write lock (deadlock, blocked by t2's read lock) I disagree, write lock upgrade should throw an exception. > > See also > http://java.sun.com/j2se/1.5.0/docs/api/java/util/... I am quite fond of Doug Lea's Read Write Lock from Java Threads but I will definitely have a look. > rw locking usually does. Here I messed up again, you got it right, but this was not a stupid error, I just got it wrong :( > >> but imagine two threads waiting for the write_lock while >> containing the read_lock, the good old >> philosophers-forks-spoon-spaghetti interlock. [ That is why we eat >> spaghetti with a fork only ;) ] > > Exactly, as show above. > Yup > method it is reasonable to allow it. Hmm we have not yet defined the semantics of this, you were right when complaining about my jump to implementation, right now I do not see the need for this but I will try to make my home work. > > Now, what would be reasonable to provide in a thread safe standard > collection? IMHO the discussion above and the complexity and wide variance > in implementation choices for read write locking makes it next to impossible > to come up with an optimal read write lock implementation in standard > collections for most application scenarios. I agree but we should opt for a down the middle road. > > This yields a bad cost benefit ratio for option 3 above. This leaves us > with options 1 and 2. I lean toward option 1 because even the dead simple > exclusive lock approach is of limited use only. Even in light of default > values or the default_proc of a Hash there are some things to note. I fail to see why 3 is so costly, I mean why optimize. Thread programming is complex (that's why we all love immutable objects, right ;) and I fail to see why one could hope that it becomes less complex just because we add a small but still useful feature. > While our basic locking ensures internal state of all collections is > consistent the code has the disadvantage that every thread needs to obtain > two locks and the number of locks is not limited (1 + no of keys). A single > lock would be sufficient here. > > Another disadvantage of built in locking is that it might trick some people > in believing that this is all they need to use to get thread safe code. If > there would be no such thing as a default thread safe Array and Hash people > are forced to think about how they want to go about locking and I believe > that way they will get better (i.e. more correct and more efficient) code. No please that really could be said against any progress. Furthermore I believe that folks asking for thread safe collections know mostly why they do that, after all thread safety is not a hype. > > where the logic requires other locking schemes explicit locking needs to be > employed anyway. However there is too big a prise to pay, ok we could get rid of the metaprogramming and just follow the idea you have presented, but exclusive read access just will make the wrapped collection unusable for any heavy weight parallel read access. > > Kind regards > > robert > > > PS: I just recognize we should probably move this discussion over to > ruby-core... Hmm maybe Anyway thanks for your time. Robert -- Il computer non è una macchina intelligente che aiuta le persone stupide, anzi, è una macchina stupida che funziona solo nelle mani delle persone intelligenti. Computers are not smart to help stupid people, rather they are stupid and will work only if taken care of by smart people. Umberto Eco
on 2008-12-28 04:05
* Robert Dober <robert.dober@gmail.com> (11:08) schrieb: >>> [...] >> >>> So in this case you need a lock around the _complete_ block of code. >>> (Note, the method level lock would work if using a Hash with a default >>> block, but this solves only a small portion of the cases.) >> >> If Hash had a method get_with_default that took a default value or a > I believe you mean > "get and assign with default" > because IIRC there is a get_with_default, it is simply called > Hash#fetch You are right. There is a little difference: fetch doesn't pass the hash to its block. But one can "abuse" both with an assignment in the block: | dat = hash.fetch(k) { | key | hash[key] = create_dat(key) } mfg, simon .... l
on 2008-12-28 04:06
* Robert Klemme <shortcutter@googlemail.com> (11:52) schrieb: >>> [...] > can invent arbitrary more complex scenarios that would not make any > sense to be covered by a standard method in class Hash. Well, my point is it's a bad example. You are asking an object questions to base your actions on the object on. Don't ask, tell. :-) > Put it differently: in concurrent applications there are many scenarios > where lock granularity "method" is insufficient to guarantee correct > code. Instead you need explicit locking for larger portions of _client_ > code. That might be, I'd like to see a good example. mfg, simon .... i'm not killfiled?
on 2009-01-02 09:21
Robert Dober wrote: >> I would very much welcome a set of guaranteed-thread-safe wrapper >> collections folks could use if they're concerned about concurrency, since it >> would be unreasonable to penalize all code with locking. For now, learn and >> love Mutex, or don't share collections across threads. > That sounds like a fun job. The issue goes well with a question which > burns on the top of my tongue: > How to test thread issues? I do not have any experience in this field > but consider it somehow critical before "promising" functional thread > safe wrappers. You might want to check out "CHESS: A Systematic Testing Tool for Concurrent Software" (<http://Research.Microsoft.Com/CHESS/>) by Microsoft Research. It is a testing framework for multithreaded programs (both .NET and native code). If I understood that correctly, it replaces the operating system's threading libraries with its own, so that it can control thread switching. Then it analyzes the program to figure out every possible way that the execution streams of the threads can interleave and it re-runs the program with every possible interleaving. It actually should Just Work(TM) if you use either Ruby.NET or IronRuby to compile your Ruby program to CIL bytecode. (Where "Ruby program" obviously also can be your test suite.) Obviously, it would be even better to have a native Ruby version of such a tool. Rubinus would probably be an awesome base for such a project. The problem is in the "determine all possible interleavings" part: I don't know whether CHESS uses static or dynamic analysis (or both) for that; doing static analysis on arbitrary Ruby code might be hard, probably too hard (in the "equivalent to the Halting Problem" sense). Static analysis on a *restricted subset* of Ruby, however, possibly aided with additional annotations might well be feasible -- and you wouldn't want to use crazy dynamic metaprogramming in a core data structure anyway. Of course, you could also try dynamic analysis. jwm
on 2009-01-04 20:15
On Fri, Jan 2, 2009 at 9:11 AM, Jörg W Mittag <JoergWMittag+Usenet@googlemail.com> wrote: I'll have a look thanx It is change, continuing change, inevitable change, that is the dominant factor in society today. No sensible decision can be made any longer without taking into account not only the world as it is, but the world as it will be ... ~ Isaac Asimov
Please log in before posting. Registration is free and takes only a minute.
Existing account
(Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
Log in with Google account | Log in with Yahoo account
No account? Register here.