Forum: Ruby-core [ruby-trunk - Bug #7429][Open] Provide options for core collections to customize behavior

Posted by Charles Nutter (headius)
on 2012-11-24 17:50
(Received via mailing list)
Issue #7429 has been reported by headius (Charles Nutter).

----------------------------------------
Bug #7429: Provide options for core collections to customize behavior
https://bugs.ruby-lang.org/issues/7429

Author: headius (Charles Nutter)
Status: Open
Priority: Normal
Assignee:
Category:
Target version: 2.0.0
ruby -v: 2.0.0


Many folks know that Matz is a fan of having a few classes that handle a 
wide range of behavior. For this reason, I think we're unlikely to get a 
set of parallelism-safe collections added to Ruby. I propose an 
alternative that would allow parallel-executing implementations to offer 
concurrency-friendly versions without impact to any code.

I would like to see a way to pass in an options hash to the .new 
construction of at least Hash and Array. For example:

# Create a hash that is concurrent-safe, resizes when density is > 3
# keys per bucket, and sets initial bucket count to 60
hsh = Hash.new(concurrent: true, density: 3, size: 60)

Similar for Array:

ary = Array.new(concurrent: true, size: 100)

Options like density and size map directly to current implementation 
details of Hash and Array that are sometimes not accessible (you can't 
change the load factor for Hash, for example). Options like concurrent 
could be no-ops on MRI but would allow other implementations to provide 
safe versions of the collections behind the scenes.
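
As a rough illustration of the idea (not the proposed core API), the same options can be modeled today with a factory method. CollectionFactory, LockedHash, and new_hash are hypothetical names; density: and size: are accepted but ignored, because plain Ruby offers no way to set them.

```ruby
require "monitor"

# Hypothetical factory; none of these names exist in Ruby core.
module CollectionFactory
  # Coarse-grained wrapper: every call takes the same lock.
  class LockedHash
    def initialize
      @hash = {}
      @lock = Monitor.new
    end

    def [](key)
      @lock.synchronize { @hash[key] }
    end

    def []=(key, value)
      @lock.synchronize { @hash[key] = value }
    end

    def size
      @lock.synchronize { @hash.size }
    end
  end

  # density: and size: are placeholders for implementation hints and
  # are ignored in this sketch.
  def self.new_hash(concurrent: false, density: nil, size: nil)
    concurrent ? LockedHash.new : {}
  end
end

hsh = CollectionFactory.new_hash(concurrent: true, density: 3, size: 60)
hsh[:answer] = 42
```

An implementation with real parallelism could return a smarter structure from the same call, while one without could return a plain Hash.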

I know it may be too late for 2.0.0, but we implementers of 
parallel-executing Rubies feel the pain of thread-unsafe core structures 
every day. I think the ability to get thread-safe collections in Ruby 
core would go a long way toward dispelling the myth that Ruby is bad for 
concurrent application programming.
Posted by matz (Yukihiro Matsumoto) (Guest)
on 2012-11-25 13:54
(Received via mailing list)
Issue #7429 has been updated by matz (Yukihiro Matsumoto).

Status changed from Open to Rejected

Even though I prefer smaller set of built-in fundamental classes, I 
don't think 'concurrent' option is sufficient,
since it requires totally different implementation.  I think it's rather 
better for you to propose 'ParallelHash' etc.

Matz.

Posted by Thomas Sawyer (7rans)
on 2012-11-25 15:20
(Received via mailing list)
Issue #7429 has been updated by trans (Thomas Sawyer).


I wonder if concurrency behavior can be designed as a mixin. As long as 
the underlying class conforms to its interface then "ParallelWhatever" 
becomes possible. ParallelHash would then not be needed as a built-in 
class b/c it would be simple to define.

  class MyParallelHash < Hash
    include Concurrency
  end

Posted by Charles Nutter (headius)
on 2012-11-25 19:23
(Received via mailing list)
Issue #7429 has been updated by headius (Charles Nutter).


matz (Yukihiro Matsumoto) wrote:
> Even though I prefer smaller set of built-in fundamental classes, I
> don't think 'concurrent' option is sufficient,
> since it requires totally different implementation.  I think it's rather
> better for you to propose 'ParallelHash' etc.

That was indeed my intention; concurrent: true would allow using a 
completely different implementation under the covers, allowing for an 
efficient concurrent hash/array (rather than one that simply locks on 
all accesses).

I will repropose as new collection types. Is there any chance of getting 
them into 2.0.0?
Posted by Charles Nutter (headius)
on 2012-11-25 19:26
(Received via mailing list)
Issue #7429 has been updated by headius (Charles Nutter).


trans (Thomas Sawyer) wrote:
> I wonder if concurrency behavior can be designed as a mixin. As long as the
> underlying class conforms to its interface then "ParallelWhatever" becomes
> possible. ParallelHash would then not be needed as a built-in class b/c it would
> be simple to define.
>
>   class MyParallelHash < Hash
>     include Concurrency
>   end

Actually, JRuby already provides this as JRuby::Synchronized:

class MyParallelHash < Hash
  include JRuby::Synchronized
end

It's rather hacky though for a few reasons:

* In JRuby, the JRuby::Synchronized module inserts itself into the 
method lookup process, returning all methods wrapped with 
synchronization against the target object.
* The synchronization is very coarse-grained and much slower than other 
implementations of a concurrent data structure that would use fewer (or 
no) locks.
* It is not common to have the inclusion of a module change the nature 
of every method in the host class.

It would be better if we had some fast, always-concurrent data 
structures built in.
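
For readers unfamiliar with the approach, the coarse-grained style described in the list above can be approximated in plain Ruby with a proxy that takes one lock around every call. This is only a sketch, not how JRuby::Synchronized is implemented; SynchronizedProxy is a hypothetical name.

```ruby
require "monitor"

# Hypothetical proxy: forwards every call to the target while holding
# a single re-entrant lock. Simple, but all readers and writers
# serialize on that one lock.
class SynchronizedProxy < BasicObject
  def initialize(target)
    @target = target
    @lock = ::Monitor.new
  end

  # Every method not defined on BasicObject lands here.
  def method_missing(name, *args, &block)
    @lock.synchronize { @target.__send__(name, *args, &block) }
  end
end

shared = SynchronizedProxy.new({})
workers = 4.times.map do |i|
  ::Thread.new { 100.times { |j| shared[[i, j]] = true } }
end
workers.each(&:join)
```

Each individual call is safe, but a compound operation such as "check, then insert" still spans two separate lock acquisitions.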
Posted by Thomas Sawyer (7rans)
on 2012-11-25 19:45
(Received via mailing list)
Issue #7429 has been updated by trans (Thomas Sawyer).


That is interesting. I suspect part of the problem is the design of the 
Hash class itself, something I addressed before in #6442. If Hash had a 
core set of non-reducible methods on which all the others depended, then 
those would be the only methods that needed to be targeted. Also, 
#prepend might be handy here, rather than "inserts itself into the 
method lookup process".

Speed isn't everything. I think good design is at least as important, 
if not more so.
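
A minimal sketch of that idea, assuming a class whose derived methods are written purely in terms of a small core. The built-in Hash is not structured this way, so TinyStore and LockedCore are hypothetical stand-ins.

```ruby
require "monitor"

# Hypothetical store: keys and update are derived from the core
# methods [], []= and each_pair.
class TinyStore
  def initialize
    @data = {}
  end

  # Core, non-reducible methods.
  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
  end

  def each_pair(&block)
    @data.each_pair(&block)
    self
  end

  # Derived methods, built only on the core.
  def keys
    result = []
    each_pair { |key, _value| result << key }
    result
  end

  def update(other)
    other.each_pair { |key, value| self[key] = value }
    self
  end
end

# Prepended module: wraps only the core methods with a lock.
module LockedCore
  def initialize(*args, &block)
    @lock = Monitor.new
    super
  end

  def [](key)
    @lock.synchronize { super }
  end

  def []=(key, value)
    @lock.synchronize { super }
  end

  def each_pair(&block)
    @lock.synchronize { super }
  end
end

TinyStore.prepend(LockedCore)
```

Because update goes through []=, it picks up the locking without being wrapped itself, although it takes the lock once per entry rather than once for the whole operation.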
Posted by Charles Nutter (headius)
on 2012-11-25 19:57
(Received via mailing list)
Issue #7429 has been updated by headius (Charles Nutter).


Speed isn't everything...until it becomes everything. Ruby should not be 
wasteful unnecessarily. There are also immutable truths about data 
structure implementation, like the fact that adding thread-safety to 
most data structures comes at a cost...which is why in JRuby we decided 
not to make Hash and Array thread-safe by default.

There is also only a limited set of data structures that make 
unsynchronized reads concurrent with writes safe. You usually need a 
data structure that's designed to do both safely, or both reads and 
writes have to be synchronized.

For Hash at least, I think something like a CTrie (concurrent hash array 
mapped trie) would be a nice choice. It's not O(1) but it's pretty good.
Posted by matz (Yukihiro Matsumoto) (Guest)
on 2012-11-25 23:41
(Received via mailing list)
Issue #7429 has been updated by matz (Yukihiro Matsumoto).


@headius I am afraid it's not going to 2.0, for some reasons:

* it's too late to add new classes to 2.0
* and it will take more time to prepare parallel collection 
implementation for other Ruby (in this case mainly CRuby).
* the future CRuby will lean toward non threading way, e.g, Actors.

Matz.


Posted by Charles Nutter (headius)
on 2012-11-26 16:28
(Received via mailing list)
Issue #7429 has been updated by headius (Charles Nutter).


matz (Yukihiro Matsumoto) wrote:
> @headius I am afraid it's not going to 2.0, for some reasons:
>
> * it's too late to add new classes to 2.0
> * and it will take more time to prepare parallel collection implementation for
>   other Ruby (in this case mainly CRuby).

In CRuby, with the GIL, you can pretend the existing ones are 
parallel-safe, can't you?

> * the future CRuby will lean toward non threading way, e.g, Actors.

If those actors are going to run in parallel, you'll still need to have 
parallel-safe collections to pass between them...or an explicit 
mechanism for preventing users from passing mutable state between 
actors.
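
One possible shape for such an explicit mechanism, sketched with threads standing in for actors: a mailbox that deep-copies every message on delivery, so sender and receiver never share a mutable object. Mailbox, deliver, and receive are hypothetical names, and the Marshal round-trip is just one simple way to copy; it only works for objects Marshal can serialize.

```ruby
# Hypothetical mailbox: messages are copied on the way in, so later
# mutation by the sender cannot be observed by the receiver.
class Mailbox
  def initialize
    @queue = Queue.new # thread-safe FIFO from core Ruby
  end

  def deliver(message)
    @queue << Marshal.load(Marshal.dump(message))
    self
  end

  # Blocks until a message is available.
  def receive
    @queue.pop
  end
end

box = Mailbox.new
message = { items: [1, 2] }
box.deliver(message)
message[:items] << 3        # sender keeps mutating its own copy
received = box.receive      # receiver still sees the delivered state
```

The copy costs time and memory on every send, which is the trade-off against sharing a parallel-safe collection directly.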