Threadify-0.0.1

this one’s for you charlie :wink:

NAME
threadify.rb

SYNOPSIS
enumerable = %w( a b c d )
enumerable.threadify(2){ 'process this block using two worker threads' }

DESCRIPTION
threadify.rb makes it stupid easy to process a bunch of data using 'n'
worker threads
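
For readers wondering what 'n' worker threads amounts to, here is a minimal sketch using only the standard library. It is not the gem's source: the method name process_in_threads and its default thread count are assumptions made for illustration. Each worker pulls jobs from a shared Queue and simply leaves its loop when the queue is empty.

```ruby
require 'thread'

# Hypothetical helper, NOT threadify's actual implementation:
# run the block over items using n worker threads, keeping results in order.
def process_in_threads(items, n = 4, &block)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  results = Array.new(jobs.size)

  workers = Array.new(n) do
    Thread.new do
      loop do
        begin
          item, index = jobs.pop(true)  # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break                         # no work left: fall out of the loop, thread ends
        end
        results[index] = block.call(item)  # each job writes to its own slot
      end
    end
  end

  workers.each(&:join)
  results
end

p process_in_threads(%w( a b c d ), 2) { |s| s.upcase }
```

The workers never call Thread#exit or Thread#kill on themselves; they just run out of work and return.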

INSTALL
gem install threadify

URI
http://rubyforge.org/projects/codeforpeople

SAMPLES

<========< sample/a.rb >========>

~ > cat sample/a.rb

 require 'open-uri'
 require 'yaml'

 require 'rubygems'
 require 'threadify'


 uris =
   %w(
     http://google.com
     http://yahoo.com
     http://rubyforge.org
     http://ruby-lang.org
     http://kcrw.org
     http://drawohara.com
     http://codeforpeople.com
   )


 time 'without threadify' do
   uris.each do |uri|
     body = open(uri){|pipe| pipe.read}
   end
 end


 time 'with threadify' do
   uris.threadify do |uri|
     body = open(uri){|pipe| pipe.read}
   end
 end


 BEGIN {
   def time label
     a = Time.now.to_f
     yield
   ensure
     b = Time.now.to_f
     y label => (b - a)
   end
 }

~ > ruby sample/a.rb

 ---
 without threadify: 7.41900205612183
 ---
 with threadify: 3.69886112213135

a @ http://codeforpeople.com/

Thank you! This gem pretty much makes my life simpler, and will
continue to make it simpler!

(stdlib please?)

~ ari

On Tue, Jul 1, 2008 at 1:04 PM, ara howard wrote:

URI
http://rubyforge.org/projects/codeforpeople

I only see a tgz link which redirects me to

http://rubyforge.rubyuser.de/codeforpeople/threadify-0.0.1.tgz

which in turn 404s

martin

ara howard wrote:

this one’s for you charlie :wink:

Appears to work just dandy under JRuby:

➔ time jruby --server -rthreadify -e "nums = *(1..35); def fib(n); if n
< 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.each
{|i| p fib(i)}"

real 0m11.889s
user 0m11.733s
sys 0m0.188s
~/NetBeansProjects/jruby ➔ time jruby --server -rthreadify -e "nums =
*(1..35); def fib(n); if n < 2; return n; else; return fib(n - 1) +
fib(n - 2); end; end; nums.threadify {|i| p fib(i)}"

real 0m8.213s
user 0m12.722s
sys 0m0.178s

(One thread on my system consumes roughly 65-70% CPU, which explains why
full CPU on both cores doesn’t double performance here)

I also found some weird bug where Thread#kill/exit from within the
thread interacts weirdly with join happening outside, and never
terminates. Fixing that now.

  • Charlie

On Tue, Jul 1, 2008 at 2:02 PM, Charles Oliver N. wrote:

mirror delay. check codeforpeople svn, it’s only one file.

thanks, gotit. will also install the gem when it propagates, just to
keep my system informed :slight_smile:

m.

On Jul 1, 2008, at 3:11 PM, Martin DeMello wrote:

thanks, gotit. will also install the gem when it propagates, just to
keep my system informed :slight_smile:

0.0.2 and gem should be up

a @ http://codeforpeople.com/

Martin DeMello wrote:

On Tue, Jul 1, 2008 at 1:04 PM, ara howard wrote:

URI
http://rubyforge.org/projects/codeforpeople

I only see a tgz link which redirects me to

http://rubyforge.rubyuser.de/codeforpeople/threadify-0.0.1.tgz

which in turn 404s

mirror delay. check codeforpeople svn, it’s only one file.

  • Charlie

On Jul 1, 2008, at 3:04 PM, Charles Oliver N. wrote:

Appears to work just dandy under JRuby:

➔ time jruby --server -rthreadify -e "nums = *(1..35); def fib(n);
if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end;
nums.each {|i| p fib(i)}"

wow that’s cool - now that’s a seriously easy way to parallelize :wink:

I also found some weird bug where Thread#kill/exit from within the
thread interacts weirdly with join happening outside, and never
terminates. Fixing that now.

glad to have helped :wink:

i just pushed out 0.0.2 and it just lets the thread die rather than
self-destructing. see how that works…

cheers.

a @ http://codeforpeople.com/

ara.t.howard wrote:

i just pushed out 0.0.2 and it just lets the thread die rather than
self-destructing. see how that works…

I fixed in JRuby just now (Thread#kill does an implicit join in JRuby to
make sure the thread dies…but if target == caller it was still trying
to join itself in a weird way) but basically breaking out of the loop
instead of Thread#exit solved it. Your 0.0.2 change is probably
equivalent.

  • Charlie

On Tue, Jul 1, 2008 at 5:04 PM, Charles Oliver N. wrote:

ara howard wrote:

this one’s for you charlie :wink:

Appears to work just dandy under JRuby:

I was doing some comparison between threadify and peach with JRuby,
when I noticed some interesting behavior with using
Enumerator#to_enum.

Sample code and three different results are posted here:
http://pastie.org/230287. Each result randomly occurs and sometimes
the code produces no error whatsoever. MRI does not seem to exhibit
the same behavior.

I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.

threadify-0.0.2
jruby 1.1.3-dev (ruby 1.8.6 patchlevel 114) (2008-07-08 rev 7130) [i386-java]
java version "1.5.0_13"
Java™ 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot™ Client VM (build 1.5.0_13-119, mixed mode, sharing)

OS X 10.5.4

Thanks,
Michael G.

On Jul 8, 2008, at 5:43 PM, Michael G. wrote:

I was doing some comparison between threadify and peach with JRuby,

Thanks,
Michael G.

bunch of ‘java.lang’ stuff in there - i’m out! :wink:

a @ http://codeforpeople.com/

Michael G. wrote:

Enumerator#to_enum.

Sample code and three different results are posted here:
http://pastie.org/230287. Each result randomly occurs and sometimes
the code produces no error whatsoever. MRI does not seem to exhibit
the same behavior.

I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.

Thanks for filing the bug. I’m looking into it now.

In general we have inserted synchronization code only where it really
appears to be necessary to maintain the integrity of data structures.
That means that in some cases, you need to be mindful of code actually
running in parallel against e.g. arrays, hashes, strings, and so on. But
we do want to reduce the possibility of a Java exception, so I’ll
investigate a bit.

  • Charlie

Charles Oliver N. wrote:

In general we have inserted synchronization code only where it really
appears to be necessary to maintain the integrity of data structures.
That means that in some cases, you need to be mindful of code actually
running in parallel against e.g. arrays, hashes, strings, and so on. But
we do want to reduce the possibility of a Java exception, so I’ll
investigate a bit.

I did find a few threading bugs in JRuby, and I’m working on them now.
Most of them seem specific to Enumerator…

  • Charlie

On Jul 10, 2008, at 8:39 PM, Daniel B. wrote:

start = Time.now

I think I’ll add a “threads” option directly, and borrow some of your
code. :slight_smile:

Thanks,

Dan

sweet. i wouldn’t launch rockets with it - but it’s a cheap speedup for
a bunch of ruby code. btw - check out my find method

http://codeforpeople.com/lib/ruby/alib/alib-0.5.1/lib/alib-0.5.1/find2.rb

very stolen and hacked

a @ http://codeforpeople.com/

On Jul 1, 2:04 pm, ara howard wrote:

DESCRIPTION
SAMPLES

 end

~ > ruby sample/a.rb

 ---
 without threadify: 7.41900205612183
 ---
 with threadify: 3.69886112213135

Pretty cool. I tried it with file-find. Here was the code:

require 'file/find'
require 'threadify'

rule = File::Find.new(
  :pattern => "*.rb",
  :path    => "C:\\ruby"
)

start = Time.now

rule.find.threadify(10){ |f|
  p f
}

p start
p Time.now

Without threadify, it took 1:40 on my laptop. With threadify(10) it
dropped to 44 seconds.
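
A small sketch of measuring this kind of comparison directly with Benchmark.realtime from the standard library, rather than printing two timestamps. The slow_work and compare_timings methods are made up for illustration, and the sleep merely stands in for slow, I/O-bound work; this does not reproduce the file-find numbers above.

```ruby
require 'benchmark'

# Hypothetical stand-in for slow, I/O-bound work on one item.
def slow_work(item)
  sleep 0.05
  item
end

# Returns [serial_seconds, threaded_seconds] for the same list of items.
def compare_timings(items)
  serial = Benchmark.realtime { items.each { |i| slow_work(i) } }
  threaded = Benchmark.realtime do
    # One thread per item here, purely to keep the sketch short.
    items.map { |i| Thread.new { slow_work(i) } }.each(&:join)
  end
  [serial, threaded]
end

serial, threaded = compare_timings((1..8).to_a)
printf("serial: %.2fs  threaded: %.2fs\n", serial, threaded)
```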

I think I’ll add a “threads” option directly, and borrow some of your
code. :slight_smile:

Thanks,

Dan

Michael G. wrote:

I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.

Ok, there’s good news and bad news. First the good news.

I’ve found several egregious threading bugs in JRuby’s Enumerable
implementation that probably caused the bulk of errors you saw.
Basically, the runtime information for the main Ruby thread in JRuby was
getting reused by the blocks passed into threadify, causing all sorts of
wacky errors (multiple threads all sharing runtime thread data…fun!).
Fixing that seems to have resolved most of the errors.

Now the bad news…

What you’re doing is a bit suspect. In this case, it works out
reasonably well, since you’re just doing a map and gathering results.
There are some remaining bugs in JRuby wrt the temporary data structure
used to gather map results (it needs to be made thread-safe) but it can
work. However in general I don’t think this use of threadify is going to
apply well to Enumera(ble|tor) since so many of the operations depend on
the result of the previous iteration.

I’ll have the remaining issues wrapped up shortly, but I’d love to see
someone come up with a safe set of Enumerable-like operations that can
run in parallel. For example, a detect that uses a cross-thread trigger
to stop all iterations (rather than the naive threadification of detect
which would not propagate a successful detection out of the thread).
Things like that could be very useful.
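
As a rough illustration of the cross-thread trigger idea, here is a sketch of a detect that shares the earliest match found so far between workers. The name parallel_detect and its structure are assumptions made for this example; it is not part of threadify or JRuby. Workers skip any item that comes after an already-found match, and the final answer is the match with the lowest index, which is what a serial detect would return.

```ruby
require 'thread'

# Hypothetical helper, not an existing API: a detect run across n threads.
def parallel_detect(items, n = 4, &block)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  lock = Mutex.new
  best = nil  # [index, item] of the earliest match seen so far

  workers = Array.new(n) do
    Thread.new do
      loop do
        begin
          item, index = jobs.pop(true)  # raises ThreadError when the queue is empty
        rescue ThreadError
          break
        end
        # The shared trigger: anything after a known match cannot win, so skip it.
        earliest = lock.synchronize { best && best[0] }
        next if earliest && index > earliest
        if block.call(item)
          lock.synchronize { best = [index, item] if best.nil? || index < best[0] }
        end
      end
    end
  end

  workers.each(&:join)
  best && best[1]
end

p parallel_detect((1..100).to_a, 4) { |x| x % 7 == 0 }
```

Items before the winning index are always evaluated, so a later match can never shadow an earlier one; the cost is a mutex acquisition per item.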

I’d also love to see someone come up with a nice installable gem of
truly thread-safe wrappers around the core collections, since in general
I don’t believe the core array and friends should suffer the perf
penalty that comes from always synchronizing.
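
A minimal sketch of what such an opt-in wrapper might look like, so that only code which needs locking pays for it. SynchronizedArray is a hypothetical class written for this example, not an existing gem, and it wraps just three operations.

```ruby
require 'thread'

# Hypothetical opt-in wrapper: every access to the wrapped array goes through one Mutex.
class SynchronizedArray
  def initialize(array = [])
    @array = array
    @lock  = Mutex.new
  end

  def <<(item)
    @lock.synchronize { @array << item }
    self
  end

  def size
    @lock.synchronize { @array.size }
  end

  # Hand back a copy so callers cannot mutate the shared array without the lock.
  def to_a
    @lock.synchronize { @array.dup }
  end
end

shared  = SynchronizedArray.new
threads = Array.new(8) { |t| Thread.new { 100.times { |i| shared << [t, i] } } }
threads.each(&:join)
p shared.size
```

Plain Array stays unsynchronized and fast; code that shares a collection across threads chooses the wrapper explicitly.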

  • Charlie

On Jul 11, 2008, at 2:38 PM, Charles Oliver N. wrote:

that can run in parallel. For example, a detect that uses a cross-
thread trigger to stop all iterations (rather than the naive
threadification of detect which would not propagate a successful
detection out of the thread). Things like that could be very useful.

I’d also love to see someone come up with a nice installable gem of
truly thread-safe wrappers around the core collections, since in
general I don’t believe the core array and friends should suffer the
perf penalty that comes from always synchronizing.

check out 0.0.3, it allows this, but the sync overhead is prohibitive
for in memory stuff - for network scraping it’d be great though.
anyhow, 0.0.3 allows one to ‘break’ from parallel processing and the
value broken with will be the same as if the jobs were run serially.
damn tricky that.

cheers.

a @ http://codeforpeople.com/

On Fri, Jul 11, 2008 at 4:38 PM, Charles Oliver N. wrote:

by the blocks passed into threadify, causing all sorts of wacky errors
Enumera(ble|tor) since so many of the operations depend on the result of the
thread-safe wrappers around the core collections, since in general I don’t
believe the core array and friends should suffer the perf penalty that comes
from always synchronizing.

Thanks Charlie, I just verified that my script no longer crashes with
my latest pull of JRuby.

jruby 1.1.3-dev (ruby 1.8.6 patchlevel 114) (2008-07-12 rev 7146)
[i386-java]

Regards,
Michael G.