this one’s for you charlie
NAME
threadify.rb
SYNOPSIS
enumerable = %w( a b c d )
enumerable.threadify(2){ 'process this block using two worker threads' }
DESCRIPTION
threadify.rb makes it stupid easy to process a bunch of data using 'n'
worker threads
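For readers who want to see the idea spelled out, here is a minimal sketch of processing a collection with 'n' worker threads in plain Ruby. It is an illustration under stated assumptions, not threadify's source: the helper name `each_in_threads` and its default of 4 workers are hypothetical, and it collects block results by index rather than reproducing threadify's actual behavior.

```ruby
require 'thread'

# Hypothetical helper for illustration only; NOT threadify's implementation.
# Runs the block over every item using n worker threads and returns the
# block results in the original item order.
def each_in_threads(items, n = 4)
  queue = Queue.new
  items.each_with_index { |item, i| queue << [item, i] }
  results = Array.new(items.size)

  workers = Array.new(n) do
    Thread.new do
      loop do
        begin
          # Non-blocking pop raises ThreadError once the queue is empty.
          item, i = queue.pop(true)
        rescue ThreadError
          break
        end
        # Each job writes only to its own pre-allocated slot.
        results[i] = yield(item)
      end
    end
  end

  workers.each(&:join)
  results
end

squares = each_in_threads([1, 2, 3, 4], 2) { |x| x * x }
p squares # => [1, 4, 9, 16]
```

Pre-loading the queue keeps the workers simple: each one exits when there is nothing left to pop, so no sentinel values are needed.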
INSTALL
gem install threadify
URI
http://rubyforge.org/projects/codeforpeople
SAMPLES
<========< sample/a.rb >========>
~ > cat sample/a.rb
require 'open-uri'
require 'yaml'
require 'rubygems'
require 'threadify'
uris =
  %w(
    http://google.com
    http://yahoo.com
    http://rubyforge.org
    http://ruby-lang.org
    http://kcrw.org
    http://drawohara.com
    http://codeforpeople.com
  )
time 'without threadify' do
  uris.each do |uri|
    body = open(uri){|pipe| pipe.read}
  end
end
time 'with threadify' do
  uris.threadify do |uri|
    body = open(uri){|pipe| pipe.read}
  end
end
BEGIN {
  def time label
    a = Time.now.to_f
    yield
  ensure
    b = Time.now.to_f
    y label => (b - a)
  end
}
~ > ruby sample/a.rb
---
without threadify: 7.41900205612183
---
with threadify: 3.69886112213135
a @ http://codeforpeople.com/
Thank you! This gem pretty much makes my life simpler, and will
continue to make it simpler!
(stdlib please?)
~ ari
On Tue, Jul 1, 2008 at 1:04 PM, ara howard wrote:
URI
http://rubyforge.org/projects/codeforpeople
I only see a tgz link which redirects me to
http://rubyforge.rubyuser.de/codeforpeople/threadify-0.0.1.tgz
which in turn 404s
martin
ara howard wrote:
this one’s for you charlie
Appears to work just dandy under JRuby:
➔ time jruby --server -rthreadify -e "nums = *(1...35); def fib(n); if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.each {|i| p fib(i)}"
…
real 0m11.889s
user 0m11.733s
sys 0m0.188s
~/NetBeansProjects/jruby ➔ time jruby --server -rthreadify -e "nums = *(1...35); def fib(n); if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.threadify {|i| p fib(i)}"
…
real 0m8.213s
user 0m12.722s
sys 0m0.178s
(One thread on my system consumes roughly 65-70% CPU, which explains why
full CPU on both cores doesn’t double performance here)
I also found some weird bug where Thread#kill/exit from within the
thread interacts weirdly with join happening outside, and never
terminates. Fixing that now.
On Tue, Jul 1, 2008 at 2:02 PM, Charles Oliver N. wrote:
mirror delay. check codeforpeople svn, it’s only one file.
thanks, gotit. will also install the gem when it propagates, just to
keep my system informed
m.
On Jul 1, 2008, at 3:11 PM, Martin DeMello wrote:
thanks, gotit. will also install the gem when it propagates, just to
keep my system informed
0.0.2 and gem should be up
a @ http://codeforpeople.com/
Martin DeMello wrote:
On Tue, Jul 1, 2008 at 1:04 PM, ara howard wrote:
URI
http://rubyforge.org/projects/codeforpeople
I only see a tgz link which redirects me to
http://rubyforge.rubyuser.de/codeforpeople/threadify-0.0.1.tgz
which in turn 404s
mirror delay. check codeforpeople svn, it’s only one file.
On Jul 1, 2008, at 3:04 PM, Charles Oliver N. wrote:
Appears to work just dandy under JRuby:
➔ time jruby --server -rthreadify -e "nums = *(1...35); def fib(n); if n < 2; return n; else; return fib(n - 1) + fib(n - 2); end; end; nums.each {|i| p fib(i)}"
wow that’s cool - now that’s a seriously easy way to parallelize
I also found some weird bug where Thread#kill/exit from within the
thread interacts weirdly with join happening outside, and never
terminates. Fixing that now.
glad to have helped
i just pushed out 0.0.2 and it just lets the thread die rather than
self-destructing. see how that works…
cheers.
a @ http://codeforpeople.com/
ara.t.howard wrote:
i just pushed out 0.0.2 and it just lets the thread die rather than
self-destructing. see how that works…
I fixed in JRuby just now (Thread#kill does an implicit join in JRuby to
make sure the thread dies…but if target == caller it was still trying
to join itself in a weird way) but basically breaking out of the loop
instead of Thread#exit solved it. Your 0.0.2 change is probably
equivalent.
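The pattern being discussed, letting a worker fall out of its loop instead of calling Thread#exit or Thread#kill on itself, can be sketched as follows. This is an assumed illustration, not the actual 0.0.2 change or the JRuby fix; the `:done` sentinel and the variable names are made up for the example.

```ruby
require 'thread'

jobs = Queue.new
3.times { |i| jobs << i }
jobs << :done # hypothetical sentinel telling the worker to stop

done = []
worker = Thread.new do
  loop do
    job = jobs.pop
    # Leave the loop so the thread finishes on its own; no Thread.exit
    # or Thread#kill is called from inside the thread.
    break if job == :done
    done << job
  end
end

worker.join # returns once the worker's block has finished
p done # => [0, 1, 2]
```

Because the thread ends by returning from its block, a `join` from outside has nothing unusual to wait on.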
On Tue, Jul 1, 2008 at 5:04 PM, Charles Oliver N. wrote:
ara howard wrote:
this one’s for you charlie
Appears to work just dandy under JRuby:
I was doing some comparison between threadify and peach with JRuby,
when I noticed some interesting behavior with using
Enumerator#to_enum.
Sample code and three different results are posted here:
http://pastie.org/230287. Each result randomly occurs and sometimes
the code produces no error whatsoever. MRI does not seem to exhibit
the same behavior.
I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.
threadify-0.0.2
jruby 1.1.3-dev (ruby 1.8.6 patchlevel 114) (2008-07-08 rev 7130) [i386-java]
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)
OS X 10.5.4
Thanks,
Michael G.
On Jul 8, 2008, at 5:43 PM, Michael G. wrote:
I was doing some comparison between threadify and peach with JRuby,
Thanks,
Michael G.
bunch of ‘java.lang’ stuff in there - i’m out!
a @ http://codeforpeople.com/
Michael G. wrote:
Enumerator#to_enum.
Sample code and three different results are posted here:
http://pastie.org/230287. Each result randomly occurs and sometimes
the code produces no error whatsoever. MRI does not seem to exhibit
the same behavior.
I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.
Thanks for filing the bug. I’m looking into it now.
In general we have inserted synchronization code only where it really
appears to be necessary to maintain the integrity of data structures.
That means that in some cases, you need to be mindful of code actually
running in parallel against e.g. arrays, hashes, strings, and so on. But
we do want to reduce the possibility of a Java exception, so I’ll
investigate a bit.
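As a concrete illustration of being mindful about parallel access, here is a minimal sketch that guards a shared array with a Mutex. It is an assumed example, not code from JRuby or threadify, and the variable names are hypothetical.

```ruby
require 'thread'

results = []
lock = Mutex.new

threads = (1..8).map do |n|
  Thread.new do
    value = n * n # the work itself needs no lock
    # Only the mutation of the shared array is serialized.
    lock.synchronize { results << value }
  end
end

threads.each(&:join)
p results.sort # => [1, 4, 9, 16, 25, 36, 49, 64]
```

The order in which threads append is not deterministic, which is why the example sorts before printing.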
Charles Oliver N. wrote:
In general we have inserted synchronization code only where it really
appears to be necessary to maintain the integrity of data structures.
That means that in some cases, you need to be mindful of code actually
running in parallel against e.g. arrays, hashes, strings, and so on. But
we do want to reduce the possibility of a Java exception, so I’ll
investigate a bit.
I did find a few threading bugs in JRuby, and I’m working on them now.
Most of them seem specific to Enumerator…
On Jul 10, 2008, at 8:39 PM, Daniel B. wrote:
start = Time.now
I think I’ll add a “threads” option directly, and borrow some of your
code.
Thanks,
Dan
sweet. i wouldn’t launch rockets with it - but it’s a cheap speedup for
a bunch of ruby code. btw - check out my find method
http://codeforpeople.com/lib/ruby/alib/alib-0.5.1/lib/alib-0.5.1/find2.rb
very stolen and hacked
a @ http://codeforpeople.com/
On Jul 1, 2:04 pm, ara howard wrote:
DESCRIPTION
SAMPLES
end
~ > ruby sample/a.rb
---
without threadify: 7.41900205612183
---
with threadify: 3.69886112213135
Pretty cool. I tried it with file-find. Here was the code:
require 'file/find'
require 'threadify'

rule = File::Find.new(
  :pattern => "*.rb",
  :path    => "C:\\ruby"  # backslash escaped; "\r" would be a carriage return
)

start = Time.now
rule.find.threadify(10){ |f|
  p f
}
p start
p Time.now
Without threadify, it took 1:40 on my laptop. With threadify(10) it
dropped to 44 seconds.
I think I’ll add a “threads” option directly, and borrow some of your
code.
Thanks,
Dan
Michael G. wrote:
I am not sure that what I am doing in the code is even reasonable,
however, I thought it might be worth pointing out.
Ok, there’s good news and bad news. First the good news.
I’ve found several egregious threading bugs in JRuby’s Enumerable
implementation that probably caused the bulk of errors you saw.
Basically, the runtime information for the main Ruby thread in JRuby was
getting reused by the blocks passed into threadify, causing all sorts of
wacky errors (multiple threads all sharing runtime thread data…fun!).
Fixing that seems to have resolved most of the errors.
Now the bad news…
What you’re doing is a bit suspect. In this case, it works out
reasonably well, since you’re just doing a map and gathering results.
There are some remaining bugs in JRuby wrt the temporary data structure
used to gather map results (it needs to be made thread-safe) but it can
work. However in general I don’t think this use of threadify is going to
apply well to Enumera(ble|tor) since so many of the operations depend on
the result of the previous iteration.
I’ll have the remaining issues wrapped up shortly, but I’d love to see
someone come up with a safe set of Enumerable-like operations that can
run in parallel. For example, a detect that uses a cross-thread trigger
to stop all iterations (rather than the naive threadification of detect
which would not propagate a successful detection out of the thread).
Things like that could be very useful.
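One way the cross-thread trigger idea could look is sketched below. This is a hypothetical helper (`parallel_detect` is not part of threadify, JRuby, or the standard library), and unlike a serial detect it returns some matching item, not necessarily the first one in order.

```ruby
require 'thread'

# Hypothetical helper for illustration only. Returns an item for which
# the block is truthy, or nil if none matches. A shared stop flag tells
# every worker to quit as soon as any one of them finds a match.
def parallel_detect(items, n = 4)
  queue = Queue.new
  items.each { |item| queue << item }
  lock = Mutex.new
  found = nil
  stop = false

  workers = Array.new(n) do
    Thread.new do
      until lock.synchronize { stop }
        begin
          item = queue.pop(true) # raises ThreadError when the queue is empty
        rescue ThreadError
          break
        end
        next unless yield(item)
        lock.synchronize do
          unless stop
            found = item # first thread to report wins
            stop = true  # the trigger seen by all other workers
          end
        end
      end
    end
  end

  workers.each(&:join)
  found
end

hit = parallel_detect((1..1000).to_a, 4) { |x| x % 97 == 0 }
miss = parallel_detect([1, 2, 3], 2) { |x| x > 5 }
```

Workers check the flag only between items, so a job already in progress runs to completion before its thread stops.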
I’d also love to see someone come up with a nice installable gem of
truly thread-safe wrappers around the core collections, since in general
I don’t believe the core array and friends should suffer the perf
penalty that comes from always synchronizing.
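A thread-safe wrapper of the kind wished for here might start out like the sketch below. The class name `SynchronizedArray` and its tiny method set are assumptions for illustration; it is not an existing gem and covers only a few Array operations.

```ruby
require 'monitor'

# Hypothetical wrapper for illustration only. Every operation takes the
# same monitor, so callers pay the synchronization cost only when they
# opt in to this class; the plain Array stays unsynchronized.
class SynchronizedArray
  def initialize(array = [])
    @array = array
    @monitor = Monitor.new
  end

  def <<(item)
    @monitor.synchronize { @array << item }
    self
  end

  def size
    @monitor.synchronize { @array.size }
  end

  # Returns a copy so callers never iterate the live array unguarded.
  def to_a
    @monitor.synchronize { @array.dup }
  end
end

safe = SynchronizedArray.new
threads = Array.new(4) do |t|
  Thread.new { 250.times { |i| safe << [t, i] } }
end
threads.each(&:join)
p safe.size # => 1000
```

Monitor is reentrant, which leaves room for wrapper methods that call each other while already holding the lock.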
On Jul 11, 2008, at 2:38 PM, Charles Oliver N. wrote:
that can run in parallel. For example, a detect that uses a cross-
thread trigger to stop all iterations (rather than the naive
threadification of detect which would not propagate a successful
detection out of the thread). Things like that could be very useful.
I’d also love to see someone come up with a nice installable gem of
truly thread-safe wrappers around the core collections, since in
general I don’t believe the core array and friends should suffer the
perf penalty that comes from always synchronizing.
check out 0.0.3, it allows this, but the sync overhead is prohibitive
for in memory stuff - for network scraping it’d be great though.
anyhow, 0.0.3 allows one to ‘break’ from parallel processing and the
value broken with will be the same as if the jobs were run serially.
damn tricky that.
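To show why matching the serial result is tricky, here is a rough sketch of one possible approach. It is not how threadify 0.0.3 works (that code is not shown in this thread): the helper `first_break_in_order` is hypothetical, and a non-nil block return value stands in for a real `break`.

```ruby
require 'thread'

# Hypothetical helper for illustration only. The block signals a "break"
# by returning a non-nil value. The result is the value from the LOWEST
# item index that signalled, which is what a serial loop would produce.
def first_break_in_order(items, n = 4)
  queue = Queue.new
  items.each_with_index { |item, i| queue << [item, i] }
  lock = Mutex.new
  best_index = nil
  best_value = nil

  workers = Array.new(n) do
    Thread.new do
      loop do
        begin
          item, i = queue.pop(true) # raises ThreadError when empty
        rescue ThreadError
          break
        end
        # Items after a known break point cannot change the answer.
        next if lock.synchronize { best_index && i > best_index }
        value = yield(item)
        next if value.nil?
        lock.synchronize do
          if best_index.nil? || i < best_index
            best_index = i
            best_value = value
          end
        end
      end
    end
  end

  workers.each(&:join)
  best_value
end

value = first_break_in_order((1..100).to_a, 4) { |x| x * 10 if x % 7 == 0 }
p value # => 70
```

Items before the current best index are always processed, so a later thread can still replace the answer with an earlier one; that bookkeeping is the synchronization overhead mentioned above.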
cheers.
a @ http://codeforpeople.com/
On Fri, Jul 11, 2008 at 4:38 PM, Charles Oliver N. wrote:
by the blocks passed into threadify, causing all sorts of wacky errors
Enumera(ble|tor) since so many of the operations depend on the result of the
thread-safe wrappers around the core collections, since in general I don’t
believe the core array and friends should suffer the perf penalty that comes
from always synchronizing.
Thanks Charlie, I just verified that my script no longer crashes with
my latest pull of JRuby.
jruby 1.1.3-dev (ruby 1.8.6 patchlevel 114) (2008-07-12 rev 7146) [i386-java]
Regards,
Michael G.