Trying to make Array#collect massively parallel with OpenMP

Hi all,

Windows XP Home
VC++ 8 (free edition)

Just for kicks I tried creating a parallel Array#collect method, which
I called Array#acollect (asynch. collect). I added the following C
code to array.c, rebuilt and reinstalled, but it doesn’t seem to be
any faster. Could this be an issue with my compiler? Or a Windows
thing?

static VALUE rb_ary_acollect(VALUE ary){
   long i;
   VALUE collect;

   if (!rb_block_given_p())
      return rb_ary_new4(RARRAY(ary)->len, RARRAY(ary)->ptr);

   collect = rb_ary_new2(RARRAY(ary)->len);

   #pragma omp parallel for
   for (i = 0; i < RARRAY(ary)->len; i++)
      rb_ary_push(collect, rb_yield(RARRAY(ary)->ptr[i]));

   return collect;
}

rb_define_method(rb_cArray, "acollect", rb_ary_acollect, 0);

# bench_collect.rb
require "benchmark"

MAX = 4000

array = []
MAX.times{ |n|
   array[n] = 2 * n
}

# No significant difference (?)
Benchmark.bm(30) do |x|
   x.report("Array#collect"){
      MAX.times{ array.collect{ |e| e += 4 } }
   }
   x.report("Array#acollect"){
      MAX.times{ array.acollect{ |e| e += 4 } }
   }
end

Ideas?

Thanks,

Dan

On 3/11/07, Daniel B. wrote:

Hi,

I’m no expert in either Ruby internals or OpenMP. I have just a few
ideas for you (though most of them will probably be obvious):

  1. http://msdn2.microsoft.com/en-us/library/fw509c3b(VS.80).aspx says
    you need to add the /openmp compiler switch (try testing the _OPENMP
    define to see whether the compiler recognizes the omp pragmas).
    #include "omp.h" might help as well.

  2. On the same page they say you won’t see any difference if the whole
    loop runs in under 15 ms on a specific machine (i.e. the thread
    startup time).

  3. Try running one loop outside the benchmark to set up the thread pool.

  4. Is the assignment intentional (e += 4)?

  5. (Just for my curiosity:) are you using 1.8 or 1.9?

  6. I’ve heard that the 1.8 interpreter runs in one thread. I suppose
    your code runs correctly with multiple threads because there should
    not be any (re)allocations (e.g. it might crash if there was a local
    var in the block). Right?

  7. What machine are you running this code on? (If you send me the
    binary or patch I might try it on my Core 2 machine, if that helps.)

  8. I suppose the difference might be bigger if you used a more
    complicated (longer) block (relates to #2).

  9. If everything fails, try running pure C loops (without calling Ruby
    functions) with OpenMP, possibly wrapped in a Ruby function, to see
    whether OpenMP makes a difference at least at the C level.
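Points 3 and 8 could be tried with something like the sketch below. The helper name time_collect and the heavier block are made up for illustration, and it is shown with the stock collect because acollect only exists in a patched build. It does one untimed warm-up call first, then times the repeated calls.

```ruby
require "benchmark"

# Hypothetical helper (not from the original post): times `reps` calls of
# array.<meth> with a block that does more work per element than `e + 4`.
def time_collect(array, meth, reps)
  # Heavier block: adds the squares of 1..50 to each element.
  work = lambda { |e| (1..50).inject(e) { |acc, k| acc + k * k } }

  array.send(meth, &work)   # warm-up run, deliberately not timed

  # Wall-clock seconds for the timed runs.
  Benchmark.realtime { reps.times { array.send(meth, &work) } }
end

array = Array.new(4000) { |n| 2 * n }
puts "Array#collect: %.3f s" % time_collect(array, :collect, 20)
# With a patched interpreter one would also call:
#   time_collect(array, :acollect, 20)
```

If the two timings still match with a block this heavy, thread startup cost (point 2) is probably not the explanation.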

OK, I’ve run out of ideas for now…

Jano

On Mar 11, 4:37 pm, “Jan S.” wrote:

you need to add /openmp compiler switch (try using _OPENMP define to
see if the compiler recognizes omp pragmas). #include “omp.h” might
help as well.

Ah, thanks. I tried that and I got:

LINK : fatal error LNK1104: cannot open file 'VCOMP.lib'

It doesn’t look like I have omp.h. This may be a header that’s not
included in the free version of VC++. I’ll have to ask around.

Thanks,

Dan

On Mar 12, 3:52 am, “Jan S.” wrote:

Seems OMP is included in standard and up.
http://members.gamedev.net/Rivorus/surge/html/surge_act/setting_up_yo

Drat.

If you send me the patch/instructions I can compile that for you.

Edit array.c and add the rb_ary_acollect function in my OP anywhere
above the Init_array() declaration. Add
rb_define_method(rb_cArray, "acollect", rb_ary_acollect, 0);
where all the other method definitions are (near the bottom). Then
recompile and install.

Thanks,

Dan

On 3/12/07, Daniel B. wrote:

Hi,

as I wasn’t able to compile 1.8.5-p12 I used the brand new 1.8.6 and
VS 2005 SP1, XP SP2.

Patch to array.c is included and I added -openmp to CPPFLAGS.

First run as you posted in the original post:

W:\projects\ruby\ruby\ruby-1.8.6\win32>bench_collect.rb
                                    user     system      total        real
Array#collect                  11.422000   0.094000  11.516000 ( 11.531000)
Array#acollect
W:/projects/ruby/ruby/ruby-1.8.6/win32/bench_collect.rb:17:in `initialize': exception reentered (fatal)
        from W:/projects/ruby/ruby/ruby-1.8.6/win32/bench_collect.rb:17
        from W:/projects/ruby/ruby/ruby-1.8.6/win32/bench_collect.rb:17:in `acollect'
        from W:/projects/ruby/ruby/ruby-1.8.6/win32/bench_collect.rb:17
        from W:/projects/ruby/ruby/ruby-1.8.6/win32/bench_collect.rb:17:in `times'
        from W:/projects/ruby/ruby/ruby-1.8.6/win32/bench_collect.rb:17
        from c:/ruby/lib/ruby/1.8/benchmark.rb:293:in `measure'
        from c:/ruby/lib/ruby/1.8/benchmark.rb:377:in `report'
        from W:/projects/ruby/ruby/ruby-1.8.6/win32/bench_collect.rb:16
        from c:/ruby/lib/ruby/1.8/benchmark.rb:177:in `benchmark'
        from c:/ruby/lib/ruby/1.8/benchmark.rb:207:in `bm'
        from W:/projects/ruby/ruby/ruby-1.8.6/win32/bench_collect.rb:12

Second run without assignments:

W:\projects\ruby\ruby\ruby-1.8.6\win32>bench_collect.rb
                                    user     system      total        real
Array#collect                   6.875000   0.047000   6.922000 (  6.953000)
Array#acollect
W:/projects/ruby/ruby/ruby-1.8.6/win32/bench_collect.rb:17: [BUG] cross-thread violation on rb_thread_schedule()
ruby 1.8.6 (2007-03-13) [i386-mswin32_80]

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

On 3/12/07, Daniel B. wrote:

Seems OMP is included in standard and up.
http://members.gamedev.net/Rivorus/surge/html/surge_act/setting_up_your_compiler.html

If you send me the patch/instructions I can compile that for you.

Jano

On Mar 12, 4:02 pm, “Jan S.” wrote:

Second run without assignments:

W:\projects\ruby\ruby\ruby-1.8.6\win32>bench_collect.rb
user system total real
Array#collect 6.875000 0.047000 6.922000 ( 6.953000)
Array#acollect
W:/projects/ruby/ruby/ruby-1.8.6/win32/bench_collect.rb:17: [BUG]
cross-thread violation on rb_thread_schedule()
ruby 1.8.6 (2007-03-13) [i386-mswin32_80]

Ouch. I had a feeling it would collapse. Maybe wrapping the relevant
code in RUBY_CRITICAL would work, but that may defeat the purpose,
assuming it even works at all.

Oh, well. It was fun to try at least. :)

Thanks,

Dan

On 3/13/07, Daniel B. wrote:

I have tried with 1.9 as well. It crashed in an even more interesting
way (two pages of hex numbers, see below).

So the result is: once Ruby is able to handle threads, this could
be a way to speed it up. So far it only goes down faster ;)
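For comparison, here is roughly what the same idea looks like at the Ruby level, where the interpreter does the thread bookkeeping itself instead of having foreign threads call into it. The names threaded_collect and workers are hypothetical. As far as I understand it, the standard interpreter runs only one thread's Ruby code at a time, so take this as a sketch of the structure and not as a speedup. Each worker stores into its own index instead of pushing, so the position of each result does not depend on scheduling.

```ruby
# Hypothetical sketch, not part of the patch discussed in this thread:
# splits the array into contiguous slices and maps each slice in its own
# Ruby thread.
def threaded_collect(array, workers = 4)
  return array.dup unless block_given?

  result = Array.new(array.size)   # preallocated, one slot per element
  slice_size = [(array.size / workers.to_f).ceil, 1].max

  threads = (0...array.size).step(slice_size).map do |start|
    Thread.new(start) do |first|
      last = [first + slice_size, array.size].min
      # Write by index: no two threads ever touch the same slot.
      (first...last).each { |i| result[i] = yield(array[i]) }
    end
  end

  threads.each(&:join)             # wait for every slice to finish
  result
end

p threaded_collect([0, 2, 4, 6, 8], 2) { |e| e + 4 }
```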

Jano

                                    user     system      total        real
Array#collect                   2.515000   0.031000   2.546000 (  2.546000)
Array#acollect -- stack frame ------------
0000 (00BD0020): 00000004
0001 (00BD0024): 00000005
0002 (00BD0028): 00b5f70c
0003 (00BD002C): 00000004
0004 (00BD0030): 00b5f6e4
0005 (00BD0034): 00b60738
0006 (00BD0038): 0000003d
0007 (00BD003C): 00b5f6f8
0008 (00BD0040): 00b5f6d0
0009 (00BD0044): 00000004
0010 (00BD0048): 00ba5049
0011 (00BD004C): 00000004
0012 (00BD0050): 00b5f694
0013 (00BD0054): 0000003d
0014 (00BD0058): 00b5feb4
0015 (00BD005C): 00b5f680
0016 (00BD0060): 00000000
0017 (00BD0064): 00000004
0018 (00BD0068): 00000004
0019 (00BD006C): 00ba5049
0020 (00BD0070): 00b5f66c
0021 (00BD0074): 00b5f630
0022 (00BD0078): 00b5f66c
0023 (00BD007C): 00b4bc20
0024 (00BD0080): 00b4bc0c
0025 (00BD0084): 00b4bbf8
0026 (00BD0088): 00000004
0027 (00BD008C): 00000004
0028 (00BD0090): 00baa369
0029 (00BD0094): 00b60738
0030 (00BD0098): 00b4bbd0
0031 (00BD009C): 00b4bb58
0032 (00BD00A0): 00b4bb08
0033 (00BD00A4): 00000004
0034 (00BD00A8): 00000004
0035 (00BD00AC): 00000004
0036 (00BD00B0): 00baa369
0037 (00BD00B4): 00ba51bd
0038 (00BD00B8): 00001f41
0039 (00BD00BC): 00000004
0040 (00BD00C0): 00c4fdf5
0041 (00BD00C4): 00000004
0042 (00BD00C8): 00c4fdf5
0043 (00BD00CC): 00bd00b5 (= 37)
0044 (00BD00D0): 00b5f70c
0045 (00BD00D4): 00000004
0046 (00BD00D8): 00c4fd35
0047 (00BD00DC): 00000004
0048 (00BD00E0): 00c4fd35
0049 (00BD00E4): 00000231
0050 (00BD00E8): 00000231
0051 (00BD00EC): 00000004
0052 (00BD00F0): 00000001 <- lfp <- dfp
-- control frame ----------
c:0015 p:---- s:0053 b:0053 l:000052 d:000052 CFUNC :initialize
c:0014 p:---- s:0051 b:0053 l:000052 d:000052 CFUNC :new
c:0013 p:---- s:0047 b:0047 l:000046 d:000046 CFUNC :acollect
c:0012 p:0008 s:0044 b:0044 l:000000D8 d:000043 BLOCK
bench_collect.rb:17
c:0011 p:---- s:0043 b:0043 l:000042 d:000042 FINISH
c:0010 p:---- s:0041 b:0041 l:000040 d:000040 CFUNC :times
c:0009 p:0013 s:0038 b:0038 l:000000D8 d:000037 BLOCK
bench_collect.rb:17
c:0008 p:0037 s:0037 b:0037 l:000036 d:000036 METHOD
c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:293
c:0007 p:0037 s:0029 b:0029 l:000028 d:000028 METHOD
c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:377
c:0006 p:0023 s:0022 b:0022 l:000000D8 d:0000026C BLOCK
bench_collect.rb:16
c:0005 p:0134 s:0020 b:0020 l:000019 d:000019 METHOD
c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:177
c:0004 p:0037 s:0011 b:0011 l:000010 d:000010 METHOD
c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:207
c:0003 p:0048 s:0005 b:0005 l:000000D8 d:000000D8 TOP
bench_collect.rb:12
c:0002 p:---- s:0002 b:0002 l:000001 d:000001 FINISH
c:0001 p:---- s:0000 b:-001 l:000000 d:000000 ------

-- stack frame ------------
-- control frame ----------
c:0019 p:---- s:-3096584 b:-001 l:00000000 d:00000000 ------
c:0018 p:---- s:-3096584 b:-001 l:00000000 d:00000000 ------
c:0017 p:---- s:-3096584 b:-001 l:00000000 d:00000000 ------
c:0016 p:---- s:-3096584 b:-001 l:00000000 d:00000000 ------
c:0015 p:---- s:0053 b:0053 l:000052 d:000052 CFUNC :initialize
c:0014 p:---- s:0051 b:0053 l:000052 d:000052 CFUNC :new
c:0013 p:---- s:0047 b:0047 l:000046 d:000046 CFUNC :acollect
c:0012 p:0008 s:0044 b:0044 l:000000D8 d:000043 BLOCK
bench_collect.rb:17
c:0011 p:---- s:0043 b:0043 l:000042 d:000042 FINISH
c:0010 p:---- s:0041 b:0041 l:000040 d:000040 CFUNC :times
c:0009 p:0013 s:0038 b:0038 l:000000D8 d:000037 BLOCK
bench_collect.rb:17
c:0008 p:0037 s:0037 b:0037 l:000036 d:000036 METHOD
c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:293
c:0007 p:0037 s:0029 b:0029 l:000028 d:000028 METHOD
c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:377
c:0006 p:0023 s:0022 b:0022 l:000000D8 d:0000026C BLOCK
bench_collect.rb:16
c:0005 p:0134 s:0020 b:0020 l:000019 d:000019 METHOD
c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:177
c:0004 p:0037 s:0011 b:0011 l:000010 d:000010 METHOD
c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:207
c:0003 p:0048 s:0005 b:0005 l:000000D8 d:000000D8 TOP
bench_collect.rb:12
c:0002 p:---- s:0002 b:0002 l:000001 d:000001 FINISH
c:0001 p:---- s:0000 b:-001 l:000000 d:000000 ------

-- stack frame ------------
0000 (00BD0020): 00000004
0001 (00BD0024): 00000005
0002 (00BD0028): 00b5f70c
0003 (00BD002C): 00000004
0004 (00BD0030): 00b5f6e4
0005 (00BD0034): 00b60738
0006 (00BD0038): 0000003d
0007 (00BD003C): 00b5f6f8
0008 (00BD0040): 00b5f6d0
0009 (00BD0044): 00000004
0010 (00BD0048): 00ba5049
0011 (00BD004C): 00000004
0012 (00BD0050): 00b5f694
0013 (00BD0054): 0000003d
0014 (00BD0058): 00b5feb4
0015 (00BD005C): 00b5f680
0016 (00BD0060): 00000000
0017 (00BD0064): 00000004
0018 (00BD0068): 00000004
0019 (00BD006C): 00ba5049
0020 (00BD0070): 00b5f66c
0021 (00BD0074): 00b5f630
0022 (00BD0078): 00b5f66c
0023 (00BD007C): 00b4bc20
0024 (00BD0080): 00b4bc0c
0025 (00BD0084): 00b4bbf8
0026 (00BD0088): 00000004
0027 (00BD008C): 00000004
0028 (00BD0090): 00baa369
0029 (00BD0094): 00b60738
0030 (00BD0098): 00b4bbd0
0031 (00BD009C): 00b4bb58
0032 (00BD00A0): 00b4bb08
0033 (00BD00A4): 00000004
0034 (00BD00A8): 00000004
0035 (00BD00AC): 00000004
0036 (00BD00B0): 00baa369
0037 (00BD00B4): 00ba51bd
0038 (00BD00B8): 00001f41
0039 (00BD00BC): 00000004
0040 (00BD00C0): 00c4fdf5
0041 (00BD00C4): 00000004
0042 (00BD00C8): 00c4fdf5
0043 (00BD00CC): 00bd00b5 (= 37)
0044 (00BD00D0): 00b5f70c
0045 (00BD00D4): 00000004
0046 (00BD00D8): 00c4fd35
0047 (00BD00DC): 00000004
0048 (00BD00E0): 00c4fd35
0049 (00BD00E4): 00000231
0050 (00BD00E8): 00000231
-- control frame ----------
c:0014 p:---- s:0051 b:0053 l:000052 d:000052 CFUNC :new
c:0013 p:---- s:0047 b:0047 l:000046 d:000046 CFUNC :acollect
c:0012 p:0008 s:0044 b:0044 l:000000D8 d:000043 BLOCK
bench_collect.rb:17
c:0011 p:---- s:0043 b:0043 l:000042 d:000042 FINISH
c:0010 p:---- s:0041 b:0041 l:000040 d:000040 CFUNC :times
c:0009 p:0013 s:0038 b:0038 l:000000D8 d:000037 BLOCK
bench_collect.rb:17
c:0008 p:0037 s:0037 b:0037 l:000036 d:000036 METHOD
c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:293
c:0007 p:0037 s:0029 b:0029 l:000028 d:000028 METHOD
c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:377
c:0006 p:0023 s:0022 b:0022 l:000000D8 d:0000026C BLOCK
bench_collect.rb:16
c:0005 p:0134 s:0020 b:0020 l:000019 d:000019 METHOD
c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:177
c:0004 p:0037 s:0011 b:0011 l:000010 d:000010 METHOD
c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:207
c:0003 p:0048 s:0005 b:0005 l:000000D8 d:000000D8 TOP
bench_collect.rb:12
c:0002 p:---- s:0002 b:0002 l:000001 d:000001 FINISH
c:0001 p:---- s:0000 b:-001 l:000000 d:000000 ------

DBG> : "bench_collect.rb:17:in `acollect'"
DBG> : "bench_collect.rb:17:in `block (3 levels) in <main>'"
DBG> : "bench_collect.rb:17:in `times'"
DBG> : "bench_collect.rb:17:in `block (2 levels) in <main>'"
DBG> : "c:/ruby19/usr/lib/ruby/1.9/benchmark.rb:293:in `measure'"
DBG> : "c:/ruby19/usr/lib/ruby/1.9/benchmark.rb:377:in `item'"
DBG> : "bench_collect.rb:16:in `block in <main>'"
DBG> : "c:/ruby19/usr/lib/ruby/1.9/benchmark.rb:177:in `benchmark'"
DBG> : "c:/ruby19/usr/lib/ruby/1.9/benchmark.rb:207:in `bm'"
DBG> : "bench_collect.rb:12:in `<main>'"
[BUG] cfp consistency error - call0
ruby 1.9.0 (2007-03-13) [i386-mswin32_80]

This application has requested the Runtime to terminate it in an unusual
way.
Please contact the application’s support team for more information.

On 3/12/07, Daniel B. wrote:

On Mar 12, 4:02 pm, “Jan S.” wrote:

Ouch. I had a feeling it would collapse. Maybe wrapping the relevant
code in RUBY_CRITICAL would work, but that may defeat the purpose,
assuming it even works at all.

Even if it worked it would have some other problems. If the block
argument was at all sensitive to evaluation order, for example, the
results would be indeterminate I think.

For a cooked up example:

   i = 0
   (1...100).to_a.acollect {|elem| i += 1}
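To make the order dependence concrete without needing a patched interpreter, here is a small sketch that runs the block in an explicitly chosen order. The helper collect_in_order is made up for this illustration. It stands in for whatever order a parallel loop happens to pick, and stores each result at its element's original position.

```ruby
# Hypothetical helper: evaluates the block for the indices in `order`,
# one after another, storing each result at the element's own position.
def collect_in_order(array, order)
  result = Array.new(array.size)
  order.each { |i| result[i] = yield(array[i]) }
  result
end

array = (1...100).to_a             # 99 elements

i = 0
forward = collect_in_order(array, 0...array.size) { |elem| i += 1 }

i = 0
backward = collect_in_order(array, (0...array.size).to_a.reverse) { |elem| i += 1 }

p forward.first(3)    # => [1, 2, 3]
p backward.first(3)   # => [99, 98, 97]
```

Same array and same block, but a different evaluation order gives a different result, so a parallel version could return something different on every run.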

Interaction with the GC might also be interesting.


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/