Forum: Ruby Concurrent (using threads) slower than sequential - doubt

Carlos Ortega (caof2005)
on 2008-10-06 05:41
Hi Folks.
While starting to study the benefits of using threads in Ruby, I tried
to solve the following problem:

I have 3 text files (C:\numbers0.txt, C:\numbers1.txt, C:\numbers2.txt);
each file contains a very large list of numbers.
I read and sum each file in a separate thread,
then add up the subtotals to get the final result.

Here is the code.
===================

require 'thread'
m_threads = []

print "INITIAL TIME  := ", initial_time = Time.now, "\n"
3.times do |i|
  m_threads[i] = Thread.new do
    total_per_thread = 0
    path = "C:\\numbers#{i}.txt"   # thread i reads numbers<i>.txt
    File.open( path, "r"  ) do |m_file|
      while line = m_file.gets
          total_per_thread = line.to_i + total_per_thread
      end
      Thread.current[:INDEX] = total_per_thread
    end
  end
end

result = 0
m_threads.each{ |t| t.join; result = t[:INDEX] + result; }

print "FINAL TIME   := ", final_time = Time.now, "\n"
print "TOTAL TIME  := ", total_time = final_time-initial_time, "\n"
print "Total                 := ", result, "\n"

=======================================
Output (CONCURRENT - Using Threads):

INITIAL TIME  := Sun Oct 05 22:07:26 -0500 2008

FINAL TIME    := Sun Oct 05 22:07:38 -0500 2008
TOTAL TIME  := 11.485
Total                 := 1150000000
========================================

I verified that each thread did its job, and the result is correct too.
I also solved the same problem with a sequential program that uses no
threads at all.
Here is the code:

print "INITIAL Time := ", initial_time = Time.now, "\n"

paths = [ "C:\\numbers0.txt", "C:\\numbers1.txt", "C:\\numbers2.txt" ]
result = 0
for m_path in paths
  File.open( m_path, "r"  ) do |m_file|
    while line = m_file.gets
      result = line.to_i + result
    end
  end
end

print "FINAL time     := ", final_time = Time.now, "\n"
print "TOTAL time    := ", total_time = final_time - initial_time, "\n"
print "Total                 := ", result, "\n"

=======================================
Output: (SEQUENTIAL - NO Threads)

INITIAL TIME := Sun Oct 05 22:34:47 -0500 2008
FINAL TIME    := Sun Oct 05 22:34:57 -0500 2008
TOTAL TIME   := 10.656
Total                  := 1150000000

=======================================
As you can see, the thread-based program runs slower.
I thought that using threads would make it faster, but it didn't... Why
is it slower?

Any help will be much appreciated.
Yukihiro Matsumoto (Guest)
on 2008-10-06 06:00
(Received via mailing list)
Hi,

In message "Re: Concurent (using threads) slower than sequential -doubt"
    on Mon, 6 Oct 2008 12:40:08 +0900, Carlos Ortega
<caof2005@yahoo.com> writes:

|As you see, the thread based program run slower.
|I thought that by using threads it will be faster, but it didn't....Why
|is it slower?

Threads require context switching, so they tend to run slower,
especially green threads like the ones Ruby 1.8 has.
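
To see the overhead for yourself, here is a minimal sketch that runs the same CPU-bound work once sequentially and once with one thread per job. The workload (busy_sum), the job count and the size n are made up for illustration; they are not from the original program. On an interpreter that runs Ruby code on only one core at a time, the threaded run should not be expected to win, and the exact timings will vary from machine to machine.

```ruby
require 'benchmark'

# Hypothetical CPU-bound workload: sum the integers 1..n.
def busy_sum(n)
  total = 0
  1.upto(n) { |i| total += i }
  total
end

n    = 1_000_000   # arbitrary size, chosen only for illustration
jobs = 3           # one job per file in the original post

sequential_result = nil
threaded_result   = nil

# Run the jobs one after another.
sequential_time = Benchmark.realtime do
  sequential_result = (1..jobs).map { busy_sum(n) }.inject(0) { |s, x| s + x }
end

# Run the same jobs, one thread each; Thread#value joins and returns the block's result.
threaded_time = Benchmark.realtime do
  threads = (1..jobs).map { Thread.new { busy_sum(n) } }
  threaded_result = threads.map { |t| t.value }.inject(0) { |s, x| s + x }
end

puts "sequential: #{sequential_time}s"
puts "threaded:   #{threaded_time}s"
```

Both runs do the same arithmetic, so any difference between the two timings comes from how the threads are scheduled rather than from the work itself.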

              matz.
Charles Oliver Nutter (Guest)
on 2008-10-06 07:21
(Received via mailing list)
Carlos Ortega wrote:
> As you see, the thread based program run slower.
> I thought that by using threads it will be faster, but it didn't....Why
> is it slower?

You may want to try with JRuby, which actually uses native threads. On a
multi-core system, it should improve performance.

- Charlie
Robert Klemme (Guest)
on 2008-10-06 11:42
(Received via mailing list)
2008/10/6 Yukihiro Matsumoto <matz@ruby-lang.org>
>
> In message "Re: Concurent (using threads) slower than sequential -doubt"
>    on Mon, 6 Oct 2008 12:40:08 +0900, Carlos Ortega <caof2005@yahoo.com> writes:
>
> |As you see, the thread based program run slower.
> |I thought that by using threads it will be faster, but it didn't....Why
> |is it slower?
>
> Threads require context switching, so that they tend to run slower,
> especially green threads like Ruby 1.8 has.

There is another issue which may easily have a more serious impact:
since all three files reside in the same directory, they are read from
the same physical device (most likely a local (S)ATA disk). And since
these files are large, chances are that they are spread over the disk
and do not fit into the operating system's buffer cache. Reading them
concurrently will lead to considerably more head movement and less
efficient disk caching than the sequential approach.
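
One way to check whether the disk or the CPU dominates is to time the reading and the summing separately. The sketch below is only an illustration under assumptions: it uses three small temporary files as stand-ins for the real numbers*.txt files, and it loads each file fully into memory with File.readlines, which may not be practical for very large inputs. Small files like these will almost certainly sit in the buffer cache, so the read time printed here says nothing about real disk seeks; point it at the real files to get meaningful numbers.

```ruby
require 'benchmark'
require 'tempfile'

# Hypothetical sample data: small temp files standing in for the large input files.
files = (0..2).map do |i|
  f = Tempfile.new("numbers#{i}")
  1.upto(10_000) { |n| f.puts n }
  f.close
  f
end

read_time  = 0.0
parse_time = 0.0
total      = 0

files.each do |f|
  lines = nil
  # Phase 1: pure I/O, no number conversion.
  read_time += Benchmark.realtime { lines = File.readlines(f.path) }
  # Phase 2: pure computation on data already in memory.
  parse_time += Benchmark.realtime do
    total += lines.inject(0) { |sum, l| sum + l.to_i }
  end
end

puts "read:  #{read_time}s"
puts "parse: #{parse_time}s"
puts "total: #{total}"

files.each { |f| f.unlink }   # remove the temp files
```

If the read phase dominates on the real data, adding threads that all hit the same disk is unlikely to help.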

Kind regards

robert
Carlos Ortega (caof2005)
on 2008-10-06 15:42
Thank you all (Matz, Charles and Robert).

Just one more doubt...

Since the threads I created are held in an array of Thread objects, I
tried to read each thread's stored value using [ ] notation:

for t in m_threads
  print t[:INDEX], "\n"
end

The interpreter does not raise any error, but the output is always:
nil
nil
nil

I tried to verify whether they are still running:

Thread.list.each{|t| p t}

Results were:
#<Thread:0x29c5fc0 run>
#<Thread:0x29c6100 run>
#<Thread:0x29c6240 run>
#<Thread:0x294c74c run>

So they are indeed running... The doubt is: why can't I access the
value stored in each thread?
In fact, in the statement
     m_threads.each{ |t| t.join; result = t[:INDEX] + result; }

I can compute the result variable only after executing t.join... If
I take out the t.join statement, the interpreter throws an error:

PbaThreads.rb:10 : undefined method `+' for nil:NilClass (NoMethodError)

Could you clarify this, please?

Best Regards




Robert Klemme (Guest)
on 2008-10-06 16:41
(Received via mailing list)
2008/10/6 Carlos Ortega <caof2005@yahoo.com>:
>
> #<Thread:0x29c5fc0 run>
> I take out the t.join statement the interpreter throws an error:
>
> PbaThreads.rb:10 : undefined method `+' for nil:NilClass (NoMethodError)
>
> Could you clarify this, please.

Well, this is obvious: you cannot access the result before it's there.
 Since you are setting this as the last statement in the thread you
need to wait (i.e. join) until the thread finishes.
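
A small sketch of that timing issue, made deterministic with a Queue so the worker cannot finish early. The names gate and worker and the value 42 are made up for illustration; only the :INDEX key comes from the original program.

```ruby
require 'thread'   # Queue lives here on older Rubies; harmless on newer ones

gate = Queue.new

worker = Thread.new do
  gate.pop                      # block until the main thread says go
  Thread.current[:INDEX] = 42   # stored as the last statement, as in the original program
end

before = worker[:INDEX]   # the worker is still blocked, so nothing is stored yet
gate.push(:go)            # let the worker proceed
worker.join               # wait until it has finished
after = worker[:INDEX]

p before   # => nil
p after    # => 42
```

Reading a thread-local that has not been set yet returns nil rather than raising, which is why the loop printed nil and why nil + result failed without the join.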

Btw, you can use Thread#value for this.  Here's a variant:


require 'benchmark'

# numbers0.txt .. numbers2.txt, matching the files in the original post
files = (0..2).map {|i| "C:\\numbers#{i}.txt"}

Benchmark.bmbm 10 do |b|
  b.report "threaded" do
    threads = files.map do |file|
      Thread.new file do |f|
        File.open f do |io|
          io.inject(0) {|sum, l| sum + l.to_i}
        end
      end
    end

    puts threads.inject(0) {|sum, th| sum + th.value}
  end

  b.report "sequential" do
    puts files.inject(0) {|s, f|
      File.open f do |io|
        io.inject(s) {|sum, l| sum + l.to_i}
      end
    }
  end
end


Kind regards

robert
Erik Veenstra (Guest)
on 2008-10-06 23:00
(Received via mailing list)
If you are on Linux, you might want to have a look at the gem
"forkandreturn" [1]. ForkAndReturn handles each element in an
enumeration in a separate process [2].

gegroet,
Erik V. - http://www.erikveen.dds.nl/

[1] http://www.erikveen.dds.nl/forkandreturn/doc/index.html

[2] ...if you're on a multicore machine. Oops. Will be fixed in
the next release.

----------------------------------------------------------------

 $ cat count1.rb
 files   = ["numbers0.txt", "numbers1.txt", "numbers2.txt"]
 result  = 0

 files.collect do |file|
   res   = 0

   File.open(file) do |file|
     file.each do |line|
       res += line.to_i
     end
   end

   res
 end.each do |res|
   result += res
 end

 p result

----------------------------------------------------------------

 $ diff -ur count[12].rb | clean_diff
 +require "forkandreturn"
 +
  files   = ["numbers0.txt", "numbers1.txt", "numbers2.txt"]
  result = 0

 -files.collect do |file|
 +files.concurrent_collect do |file|
    res  = 0

    File.open(file) do |file|

----------------------------------------------------------------

 $ time ruby count1.rb
 81627450482688

 real    0m15.309s
 user    0m15.201s
 sys     0m0.076s

----------------------------------------------------------------

 $ time ruby count2.rb
 81627450482688

 real    0m8.976s    <=== Multicore!
 user    0m17.177s   <=== Multicore!
 sys     0m0.204s

----------------------------------------------------------------

 $ uname -a
 Linux laptop 2.6.24-19-generic #1 SMP Wed Aug 20 22:56:21 UTC 2008 i686 GNU/Linux

----------------------------------------------------------------

 $ ruby --version
 ruby 1.8.6 (2008-06-20 patchlevel 230) [i686-linux]

----------------------------------------------------------------

 $ gem list | grep -ie forkandreturn
 forkandreturn (0.2.0)
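
For readers who want to see the general idea of one process per element without the gem, here is a rough sketch using only Kernel#fork, IO.pipe and Marshal. This is not how ForkAndReturn is implemented (that is not shown in this thread); sum_in_child is a hypothetical helper, the in-memory chunks stand in for the files, and the whole thing assumes a Unix-like system where fork is available.

```ruby
# Unix-only sketch: Kernel#fork is not available on Windows.

# Hypothetical helper: sum the numbers in a child process and
# hand back the child's pid plus the read end of a pipe.
def sum_in_child(numbers)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    partial = numbers.inject(0) { |s, n| s + n }
    writer.write(Marshal.dump(partial))   # send the result to the parent
    writer.close
    exit!(0)                              # leave without running at_exit handlers
  end
  writer.close                            # the parent only reads
  [pid, reader]
end

# Made-up data standing in for the three files.
chunks = [(1..1000).to_a, (1001..2000).to_a, (2001..3000).to_a]

# Start all children first so they can run at the same time...
children = chunks.map { |chunk| sum_in_child(chunk) }

# ...then collect each partial sum and reap the child.
result = children.inject(0) do |sum, (pid, reader)|
  partial = Marshal.load(reader.read)
  reader.close
  Process.wait(pid)
  sum + partial
end

p result   # => 4501500
```

Because each child is a separate process, the operating system is free to schedule them on different cores, which is where the lower "real" time in the transcript above comes from.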
Erik Veenstra (Guest)
on 2008-10-07 00:09
(Received via mailing list)
> [2] ...if you're on a multicore machine. Oops. Will be fixed in
> the next release.

It's released...

gegroet,
Erik V.
Carlos Ortega (caof2005)
on 2008-10-07 05:27
Erik Veenstra wrote:
>> [2] ...if you're on a multicore machine. Oops. Will be fixed in
>> the next release.
>
> It's released...
>
> gegroet,
> Erik V.

Thank you Erik and Robert...

I will try on both environments.

Regards
Carlos
Prashant Srinivasan (Guest)
on 2008-10-08 02:12
(Received via mailing list)
Carlos, that sounds about right.  I did some similar tests early this
year [1].  Basically, your problem is that Ruby runs on one kernel
thread/LWP irrespective of how many userland threads you create.  It's
also expensive to switch between threads (the cost varies depending on
which hardware platform you're running on), so these two factors combine
to make it slower for you when you use threads.

 JRuby was almost as bad until JRuby 1.1.1, after which it started
doing better with threads (this was due to a bug fix by Charles [2]).
It's now much better at scaling with threads compared with MRI, but
still quite poor in absolute terms [3]: its scalability on an
embarrassingly threaded program eroded 54% jumping from 1 to 2 threads
and became worse after that.  (*Caveat:* My numbers are old, they're
from March, and things may have gotten much better since!)
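
If you want to repeat this kind of scaling measurement on your own interpreter, a minimal sketch could look like the following. It is not the benchmark behind the numbers above; the workload (count_up), the total amount of work and the thread counts 1, 2 and 4 are all assumptions picked for illustration. The same total work is split evenly across the threads, so with perfect scaling the speedup would equal the thread count.

```ruby
require 'benchmark'

# Hypothetical CPU-bound workload: count up to n.
def count_up(n)
  total = 0
  n.times { total += 1 }
  total
end

total_work    = 1_200_000   # arbitrary; divisible by every thread count below
thread_counts = [1, 2, 4]
timings       = {}

thread_counts.each do |thread_count|
  share = total_work / thread_count
  done  = 0
  timings[thread_count] = Benchmark.realtime do
    threads = (1..thread_count).map { Thread.new { count_up(share) } }
    done = threads.inject(0) { |sum, t| sum + t.value }   # value waits for each thread
  end
  raise "work was lost" unless done == total_work
end

base = timings[1]
thread_counts.each do |n|
  printf("%d thread(s): %.3fs  speedup x%.2f\n", n, timings[n], base / timings[n])
end
```

A speedup that stays near 1.0, or drops below it, as threads are added suggests the interpreter is not spreading the work over more than one core.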

[1] http://blogs.sun.com/prashant/resource/files/jruby...
[2] Charles' entry: http://blog.headius.com/2008/04/shared-data-consid...
[3] http://blogs.sun.com/prashant/resource/files/jruby...

 -ps