Point me to help w/ multithreading in 1.9.2-p0

Hi Folks - A week or two ago, I pinged this list for recommendations on
a load-testing gem. Unfortunately, I didn’t see much response that
pointed me in the right direction. So I’ve set about writing my own,
using threads, and can’t find proper resources to help me understand
what’s going on w/ said threads.

Here’s the two key methods I’m using and figure are the source of my
troubles. First question is whether there’s any glaring errors here
that I’m missing:

class ThreadedLoadTester

  def initialize(users, apis, session=nil)
    @threads = []
    @count = users
    @session = session

    1.upto @count.to_i do
      @threads << Thread.new do
        Thread.current[:hi5] = Hi5fbapi.new 'redacted params'
        Thread.current["api_calls"] = []

        apis.each do |api|
          # get_call returns a proc object w/ a snippet of test code
          Thread.current["api_calls"] << get_call(api)
        end

        Thread.stop
      end
    end
  end

  def run_threads()
    @threads.each do |thr|
      thr[:api_calls].each do |api|
        p api.call thr[:hi5], @session
      end
    end

    @threads.each {|t| t.wakeup.join}
    return nil
  end

end

Basically, I spin up a bunch of threads when I initialize the object,
populate an array in each thread w/ a series of proc objects, and then
put them to sleep. Then, when I run_threads(), the idea is to iterate
over each thread, iterate over the thread’s array of procs, and call
each one. Oh, and this object is consumed by a sinatra-based web app.

This works sometimes. And sometimes it segfaults in a nasty way. I’m
having a hard time finding the technical details of how threading in
Ruby, particularly 1.9.2-p0, works. Any pointers?

Thanks,
Alex

Nevermind… figured it out.

Though I still wonder: is there any highly detailed technical
documentation on Ruby threading? (Or is the interpreter source
considered it?)

-Alex

On 28.09.2010 04:35, Alex S. wrote:

Nevermind… figured it out.

Mind sharing?

thanks,

- Markus

No prob… but I’m not sure it’s quite what you’re looking for. I’m
refactoring code I can’t get to work instead of finding the root cause
in the threading. (BTW - any thread insight would be appreciated based
on this write-up, because there are still problems! I’m starting to
consider that a Threaded Load Tester gem might be handy…)

After staring at the screen for too long, I took a break to ponder
whether the organization of my threads was fundamentally flawed. It
occurred to me that one of the problems I was having - a deadlock -
could be avoided by instantiating the threads only when needed.

If you review my original code snippet: I spin up all the threads w/ a
proc object in initialize(), then .stop them; later, a method performs
a .call on each element of the thread’s api array, followed by a
.wakeup and a .join on each thread.


1.upto @count do
  @threads << Thread.new do
    Thread.current[:hi5] = Hi5fbapi.new 'redacted params'
    Thread.current["api_calls"] = []
    apis.each do |api|
      Thread.current["api_calls"] << get_call(api) # pushes a proc obj
    end
    Thread.stop
  end
end

@threads.each do |thr|
  thr[:api_calls].each do |api|
    p api.call thr[:hi5], @session
  end
end
@threads.each {|t| t.wakeup.join}

On 1.8.7 w/ a single core processor, this is a highly deterministic
sequence, and would not deadlock. Once deployed to a multi-core VM
running 1.9.2-p0 (selected specifically for concurrency), not so much.

There I encountered more deadlocks and also "NoMethodError"s from
Sinatra (undefined method `bytesize' for #<Thread:0xa36ae0c dead>).
This would occur during the .each where I would .join. So it’s trying
to join a ‘dead’ thread. Except that if I add anything to prevent that,
such as

… unless t.status == 'dead' …

it would still deadlock or NoMethodError.
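For what it’s worth, a guard like that can never match: Thread#status returns "run" or "sleep" for live threads, and false or nil (never the string "dead") once a thread has finished - the word "dead" only appears in Thread#inspect. A minimal sketch of a guard that does skip finished threads (Thread#wakeup on a dead thread raises ThreadError, while Thread#join on one simply returns):

```ruby
threads = 3.times.map { Thread.new { Thread.stop } }

# Wait until every thread has actually parked itself in Thread.stop.
Thread.pass until threads.all? { |t| t.status == "sleep" }

threads.each do |t|
  t.wakeup if t.alive?  # wakeup on a finished thread raises ThreadError
  t.join                # joining a finished thread just returns immediately
end
```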

But further testing showed that adding any operation to the main thread,
prior to calling .join, would prevent the deadlock:


@threads.each do |thr|
  p thr.inspect

Anyway, I rewrote things where the threads are created in the method,
not initialize(), so ‘@api_calls’ is already populated by procs, and it
works fine at small scale:


thr = []
1.upto @count do
  thr << Thread.new do
    @api_calls.each do |api|
      p api.call @hi5, @session
      Thread.pass
    end
  end
end
thr.each {|t| t.join}

This runs fine on 1.8.7/single and 1.9.2/multi at like 5-10 threads.
But when I ramp up to, say, 5,000 (it is a load test!), 1.8.7 is fine
but 1.9.2 segfaults.

Even 500 threads on 1.9.2 is segfaulting right now (but not 1.8.7). I
get the handy output:

[NOTE]
You may have encountered a bug in the Ruby interpreter or extension
libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

I’ll write it up tomorrow.

Some open questions:

  1. When .join is called on a multi-core system, what qualifies as the
    calling thread? The .main on the main processor, or the thread which
    instantiates my ThreadedLoadTester object? (i.e. what if
    ThreadedLoadTester is created from a sinatra thread which itself isn’t
    main?)

  2. Sinatra regularly reports a ‘NoMethodError’ for ‘bytesize’ when the
    last thread is dead but joined to the main thread. But only when the
    main thread originates w/in sinatra, and not an inline call.

  3. Is there a theoretical maximum to the number of concurrent threads
    which can be created which all access a network interface? This is
    admittedly a poor theory - what might really cause a segfault in 1.9.2
    when 500 threads all try to access the network?

Thanks for asking :)
-Alex

Alex S. [email protected] wrote:

  3. Is there a theoretical maximum to the number of concurrent threads
    which can be created which all access a network interface? This is
    admittedly a poor theory - what might really cause a segfault in 1.9.2
    when 500 threads all try to access the network?

Are you opening multiple file descriptors per thread? If you exceed
1024 file descriptors per-process, then the select() interface will
overrun the buffers and segfault. You can fork() your process to get
around this (and get more CPU/memory concurrency), or do all your IO
over something like Rev[1] or EventMachine[2] which use epoll or kqueue.

Which OS is this on? There may be some lingering pthreads portability
issues for non-NPTL. Definitely talk to ruby-core about this.

[1] - http://rev.rubyforge.org/
[2] - http://rubyeventmachine.com/
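The per-process descriptor limit Eric mentions can be inspected from Ruby itself; a minimal sketch, assuming a POSIX system (note that select() is additionally capped by FD_SETSIZE, commonly 1024, no matter how high the rlimit is raised):

```ruby
# Soft/hard limits on open file descriptors for this process.
soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
puts "fd soft limit: #{soft}, hard limit: #{hard}"
```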

On 28.09.2010 11:16, Alex S. wrote:


  3. Is there a theoretical maximum to the number of concurrent threads
    which can be created which all access a network interface? This is
    admittedly a poor theory - what might really cause a segfault in 1.9.2
    when 500 threads all try to access the network?

Without going into too much detail, I believe one flaw of your design
here is that you are not using thread synchronization but instead try to
explicitly start and stop threads and yield execution. It may be that
this is causing your core dumps, but I really don’t know.

What I would do:

  1. Use a condition variable to let all threads start at the same time.

  2. Use Thread#value to collect results.

require 'thread'

lock = Mutex.new
cond = ConditionVariable.new
start = false

threads = (1..10).map do
  Thread.new do
    lock.synchronize do
      until start
        cond.wait(lock)
      end
    end

    # work
    # return results
    [rand(10), rand(100)]
  end
end

lock.synchronize do
  start = true
  cond.signal
end

threads.each do |th|
  p th.value
end

You can probably get away without the condition variable by just
acquiring the lock (lock.synchronize) in the main thread before you
create all threads and let all threads just synchronize with an empty
block.
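That simpler variant would look something like this - a minimal sketch of the same ten-thread example with the condition variable replaced by holding the lock during thread creation:

```ruby
lock = Mutex.new
threads = nil

lock.synchronize do
  # While the main thread holds the lock, every worker blocks on the
  # empty synchronize below; releasing it starts them all together.
  threads = (1..10).map do
    Thread.new do
      lock.synchronize {}  # gate: wait for the main thread to let go
      [rand(10), rand(100)]
    end
  end
end

results = threads.map(&:value)
results.each { |r| p r }
```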

Kind regards

robert

Thanks for the note, Eric. Each thread is only opening one file
descriptor, so I haven’t encountered a limit there. And a refactor
eliminated the segfaults.

Although, after further rewriting, I’m now bumping into a maximum of
~2,950 threads when running on 1.9.2-p0 (on Ubuntu 10.04). When I run
w/ >2,950, there’s one of three errors I encounter:

- can't create Thread (11) (ThreadError) (from my class)
- out of memory error (from my class; sorry, missed a chance to copy)
- Cannot assign requested address - connect(2) (Errno::EADDRNOTAVAIL)
  (from the HTTP class)

Changed up the class to be a little more concise:

def run_threads()
  thr = []
  1.upto @count do
    thr << Thread.new do
      @tests.each do |test|
        p test.call @obj, @session
        Thread.pass
      end
    end
  end
  thr.each {|t| t.join}
end

So what limits might I be bumping into now? The "can't create…" error
seems to be the most common - what could cause that? Is my best course
of action to d/l the source, grep for that string, and analyze from
there? Or is this possibly a bug? Or am I beyond expected threading
usage?

Thanks,
Alex
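(All three errors above point at resource exhaustion: "can't create Thread (11)" is errno 11, EAGAIN, from pthread_create hitting a thread or memory limit, and EADDRNOTAVAIL typically means the ephemeral ports are exhausted. One common workaround is to run the workers in batches rather than all at once; a rough sketch, where do_work is a hypothetical stand-in for one simulated user's test calls:)

```ruby
BATCH = 100  # assumed batch size; tune to stay under thread/fd/port limits

# do_work is a hypothetical stand-in for one simulated user's API calls.
def do_work(user_id)
  user_id * 2
end

def run_in_batches(count, batch = BATCH)
  results = []
  count.times.each_slice(batch) do |slice|
    threads = slice.map { |i| Thread.new { do_work(i) } }
    results.concat(threads.map(&:value))  # join each batch and collect results
  end
  results
end
```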

On 9/29/10, Robert K. [email protected] wrote:

Without going into too much detail I believe one flaw of your design
here is that you are not using thread synchronization but instead try to
explicitly start and stop threads and yield execution. It may be that
this is causing your cores, but I really don’t know.

I agree that in general it’s better to do this kind of thing using the
appropriate synchronization data structures… however, I would expect
that it should not be possible to crash the Ruby interpreter purely by
writing Ruby code, regardless of the presence of bugs in that code. I
don’t know whether it is possible to actually achieve this level of
reliability when dealing with threading code that contains race
conditions, though.

What I would do:

  1. Use a condition variable to let all threads start at the same time.
    [snip]
    lock.synchronize do
    start = true
    cond.signal
    end

Putting aside my prejudices against ConditionVariable, there is
another problem with this: ConditionVariable#signal awakens only one
thread waiting on the condvar. You’d want to use
ConditionVariable#broadcast instead. But even then, there is a race
condition; you’re not guaranteed that all the threads have blocked
waiting on the condvar. Some may still be running the code before that
point, and they would end up never running to completion. A counting
semaphore could solve that, but ruby doesn’t actually have one of
those (sigh).

Thread synchronization is a real PITA.
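For what it’s worth, a counting semaphore is only a few lines on top of Mutex and ConditionVariable; a minimal sketch (the class name is made up - nothing like it ships with the 1.9.2 stdlib, and on 1.9.2 you would require 'thread' first):

```ruby
# A minimal counting semaphore built from Mutex + ConditionVariable.
class CountingSemaphore
  def initialize(count = 0)
    @count = count
    @lock  = Mutex.new
    @cond  = ConditionVariable.new
  end

  # Increment the count and wake one waiter, if any.
  def release
    @lock.synchronize do
      @count += 1
      @cond.signal
    end
  end

  # Block until the count is positive, then decrement it.
  def acquire
    @lock.synchronize do
      @cond.wait(@lock) while @count.zero?
      @count -= 1
    end
  end
end
```

The main thread could then acquire once per worker to know that all N threads have checked in before starting the clock.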

I see now that the first line of ConditionVariable#broadcast is this:

TODO: imcomplete

So, maybe there’s some kind of problem with it?

You can probably get away without the condition variable by just
acquiring the lock (lock.synchronize) in the main thread before you
create all threads and let all threads just synchronize with an empty block.

I can’t see any holes in this scheme, so it’s probably the best idea.

On 10/3/10, Robert K. [email protected] wrote:

On 03.10.2010 02:38, Caleb C. wrote:

But even then, there is a race
condition; you’re not guaranteed that all the threads have blocked
waiting on the condvar. Some may still be running the code before that
point, and they would end up never running to completion.

This is not true. The only negative thing that would happen is that
they would not start doing their work at the “same” time as the other
threads. Other than that they would do their work as the other threads.

Oh, you are right. The boolean variable start prevents the case I was
worried about. This is why I find condvars confusing.

You can probably get away without the condition variable by just
acquiring the lock (lock.synchronize) in the main thread before you
create all threads and let all threads just synchronize with an empty
block.

I can’t see any holes in this scheme, so it’s probably the best idea.

Even then it could be that some threads execute code before the
synchronize and thus not start concurrently with other threads.

But you made the synchronize statement at the very start of the thread
body, so this would seem to not be a concern in this case.

On 03.10.2010 20:30, Caleb C. wrote:

On 10/3/10, Robert K.[email protected] wrote:

On 03.10.2010 02:38, Caleb C. wrote:

But you made the synchronize statement at the very start of the thread
body, so this would seem to not be a concern in this case.

The access to the cond var in the synchronize block was also the first
statement in the thread body:

Thread.new do
  lock.synchronize do
    until start
      cond.wait(lock)
    end
  end

  # work
  # return results
  [rand(10), rand(100)]
end

So there is really not that much difference. In practice this will
usually not be a problem, but from a more formal perspective it does not
matter how many operations are performed before the synchronization - it
may still be that a thread does not get CPU time before it reaches that
point. That’s why I said that for a test scenario I would only bother to
have all threads start concurrently if there was a lot of ramp-up work
to do.

Kind regards

robert

On 03.10.2010 02:38, Caleb C. wrote:

know whether it is possible to actually achieve this level of
reliability when dealing with threading code that contains race
conditions, tho.

Absolutely agree. But since 1.9.2 is pretty fresh I’d expect the more
traditional code (proper thread sync) to be less likely to crash than
obscure variants.

another problem with this: ConditionVariable#signal awakens only one
thread waiting on the condvar. You’d want to use
ConditionVariable#broadcast instead.

Right, sorry for mixing this up.

But even then, there is a race
condition; you’re not guaranteed that all the threads have blocked
waiting on the condvar. Some may still be running the code before that
point, and they would end up never running to completion.

This is not true. The only negative thing that would happen is that
they would not start doing their work at the “same” time as the other
threads. Other than that they would do their work as the other threads.

A counting
semaphore could solve that, but ruby doesn’t actually have one of
those (sigh).

Thread synchronization is a real PITA.

Well, your answer kind of confirms this. :)

I see now that the first line of ConditionVariable#broadcast is this:

TODO: imcomplete

So, maybe there’s some kind of problem with it?

You can probably get away without the condition variable by just
acquiring the lock (lock.synchronize) in the main thread before you
create all threads and let all threads just synchronize with an empty block.

I can’t see any holes in this scheme, so it’s probably the best idea.

Even then it could be that some threads execute code before the
synchronize and thus not start concurrently with the other threads.
Frankly, for a test scenario I would not bother trying to make threads
start truly concurrently. Unless there is huge preparation overhead I
would simply create the threads and let them do their work. There is no
guarantee anyway that they can work in parallel, because on a
non-realtime OS there are no guarantees as to when the scheduler decides
to give CPU time to threads.

Kind regards

robert