Thread safety techniques for server applications?

Hey all,

I’m looking for some information about handling thread safety with Ruby.
I’ve got an application server I wrote, and I need to make sure it’s
thread safe. This application server is used over HTTP requests, so it’s
possible multiple people hit it at once. I have some questions that will
help me determine…

  1. Does using mongrel / lighttpd / webrick ensure thread safety? (my
    application relies on these)
  2. What kinds of things in the Ruby language should I NOT do that will
    cause thread headaches… (maybe static variables)?
  3. What techniques can I use to go about testing thread safety?

I’m not asking anything in reference to rails. This would be just
general Ruby thread safety ideas…

thanks…

Hi Aaron,
I’d like to learn more in this area too, but here are my thoughts:
The web servers, at least mongrel, are single-threaded. Mongrel queues
requests and feeds them to the app sequentially. To get concurrency
you have to run multiple instances of mongrel. In this situation there
are no thread safety issues because there’s only one thread per
process.
I like the idea of separate processes instead of worrying about thread
safety, but sometimes I need multiple threads, for example in a jabber
client (keepalives, listeners, etc). What I’ve been doing is keeping
it as simple as possible and so far I haven’t had to think about
thread conflicts. Or maybe I should be but I haven’t :wink:
–Dave

On Aug 25, 2007, at 12:05 AM, [email protected] wrote:

Hi Aaron,
I’d like to learn more in this area too, but here are my thoughts:
The web servers, at least mongrel, are single-threaded. Mongrel queues
requests and feeds them to the app sequentially. To get concurrency
you have to run multiple instances of mongrel. In this situation there
are no thread safety issues because there’s only one thread per
process.

I don’t believe this is true of mongrel itself, but rather of the
Rails handler in mongrel.

Corey

On Sat, 25 Aug 2007, [email protected] wrote:

I’d like to learn more in this area too, but here are my thoughts:
The web servers, at least mongrel, are single-threaded. Mongrel queues
requests and feeds them to the app sequentially. To get concurrency
you have to run multiple instances of mongrel. In this situation there
are no thread safety issues because there’s only one thread per
process.

This is untrue.

The standard Mongrel is threaded. It creates a new thread of execution
for each connection that it receives, and those execute in parallel with
each other and the main Mongrel thread, which is essentially just an
accept() loop that receives the requests and spawns handler threads for
them.

The Rails mongrel handler has a mutex that locks the action within it to a
single thread of execution at a time. So, if 10 requests come in at the
same time, Mongrel will create 10 threads of execution for those 10
requests, but when execution flow reaches the Rails handler, each thread
will stand in line at the mutex gate and proceed through it in single
file.

In a standard Mongrel handler, which does not have a mutex at the front of
it, the requests are processed concurrently. This is the normal situation.
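
If it helps to see the gate idea in miniature, here is a rough sketch
(this is not Mongrel's actual code; the handler and the timings are made
up):

require 'thread'

GATE = Mutex.new    # stands in for the Rails handler's guard mutex

def handle_request(id)
  # Each connection gets its own thread, so this part runs concurrently...
  puts "request #{id}: accepted"
  GATE.synchronize do
    # ...but only one thread at a time gets past the gate.
    puts "request #{id}: inside the handler"
    sleep 0.1       # pretend to do the actual work
  end
end

threads = (1..10).map { |id| Thread.new { handle_request(id) } }
threads.each { |t| t.join }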

Remember that Ruby threads, being green threads, are all in the same
process, so there is no actual concurrency of execution between them. In
most cases these threads will not increase your throughput.

Kirk H.

On Sat, 25 Aug 2007, Aaron S. wrote:

I’m looking for some information about handling thread safety with Ruby.
I’ve got an application server I wrote, and I need to make sure it’s
thread safe. This application server is used over HTTP requests, so it’s
possible multiple people hit it at once. I have some questions that will
help me determine…

In general, it’s the same as any other type of threaded programming.
Share as little as possible, and control access to shared resources so
that two threads aren’t changing state in them at the same time and
running into each other. Look at the Mutex class and the Queue class as
starting points for tools to help you do this.
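
A tiny sketch of each, just to show the shape (the counter and the jobs
here are made up):

require 'thread'    # Mutex and Queue live here

# Mutex: serialize access to shared state so updates don't interleave.
counter = 0
lock = Mutex.new
threads = (1..5).map do
  Thread.new { 100.times { lock.synchronize { counter += 1 } } }
end
threads.each { |t| t.join }
puts counter        # => 500, with no lost updates

# Queue: a thread-safe pipe for handing work between threads.
jobs = Queue.new
consumer = Thread.new do
  while (job = jobs.pop) != :done
    puts "processing #{job.inspect}"
  end
end
3.times { |i| jobs.push(i) }
jobs.push(:done)
consumer.join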

  1. Does using mongrel / lighttpd / webrick ensure thread safety? (my
    application relies on these)

lighttpd is an external web server, so it’s irrelevant.

Both mongrel and webrick are threaded Ruby web server platforms. They
don’t, however, do anything to ensure that the code you run inside of
them is thread safe.

  2. What kinds of things in the Ruby language should I NOT do that will
    cause thread headaches… (maybe static variables)?

The only thing to really keep in mind is that Ruby threads are green
threads – they are all done inside of the Ruby interpreter. So, they all
share a single process. Thus, the use of threads will rarely increase the
throughput of your program, unless there is some external latency that can
be captured, and that external latency does not occur inside of a Ruby
extension.
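
A quick way to convince yourself of that, using sleep as a stand-in for
an external wait such as a slow network call (purely illustrative):

require 'benchmark'

# Ten 0.2 second waits: roughly 2 seconds sequentially, roughly 0.2
# seconds when the waits overlap in threads. Replace the sleep with a
# CPU-bound loop and the threaded version gains nothing, since only one
# green thread runs at a time.
seq = Benchmark.realtime { 10.times { sleep 0.2 } }
par = Benchmark.realtime do
  threads = (1..10).map { Thread.new { sleep 0.2 } }
  threads.each { |t| t.join }
end
puts "sequential: #{seq}  threaded: #{par}"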

This is because while the flow of execution is inside of an extension, it
is out of Ruby’s control, and no thread task switching will take place.

Also, be aware that Ruby uses a select() loop to manage its threads of
execution, and it has an FD_SETSIZE limit of 1024 handles, so there is a
sharp upper boundary on the number of threads you can have in a Ruby
process.

  3. What techniques can I use to go about testing thread safety?

Look for areas in your code where you share resources between your
threads. Do you take precautions to keep multiple threads from stepping
on each other when using those resources?

Write test code that creates multiple threads, and tries to stress those
areas.
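
Something along these lines, say (the SharedCounter is a made-up stand-in
for whatever resource your threads actually share):

require 'test/unit'
require 'thread'

class SharedCounter               # stand-in for a shared resource
  attr_reader :value
  def initialize
    @value = 0
    @lock  = Mutex.new
  end
  def increment
    @lock.synchronize { @value += 1 }
  end
end

class ThreadSafetyTest < Test::Unit::TestCase
  def test_concurrent_increments
    counter = SharedCounter.new
    threads = (1..20).map do
      Thread.new { 1000.times { counter.increment } }
    end
    threads.each { |t| t.join }
    # If a race lost updates, this count would come up short.
    assert_equal 20000, counter.value
  end
end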

Kirk H.

Remember that Ruby threads, being green threads, are all in the same
process, so there is no actual concurrency of execution between them. In
most cases these threads will not increase your throughput.

Kirk H.

I’ve heard that Ruby 2.0 won’t use Green Threads, so hopefully this will
change. I would assume the reason why Ruby uses Green Threads is to try
and maintain portability over efficiency.

I should play with it; I’ve never tried Ruby for multi-threaded work. It
might not be worth my time, but learning something new would be fun.

TerryP.

On Aug 25, 2007, at 9:29 AM, Terry P. wrote:

I’ve heard that Ruby 2.0 won’t use Green Threads, so hopefully this will
change. I would assume the reason why Ruby uses Green Threads is to try
and maintain portability over efficiency.

it’s interesting to me that people assume green threads provide less
performance advantage than native threads. this is patently untrue:
it all depends on your task! to summarize

  • if your task is cpu intensive AND you are on an SMP box AND you use
    many threads (aka lightweight processes) you will see a speed boost

  • if your task is io/network bound and/or you are spawning a TON of
    threads then green threads will provide a speedup on any decent (aka
    not windoze) platform

consider these facts

  • green threads are inexpensive to create compared to native threads

  • green threads can help throughput a lot where io is concerned iff
    select is a good paradigm for scheduling activity (imagine many
    network connections)

  • native threads are relatively expensive to create

  • native threads have the same bottleneck on io that green threads
    have: you can physically only write to disk with the number of disk
    controllers you have and reading from sockets may still be limited to
    the speed of the person on the other end

green threads are good for some things and native threads are good for
some things. fortunately in ruby it’s extremely easy to farm out
tasks to another process and use ipc with

slave = Slave.new{ Server.new }

so we get the best of both worlds if we want it. people will miss
green threads when they are gone - am i the only one who remembers
not being able to stop java’s native threads?
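
(slave wraps DRb underneath - for anyone who wants the stdlib-only shape
of the same idea, something like this works on unix. the Worker class and
the port here are made up, just a sketch:)

require 'drb'

class Worker                      # made-up example service
  def sum(n)
    (1..n).inject(0) { |s, i| s + i }
  end
end

uri = 'druby://127.0.0.1:9999'    # arbitrary local port

child = fork do
  # the child owns the object and serves it over DRb
  DRb.start_service(uri, Worker.new)
  DRb.thread.join
end

sleep 0.5                         # crude wait for the child to come up
DRb.start_service                 # client end needs a service too
worker = DRbObject.new_with_uri(uri)
puts worker.sum(1000)             # => 500500

Process.kill('TERM', child)
Process.wait(child)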

cheers.

a @ http://drawohara.com/

ara.t.howard wrote:

Well … I think we should have both green threads (i.e., a built-in
thread scheduler in a single Ruby process) and native threads (i.e., the
Linux “clone” operation creating a separate lightweight process sharing
a memory space). On top of that, we should have Erlang-style lightweight
Ruby processes communicating via message passing, something resembling
MPI and probably something resembling OpenMP. And of course there’s DRb
and Rinda – they aren’t going away, are they?

Unfortunately, the whole world doesn’t use the Linux kernel and the GCC
compilers, but for the part of the world that does, all of this is
doable via C-language libraries, and I’m guessing most of it has been
done. I know there’s a “ruby-mpi” project, for example, although it
looks like it hasn’t been touched in a couple of years and might be
orphaned.

[email protected] wrote:

On Sun, 26 Aug 2007, ara.t.howard wrote:

  • if your task is cpu intensive AND you are on an SMP box AND you use
    many threads (aka lightweight processes) you will see a speed boost

How do you figure that? On a CPU intensive task, the threading overhead
just removes from the amount of time that the CPU has to work on the CPU
intensive task.

On second reading, I think what he was saying was one native thread
per cpu.

On Sun, 26 Aug 2007, ara.t.howard wrote:

I’ve heard that Ruby 2.0 won’t use Green Threads, so hopefully this will
change. I would assume the reason why Ruby uses Green Threads is to try and
maintain portability over efficiency.

it’s interesting to me that people assume green threads provide less
performance advantage than native threads. this is patently untrue: it all
depends on your task! to summarize

  • if your task is cpu intensive AND you are on an SMP box AND you use many
    threads (aka lightweight processes) you will see a speed boost

How do you figure that? On a CPU intensive task, the threading overhead
just removes from the amount of time that the CPU has to work on the CPU
intensive task.

  • if your task is io/network bound and/or you are spawning a TON of threads
    then green threads will provide a speedup on any decent (aka not windoze)
    platform

That’s what I was saying. You have to have external latencies that you
can capture without spending time blocking inside of an extension, and
they have to represent a significant enough slice of the code’s activities
to overcome the overhead of thread creation, thread switching, and the
additional cost of all of the extra objects that each thread and its
contents imposes on the garbage collector.

So I still maintain that most of the time, for most code, Ruby threads are
not a vehicle for improved performance, and are best thought of as a
useful tool for elegantly solving problems instead of as a tool to make
stuff go faster.

Kirk H.

M. Edward (Ed) Borasky wrote:

Well … I think we should have both green threads (i.e., a built-in
thread scheduler in a single Ruby process) and native threads (i.e., the
Linux “clone” operation creating a separate lightweight process sharing
a memory space).

Did the recent discussion of fibers lead to the conclusion that green
threads would still exist in some form in 1.9? Will we be able to
experiment with 1:N and M:N threading in pure ruby?

On Aug 25, 2007, at 1:30 PM, Joel VanderWerf wrote:

On second reading, I think what he was saying was one native
thread per cpu.

yes. the whole point of SMP is that you can scale up that way. thanks.

a @ http://drawohara.com/

On Sun, 26 Aug 2007, ara.t.howard wrote:

On Aug 25, 2007, at 1:30 PM, Joel VanderWerf wrote:

On second reading, I think what he was saying was one native thread per
cpu.

yes. the whole point of SMP is that you can scale up that way. thanks.

Ok. It confused me because we were talking about the current Ruby
threading, which of course isn’t helped by SMP.

Kirk H.

On Aug 25, 2007, at 1:36 PM, Joel VanderWerf wrote:

Did the recent discussion of fibers lead to the conclusion that
green threads would still exist in some form in 1.9? Will we be
able to experiment with 1:N and M:N threading in pure ruby?

i sure hope so - no point throwing out the baby with the bath water…

cheers.

a @ http://drawohara.com/

So my weak brain is trying to put this all together. Here’s what I make
of it and how it answers my questions. Correct me if I’m wrong.

The Rails mongrel handler has a mutex that locks the action within it to
a single thread of execution at a time. So, if 10 requests come in at the
same time, Mongrel will create 10 threads of execution for those 10
requests, but when execution flow reaches the Rails handler, each thread
will stand in line at the mutex gate and proceed through it in single
file.

When relying on Rails, the Rails handler has a mutex in it so that only
one thread will ever go through the Rails handler until it’s complete,
and then it lets the next thread through for processing? So that would
tell me that if my app is relying on Rails, I don’t need to worry about
access to shared variables, such as static variables, since it isn’t
truly executing multiple threads at once?

That does lead me to another question about using multiple Mongrel
processes. Multiple processes allow Mongrel to receive more requests,
thus creating more threads, but the Rails gateway is still opening the
gate for one thread at a time?

Also, all this discussion makes me think of another question. I’m using a
dispatcher in my application, found here: http://derrick.pallas.us/ruby-cgi/.
For someone who’s reading the code and knows a lot about mutexes, are
there any things that stand out as a bad solution with this dispatcher,
keeping in mind multiple requests at once and possible shared-resource
headaches? I’m just trying to get a feel for what I need to look into to
make this dispatcher better and avoid shared-resource collisions.

On 8/25/07, Joel VanderWerf [email protected] wrote:



There’s nothing wrong with experimentation, but there are good reasons why
the Linux kernel people moved away from the M:N model. The Solaris
threading model is still M:N because it has been for a dozen years, but
they changed the default threading discipline to “pre-emptive” years ago
because life is just so much easier that way. From a programmer’s
perspective, Solaris threads might as well be 1:N. (More precisely, it’s an
exceedingly rare program that can benefit from direct dependence on the
M:N model.)

Non-preemptive threads seem like a wonderful idea because they’re so
lightweight. Having done it both ways, I can tell you that the big problem
with non-preemptive threads is the bugs you get when they don’t get
scheduled at the right times. I think something like Erlang processes will
be far easier to work with, and they’re every bit as lightweight.

I’m looking for some information about handling thread safety with Ruby.

In general, do not share variables between threads (or at least
synchronize when you access them), and do not interrupt threads by
injecting interrupts into them (well, you can, but… it’s dangerous). Add

Thread.abort_on_exception = true # if a thread dies, tell me :)

to your code so that you can debug when exceptions are thrown but not
caught…
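
For example (illustrative only) – by default an exception in a thread
just sits there until you join the thread, so a dead worker can go
unnoticed; with the flag set it blows up loudly:

Thread.abort_on_exception = true

t = Thread.new do
  sleep 0.1
  raise 'worker blew up'   # with the flag set this aborts the whole program
end

sleep 1                    # without the flag you'd only find out at t.join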

Also load up all your functions/everything ‘single threaded’ otherwise
some of the functions will be ‘assigned’ to the wrong thread (ugh), then
be unavailable (classes, modules, too).
I.e. dynamically declared functions are dangerous.

Note also that sometimes if you are reading from TCPSockets the sockets
get confused and start reading from one another. To avoid this (I
think) use Francis C.'s EventMachine.

That being said, it is still fun to program multithreaded stuff in Ruby,
despite the growing pains and the fact that Ruby isn’t quite there yet.

-Roger

Aaron S. wrote:

Hey all,

I’ve got an application server I wrote, and I need to make sure it’s
thread safe. This application server is used over HTTP requests, so it’s
possible multiple people hit it at once. I have some questions that will
help me determine…

  1. Does using mongrel / lighttpd / webrick ensure thread safety? (my
    application relies on these)
  2. What kinds of things in the Ruby language should I NOT do that will
    cause thread headaches… (maybe static variables)?
  3. What techniques can I use to go about testing thread safety?

I’m not asking anything in reference to rails. This would be just
general Ruby thread safety ideas…

thanks…

Roger P. wrote:

Also load up all your functions/everything ‘single threaded’ otherwise
some of the functions will be ‘assigned’ to the wrong thread (ugh), then
be unavailable (classes, modules, too).
I.e. dynamically declared functions are dangerous.

In ruby, everything is dynamically defined. The closest you can get to
static definitions is to require all your lib files before starting
threads. Even so, it’s not really static. It might be safer in some
cases because require-ing a file is not atomic.(*) That’s a corner case,
though.

I don’t think it’s possible for things to be unavailable because they
were loaded in the wrong thread, though. Got an example?

Note also that sometimes if you are reading from TCPSockets the sockets
get confused and start reading from one another.

Really? I’ve never seen that, even with pretty heavy use of lots of
threads and sockets.


(*) An example:

[~/tmp] cat a.rb

t = Thread.new do
  loop do
    sleep 0.1
    puts "a"
  end
end

sleep 0.5

require 'b'

[~/tmp] cat b.rb

t = Thread.new do
  loop do
    sleep 0.1
    puts " b"
  end
end

sleep 1
t.kill

[~/tmp] ruby a.rb
a
a
a
a
a
b
a
b
a
b
a
b
a
b
a
b
a
b
a
b
a
b
a

Thanks for your comments.
Forgive me if I called modules/classes statically loaded. What I meant
was ‘loaded before you start splitting into multiple threads’ rather than
static.

For example, I have used xmlrpc before (which relies on REXML), and every
so often (note the ‘freak chance’ aspect) it will throw the exception
“REXML::Document not found” despite the fact that it indeed should be
found, and normally is. I laid the blame on Ruby threads. Where it
belongs, I think, is on… Ruby threads. But having not actually ever fixed
it, I can’t say for sure. I believe I’ve been able to recreate it reliably.

Joel VanderWerf wrote:

Roger P. wrote:

Also load up all your functions/everything ‘single threaded’ otherwise
some of the functions will be ‘assigned’ to the wrong thread (ugh), then
be unavailable (classes, modules, too).
I.e. dynamically declared functions are dangerous.

In ruby, everything is dynamically defined. The closest you can get to
static definitions is to require all your lib files before starting
threads. Even so, it’s not really static. It might be safer in some
cases because require-ing a file is not atomic.(*) That’s a corner case,
though.

I don’t think it’s possible for things to be unavailable because they
were loaded in the wrong thread, though. Got an example?

Note also that sometimes if you are reading from TCPSockets the sockets
get confused and start reading from one another.

Really? I’ve never seen that, even with pretty heavy use of lots of
threads and sockets.

Yeah, I get it… sometimes a socket that is only used for sending will
magically ‘receive’… its own output! Wow! And other weirdness. Mostly
on slower machines. It is odd. I noticed Zed S. said he’d run into
the same thing (and was unable to track down the cause) in some thread
or other here. I honestly don’t get it, either, but I think it’s half
the motivation for the creation of EventMachine.

Just my own $0.02
-Roger



On 8/28/07, Roger P. [email protected] wrote:

Yeah, I get it… sometimes a socket that is only used for sending will
magically ‘receive’… its own output! Wow! And other weirdness. Mostly
on slower machines. It is odd. I noticed Zed S. said he’d run into
the same thing (and was unable to track down the cause) in some thread
or other here. I honestly don’t get it, either, but I think it’s half
the motivation for the creation of EventMachine.

Just my own $0.02

By way of adding 0.02 cents of my own: EventMachine was motivated by the
desire to get extremely high performance and scalability (meaning, large
numbers of simultaneously connected sockets). (What I was really thinking
of, in addition to enterprise-caliber network servers, was high-speed
interprocess messaging.) And of course it’s accepted by many people that
if you’re going to get high performance and scalability together, you
have to get rid of threads. Hence the event-driven approach. The tradeoff
is that programs are somewhat trickier to write, but a large number of
EventMachine protocol handlers mitigate that problem.

In Linux, EM can easily handle tens of thousands of connections in a
single process without noticeable performance degradation or high memory
consumption.
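
For anyone curious what that looks like in practice, the canonical
minimal EM server is something like this (assuming the eventmachine gem
is installed; the port and module name are arbitrary):

require 'rubygems'
require 'eventmachine'

# Each connection mixes this module into a connection object; there are
# no threads to coordinate, only callbacks.
module EchoServer
  def receive_data(data)
    send_data(data)                           # echo it straight back
    close_connection_after_writing if data =~ /quit/i
  end
end

EM.run do
  EM.start_server '0.0.0.0', 8081, EchoServer
  puts 'echoing on port 8081 (telnet in and type something)'
end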