Ruby 1.9.x Concurrency


#1

Poking through the Apple press releases today, I sat up and took
notice when I saw that they were putting a fair amount of public
emphasis on concurrency as the silver bullet for faster computing
when Snow Leopard comes out. If we stipulate that concurrency is
fundamentally a good solution to a certain class of problems, here
are the questions I immediately had:

  • My understanding is that 1.9 implements threads as native threads,
    with the caveat that the GIL (global interpreter lock) is still in
    place. What does this mean in practice for increasing throughput by
    distributing load across processor cores?

  • I’m trying to parse the fiber vs. thread distinction and it feels to
    me like fibers are a leaner, meaner version of the 1.8.x green
    threads, but that they will always run on the same core. Am I missing
    something here?
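On the fiber question, a quick way to see where a fiber runs: a fiber has no scheduler of its own, it only runs when code calls `resume` on it, and it runs on the thread that created it (Ruby raises `FiberError` if another thread tries to resume it). A minimal sketch for a current Ruby; the variable names are just for illustration:

```ruby
main_thread = Thread.current

fiber = Fiber.new do
  # Hand back the thread that is executing this fiber's body.
  Fiber.yield Thread.current
  :finished
end

first  = fiber.resume   # runs the body up to Fiber.yield
second = fiber.resume   # resumes after Fiber.yield and runs to the end

puts first.equal?(main_thread)  # => true
puts second                     # => finished
```

So fibers never spread work across cores by themselves; at most they decide the order in which work happens on a single thread.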

Thanks,

Steve


#2

Steve,

I found this post quite useful in filling in the missing pieces for
me; it may be helpful if you have the same missing piece :)

http://www.igvita.com/2009/05/13/fibers-cooperative-scheduling-in-ruby/

Regards,

Glenn


#3

On Jun 9, 2009, at 1:23 AM, s.ross wrote:

> … throughput by distributing load across processor cores?
>
>   • I’m trying to parse the fiber vs. thread distinction and it feels
>     to me like fibers are a leaner, meaner version of the 1.8.x green
>     threads, but that they will always run on the same core. Am I
>     missing something here?

What I think you’re getting at here is, yes, threading still isn’t the
way to get real concurrency on Ruby 1.9. If you really want to do two
things at once, you’re going to need processes in Ruby.

James Edward G. II


#4

On 9 Jun 2009, at 15:16, James G. wrote:

>> … place. What does this mean in practice as it applies to increasing
>> throughput by distributing load across processor cores?
>>
>>   • I’m trying to parse the fiber vs. thread distinction and it feels
>>     to me like fibers are a leaner, meaner version of the 1.8.x green
>>     threads, but that they will always run on the same core. Am I
>>     missing something here?
>
> What I think you’re getting at here is, yes, threading still isn’t
> the way to get real concurrency on Ruby 1.9. If you really want to
> do two things at once, you’re going to need processes in Ruby.

Indeed. And on a Unix box (like Snow Leopard) there really isn’t a
good excuse for ignoring them unless your concurrency needs are pretty
trivial. Use threads for simple things like downloading multiple web
pages concurrently, but for significant processing jobs chuck your
data down a pipe to a process and let the OS take care of proper
scheduling etc.
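A minimal sketch of chucking data down a pipe to a process, assuming a Unix system where `fork` is available (it is not on Windows). The squaring in the child is only a placeholder for a real processing job:

```ruby
# Parent -> child carries work items; child -> parent carries results.
to_child_r, to_child_w     = IO.pipe
from_child_r, from_child_w = IO.pipe

pid = fork do
  # Child: close the pipe ends it does not use.
  to_child_w.close
  from_child_r.close
  to_child_r.each_line do |line|
    from_child_w.puts(line.to_i ** 2) # placeholder for heavy processing
  end
  from_child_w.close
end

# Parent: close the child's ends so end-of-file propagates correctly.
to_child_r.close
from_child_w.close

(1..5).each { |n| to_child_w.puts(n) }
to_child_w.close # the child's each_line loop ends at EOF

results = from_child_r.each_line.map(&:to_i)
from_child_r.close
Process.wait(pid) # reap the child; the OS scheduled it on whatever core it liked

p results # => [1, 4, 9, 16, 25]
```

The parent writes everything before reading, which is fine for a handful of lines; for large volumes you would want to write and read concurrently so that neither pipe buffer fills up and stalls both processes.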

And for some quick coverage of this sort of thing grab the “Ruby
Plumber’s Guide” presentation from the link in my sig, or wait for
James’s RubyKaigi presentation which I suspect will be both more
coherent and more detailed :)

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

raise ArgumentError unless @reality.responds_to? :reason


#5

On 9 Jun 2009, at 17:06, Charles Oliver N. wrote:

> On Tue, Jun 9, 2009 at 9:16 AM, James Gray wrote:
>
>> What I think you’re getting at here is, yes, threading still isn’t
>> the way to get real concurrency on Ruby 1.9. If you really want to
>> do two things at once, you’re going to need processes in Ruby.
>
> Or just use JRuby, and real concurrent/parallel threads will just work
> out of the box :)

And on Windows too. Damn smug JVM users ;p

Ellie



#6

On Jun 9, 2009, at 10:07 AM, Eleanor McHugh wrote:

> And for some quick coverage of this sort of thing grab the “Ruby
> Plumber’s Guide” presentation from the link in my sig, or wait for
> James’s RubyKaigi presentation which I suspect will be both more
> coherent and more detailed :)

No pressure! :)

Seriously, I doubt I’ll be as detailed. I’m going to show some basics
and a few techniques that have worked for me.

I’m giving the same talk as a preview to OK.rb this Thursday, so come
see us if you’re near OKC.

James Edward G. II


#7

On Tue, Jun 9, 2009 at 10:06 AM, Charles Oliver N. wrote:

> Or just use JRuby, and real concurrent/parallel threads will just work
> out of the box :)

Well, as best they can on Ruby. You guys have done some really great
work on that, but Ruby’s approach to threading is rather poor.


#8

Thanks for all the answers. I guess the main issue I’m trying to get
my head around is that since Moore’s law doesn’t quite seem to be
keeping up with the demand for processing power, multi-core and
multi-processor hardware solutions are now predominant in the field.
The question I’m trying to answer is not how great the threading
solution is, but whether it actually distributes the workload without
incurring more overhead than it’s worth.

I implemented a simple MacRuby app that just fills a list from a
database. Doing the database query and populating the 20 or so items
visible at any given time gave the app a kind of sluggish feel. Once
I moved this work into a separate thread so it no longer blocked the
UI, the simple fact that the UI stayed alive made the app seem
“perkier.”

More to the point, I created an app using MRI that relied on
downloading a boatload of information from a Web service. Single-
threaded, this took about 20 minutes; using multiple threads, it
finished in 3-5 minutes. However, this one involves a good deal of
trickery to avoid stepping on buffers in the net/http libraries (or
something underlying them).
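For I/O-bound jobs like that download, a common shape is a fixed pool of threads pulling URLs from a shared, thread-safe `Queue`. A minimal sketch, assuming a hypothetical `fake_fetch` that sleeps for 0.1 s instead of making a real `Net::HTTP` request so it runs offline; each result is stored under a `Mutex` rather than letting threads share buffers:

```ruby
# Hypothetical stand-in for an HTTP GET: simulates 0.1 s spent waiting on the network.
def fake_fetch(url)
  sleep 0.1
  "body of #{url}"
end

worker_count = 5
urls = (1..20).map { |i| "http://example.com/item/#{i}" }

# Queue and Mutex are core classes in current Ruby.
jobs = Queue.new
urls.each { |u| jobs << u }
worker_count.times { jobs << :done } # one stop marker per worker

results = {}
lock = Mutex.new
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

workers = Array.new(worker_count) do
  Thread.new do
    # Queue#pop blocks until an item is available.
    while (url = jobs.pop) != :done
      body = fake_fetch(url) # this thread sleeps; the others keep running
      lock.synchronize { results[url] = body }
    end
  end
end
workers.each(&:join)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts "#{results.size} pages in #{elapsed.round(2)} s"
```

With 20 simulated requests and 5 workers, the elapsed time should come out near 0.4 s rather than the 2 s a serial loop would take, because the threads spend their time waiting, not computing.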

So I come back to the question: As we find ourselves with resources
that scale across processing units, how best does Ruby solve the
problem and what role do Fibers play in that solution, if any?

Thanks again,

Steve


#9

On Tue, Jun 9, 2009 at 9:16 AM, James G. wrote:

> What I think you’re getting at here is, yes, threading still isn’t the way
> to get real concurrency on Ruby 1.9. If you really want to do two things at
> once, you’re going to need processes in Ruby.

Or just use JRuby, and real concurrent/parallel threads will just work
out of the box :)

- Charlie

#10

On Tue, Jun 9, 2009 at 12:11 PM, s.ross wrote:

> More to the point, I created an app using MRI that relied on downloading a
> boatload of information from a Web service. Single threaded, this took about
> 20 minutes, where using multiple threads, it was accomplished in 3-5
> minutes. However, this one involves a good deal of trickery so as not to
> step on buffers in the net/http libraries (or something underlying).
>
> So I come back to the question: As we find ourselves with resources that
> scale across processing units, how best does Ruby solve the problem and what
> role do Fibers play in that solution, if any?

Having written what’s probably the fastest concurrent HTTP fetcher
available in Ruby, here’s a bit on how it worked in practice:

We set the system up to allow N HTTP fetching “agents”, each of which
would attach to a message queue and indicate its availability for
accepting jobs. Want it to go faster? Just make N bigger.

A command and control process would then pick an idle fetcher agent
and send it a batch of URLs to fetch.
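A rough sketch of that agent/controller layout, with threads standing in for the fetcher processes and in-process `Queue`s standing in for the message queue. The agent count, batch size, and all the names here are hypothetical, and the fetch itself is a placeholder:

```ruby
agent_count = 3
idle    = Queue.new                            # agents announce "I'm free" here
inboxes = Array.new(agent_count) { Queue.new } # one inbox per agent
done    = Queue.new                            # finished [agent, url] pairs

agents = Array.new(agent_count) do |id|
  Thread.new do
    loop do
      idle << id                               # indicate availability
      batch = inboxes[id].pop                  # wait for work
      break if batch == :shutdown
      batch.each { |url| done << [id, url] }   # placeholder for fetching
    end
  end
end

# Command and control: hand each batch to whichever agent is idle.
urls = (1..10).map { |i| "http://example.com/#{i}" }
urls.each_slice(3) { |batch| inboxes[idle.pop] << batch }

# Shut each agent down once it reports in as idle again.
agent_count.times { inboxes[idle.pop] << :shutdown }
agents.each(&:join)

fetched = Array.new(done.size) { done.pop }.map(&:last)
puts fetched.sort == urls.sort # => true
```

Making N bigger here is just raising `agent_count`; the controller never needs to know how many agents exist, only which ones are currently idle.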

It used a lightweight concurrency library I wrote called Revactor,
which is based around Fibers. Each fetcher process used 64 Fibers,
which would pull from the URL buffer in a round-robin fashion. If
you’re curious how this works, the core logic for this process is
distributed as part of Revactor’s standard library:

http://github.com/tarcieri/revactor/blob/master/lib/revactor/http_fetcher.rb
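The real code is at the link above; what follows is only a generic illustration, not Revactor's implementation, of many fibers taking turns pulling from one shared buffer. A plain round-robin loop stands in for the event loop, and `Fiber.yield` stands in for the point where a fetcher would wait on the network:

```ruby
buffer  = (1..200).map { |i| "http://example.com/page/#{i}" } # shared URL buffer
handled = []          # URLs in the order they were picked up
counts  = Hash.new(0) # fiber index => how many URLs it handled

fibers = Array.new(64) do |index|
  Fiber.new do
    while (url = buffer.shift)
      handled << url      # placeholder for sending a request for url
      counts[index] += 1
      Fiber.yield         # give up control, as if waiting for a response
    end
  end
end

# Round-robin scheduler: resume each live fiber in turn until all are done.
until fibers.empty?
  fibers.each(&:resume)
  fibers.select!(&:alive?)
end

puts handled.size        # => 200
p counts.values.minmax   # => [3, 4]
```

With 200 URLs and 64 fibers, each fiber ends up handling three or four of them.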

We ran one of these fetcher processes per CPU core of the systems we
were running them on. They were rather CPU-intensive, as they did a
lot of regex processing on the fetched documents. That said, it
didn’t take much: we were able to suck in 30 megabits of data at once
using just four processes running on a single quad-core system.
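A minimal sketch of the one-process-per-core layout, assuming a Unix system with `fork` and Ruby 2.2 or later for `Etc.nprocessors` (so newer than the 1.9 under discussion). The regex scan over made-up documents is a placeholder for the real document processing:

```ruby
require 'etc'

# Made-up "fetched documents": each has one number in its header plus 50 "123"s.
documents = Array.new(1_000) { |i| "doc #{i}: " + ("abc123 " * 50) }
worker_count = Etc.nprocessors

slice_size = (documents.size / worker_count.to_f).ceil
readers = documents.each_slice(slice_size).map do |slice|
  reader, writer = IO.pipe
  fork do
    reader.close
    # CPU-heavy placeholder: count the digit runs in every document of the slice.
    writer.puts(slice.sum { |doc| doc.scan(/\d+/).size })
    writer.close
  end
  writer.close # the parent only reads
  reader
end

# Every worker was forked before anything is read, so they all run at once.
total = readers.sum do |reader|
  count = reader.read.to_i
  reader.close
  count
end
Process.waitall

puts total # => 51000
```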


#11

On Jun 9, 2009, at 1:11 PM, s.ross wrote:

> The question I’m trying to answer is not how great the threading
> solution is but rather whether the threading solution solves the
> problem of distributing workload without incurring more overhead
> than it’s worth.

While threading does solve that problem in some languages, it’s not
really for that in Ruby. In Ruby, threading is for separating off
actions that will need to wait at some point (generally on I/O
operations) so you can keep working on other things while they do.

> I implemented a simple MacRuby app that just fills a list from a
> database. Doing the database query and populating the 20 or so items
> that show at any given time still gave the app a kind of sluggish
> feel. Once I separated this into a separate thread, removing the
> block, the simple fact that the UI was alive made the app seem
> “perkier.”

Sure. Your Thread paused waiting for the database I/O, but the rest
of the application kept moving. That’s a good example of where Ruby’s
threads help.

> More to the point, I created an app using MRI that relied on
> downloading a boatload of information from a Web service. Single
> threaded, this took about 20 minutes, where using multiple threads,
> it was accomplished in 3-5 minutes. However, this one involves a
> good deal of trickery so as not to step on buffers in the net/http
> libraries (or something underlying).

Again, as each Thread hit a waiting period, others had a chance to
run. When it was single-threaded, you had to wait serially through
each pause.

The important thing to realize about all of the above is that they
didn’t go faster because you were suddenly doing more than one thing
at a time. MRI doesn’t do that. You just arranged to spend less time
waiting. That’s nice, but it’s not true concurrency.
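One way to check this on your own machine is to time the same CPU-bound work run serially and then split across threads. A rough sketch with an arbitrary workload; timings vary by machine, so none are given. On MRI the threaded run should take about as long as the serial one, since only one thread executes Ruby code at a time, whereas on JRuby the threads can run in parallel:

```ruby
# Arbitrary CPU-bound work: no I/O, so a thread never has a reason to wait.
def burn(n)
  (1..n).reduce(0) { |acc, i| (acc + i * i) % 1_000_003 }
end

# Returns [result of the block, elapsed wall-clock seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  value = yield
  [value, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

n = 500_000
serial, serial_time = timed { Array.new(4) { burn(n) } }
threaded, threaded_time = timed do
  Array.new(4) { Thread.new { burn(n) } }.map(&:value)
end

printf("serial:   %.2fs\n", serial_time)
printf("threaded: %.2fs\n", threaded_time)
puts serial == threaded # => true
```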

> So I come back to the question: As we find ourselves with resources
> that scale across processing units, how best does Ruby solve the
> problem and what role do Fibers play in that solution, if any?

When you really want to do two things at once with Ruby, you want more
processes. fork() is your friend. :)

James Edward G. II


#12

On Jun 9, 2009, at 1:34 PM, Tony A. wrote:

> It used a lightweight concurrency library I wrote called Revactor
> which is based around Fibers. Each fetcher process used 64 Fibers
> which would pull from the URL buffer in a round robin fashion.

I’m very much a Fiber newbie, so forgive my dumb questions, but…

Fibers don’t really give true concurrency either, right? I’m not
understanding how creating 64 of them speeds things up.

James Edward G. II


#13

On Tue, Jun 9, 2009 at 2:31 PM, James G. wrote:

> I’m very much a Fiber newbie, so forgive my dumb questions, but…
>
> Fibers don’t really give true concurrency either, right?

Correct. Fibers are coroutines which can cooperatively switch between
each other. However, they can provide an excellent mechanism for
modeling concurrent I/O where your concurrency primitives spend most
of their time sleeping, waiting for I/O events to happen, which is
what’s being done here.
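A minimal illustration of that cooperative switching: the two fibers below interleave only because each one explicitly yields, and the caller decides which runs next. Nothing runs in parallel:

```ruby
log = []

ping = Fiber.new do
  3.times { |i| log << "ping #{i}"; Fiber.yield }
end
pong = Fiber.new do
  3.times { |i| log << "pong #{i}"; Fiber.yield }
end

# A fiber keeps control until it yields; the caller picks who runs next.
3.times do
  ping.resume
  pong.resume
end

puts log.join(", ")
# prints: ping 0, pong 0, ping 1, pong 1, ping 2, pong 2
```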

Revactor is a library which provides an Erlang-like actor model using
fibers as the underlying concurrency primitive (although in Erlang,
and in MenTaLguY’s thread-based actor library Omnibus, actors are
actually pre-emptive).

The real advantage of this approach is a synchronous facade on top of
what is, underneath, a fully asynchronous event system. Revactor is
built on top of Rev, which is an asynchronous event library that uses
libev to do event handling and I/O multiplexing.

If you look at the code for the concurrent HTTP fetcher in Revactor:

http://github.com/tarcieri/revactor/blob/master/lib/revactor/http_fetcher.rb

…it’s extremely clean compared to the twisted (excuse the pun) mess of
inverted control constructs you’d get in a framework like EventMachine
or Twisted. Making an HTTP request is as simple as:

Actor::HttpClient.get url

HTTP is a synchronous request/response protocol, so it makes much more
sense to model it as such.

When you call “get”, it sends the request to the server, then suspends
the current Fiber (which waits for the response data to get streamed
back to its inbox). This allows any other Fibers that have incoming
data to process their mailboxes.

The Actor::HttpClient.get method thus effectively “blocks” until the
entire response body has been consumed.
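The same synchronous-facade idea can be sketched with core Ruby alone. Below, a hypothetical `sync_read_line` looks like a blocking read to its caller, but it parks the fiber with `Fiber.yield` and tells the scheduler which IO it is waiting on; a small `IO.select` loop resumes the fiber once that IO is readable. Pipes stand in for sockets and a thread stands in for remote servers; none of this is Revactor's or Rev's API:

```ruby
# Looks like an ordinary blocking read to the caller. Underneath, the fiber
# parks itself and hands the scheduler the IO it is waiting on.
def sync_read_line(io)
  Fiber.yield(io)
  io.gets # only reached once io is readable, so this returns promptly
end

pipes   = Array.new(3) { IO.pipe } # stand-ins for three sockets
results = []
waiting = {}                       # IO => fiber parked on it

# Resume a fiber; if it parks again, remember what it is waiting for.
step = lambda do |fiber|
  io = fiber.resume
  waiting[io] = fiber if fiber.alive?
end

pipes.each_with_index do |(reader, _writer), i|
  step.call(Fiber.new { results << "request #{i} got #{sync_read_line(reader).chomp}" })
end

# Simulated servers: answer the requests in reverse order, 50 ms apart.
responder = Thread.new do
  pipes.each_with_index.reverse_each do |(_reader, writer), i|
    sleep 0.05
    writer.puts("reply #{i}")
    writer.flush
  end
end

# Event loop: wait until some parked fiber's IO is readable, then resume it.
until waiting.empty?
  ready, = IO.select(waiting.keys)
  ready.each { |io| step.call(waiting.delete(io)) }
end
responder.join

puts results
```

Because the simulated replies arrive in reverse order, the fibers typically finish in reverse order too, yet each fiber's code reads top to bottom like ordinary blocking code.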

> I’m not understanding how creating 64 of them speeds things up.

You don’t need to use Fibers here. You could write everything fully
asynchronously and not need fibers at all.

Or you could use threads! On 1.8 threads are nice and lightweight, but
the I/O performance sucks and net/http and threads get kind of nasty.
On 1.9 threads are much slower, but the I/O performance is better
because threads can actually make blocking system calls.

Revactor is using Fibers to give you the best of both worlds: you can
do concurrent I/O as if it were synchronous/threaded while leveraging
the I/O performance benefits of Ruby 1.9 (and libev), and your
underlying concurrency primitive is nice and lightweight.

Unlike with threads, messaging is baked in, and it is fully
asynchronous, unlike Ruby’s Queue class. This allows parts of your
program to “fire and forget” messages to other parts of the system,
and those other parts can consume messages when they’re ready.
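A toy version of that mailbox model, built only on core `Fiber` and arrays. The `ToyActor` class and its `tell`/`run_pending` methods are hypothetical and not Revactor's API: `tell` appends to the receiver's mailbox and returns at once, and each actor works through its mailbox only when the scheduler loop gives it a turn:

```ruby
# A toy actor: a fiber plus a mailbox array. Not Revactor's API.
class ToyActor
  attr_reader :mailbox

  def initialize(&handler)
    @mailbox = []
    @fiber = Fiber.new do
      loop do
        # Process everything queued so far, then give up control.
        handler.call(@mailbox.shift) until @mailbox.empty?
        Fiber.yield
      end
    end
  end

  # Fire and forget: enqueue and return immediately; nothing runs yet.
  def tell(message)
    @mailbox << message
    self
  end

  def run_pending
    @fiber.resume unless @mailbox.empty?
  end
end

log = []
printer = ToyActor.new { |msg| log << "printer got #{msg}" }
doubler = ToyActor.new { |n| printer.tell(n * 2) } # forwards a message onward

[1, 2, 3].each { |n| doubler.tell(n) } # returns at once
log_before = log.dup                   # still empty: nothing processed yet

# Minimal scheduler: keep giving actors turns until every mailbox is empty.
actors = [doubler, printer]
actors.each(&:run_pending) until actors.all? { |a| a.mailbox.empty? }

puts log
```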


#14

On Jun 9, 2009, at 3:55 PM, Tony A. wrote:

> … other. However, they can provide an excellent mechanism for modeling
> concurrent I/O where your concurrency primitives spend most of their
> time sleeping waiting for I/O events to happen, which is what’s being
> done here.

Thanks for your excellent explanation.

James Edward G. II


#15

On Jun 9, 2009, at 1:19 PM, James G. wrote:

> … operation) so you can keep working on other things while they do.
> … Ruby’s threads help.
> … each pause.
> When you really want to do two things at once with Ruby, you want
> more processes. fork() is your friend. :)
>
> James Edward G. II

You really can’t do two things at once anyhow. You do multiple things
“as if” at once. Actually, not true. On multiple cores you can do n
things at once so long as they don’t muck with a single shared
resource. But my use of threading is very similar to the way we have
been using DRb. I know which parts of the app will block and bring
things to their knees, usually network- or graphics-related stuff.
Those are the places where I’d like to use some concurrent construct.

Threads have gotten a terrible reputation in 1.8, so in most cases
they have been easy to ignore in favor of processes. If ignoring
threads is still the likely case in 1.9, that would be good to know.

Thx

Steve



#16

On Tue, Jun 9, 2009 at 12:44 PM, Tony A. wrote:

> On Tue, Jun 9, 2009 at 10:06 AM, Charles Oliver N. wrote:
>
>> Or just use JRuby, and real concurrent/parallel threads will just work
>> out of the box :)
>
> Well, as best they can on Ruby. You guys have done some really great work
> on that, but Ruby’s approach to threading is rather poor.

Well, you may have to deal with some peculiarities around
kill/raise/critical and how IO is handled, but in general they really
do “just work”. No GIL, no green threading, no futzing around with
processes. Start up N threads and let them go to town, making blocking
calls, doing long-running IO hits, what have you. They’ll do it all in
parallel.

I’d certainly love to see Threading and IO improve in Ruby, but they
work pretty darn well right now on JRuby, and you don’t have to have N
processes just to do N things at once.

- Charlie

#17

On Tue, Jun 9, 2009 at 6:27 PM, Charles Oliver N. wrote:

Yeah, you guys have done some really amazing work, and are likely to
remain the only Ruby implementation without a GIL for quite some time.


#18

On Wed, Jun 10, 2009 at 12:18 PM, Tony A. wrote:

> Yeah, you guys have done some really amazing work, and are likely to remain
> the only Ruby implementation without a GIL for quite some time

Of course I would be remiss if I did not mention that IronRuby has no
GIL either :)

- Charlie

#19

On Wed, Jun 10, 2009 at 11:42 AM, Charles Oliver N. wrote:

> Of course I would be remiss if I did not mention that IronRuby has no
> GIL either :)

Oh sorry, forgot to add the caveat “usable” Ruby implementation :)


#20

On Jun 9, 2009, at 6:51 PM, s.ross wrote:

> Threads have gotten a terrible reputation in 1.8 so in most cases
> they have been easy to ignore in favor of processes. If this
> (ignoring threads) is still the likely case in 1.9 then that would
> be good to know.

What I’ve been trying to say is that they are two different tools for
two different jobs.

James Edward G. II