How about ruby's threads?

rubynut · December 5, 2009, 4:40am

Hello,

Are Ruby’s threads safe enough?
I used to program in Perl a lot, but Perl’s threads are not so nice, and mod_perl’s threads are not safe.

Thanks.

rubynut · December 5, 2009, 4:51am

On Fri, Dec 4, 2009 at 9:39 PM, Ruby N. wrote:

Hello,

Is ruby’s threads safe enough?
Once I used perl for programming a lot, but perl’s threads is not so
nice, and modperl’s threads is not safe.

I just have to say, I read that in my head in Borat’s voice (from the movie).

Seriously though, it depends on your application.
Threads are in much better shape in Ruby 1.9 than in 1.8.
I would say that 1.8 uses green threads, while in 1.9 you have real threads and fibers.

Also, keep in mind which interpreter you are using.
My statements above are only true for the standard (MRI) interpreter from ruby-lang.org.

JRuby is a totally different story.

Andrew McElroy

rubynut · December 5, 2009, 5:55am

On Friday 04 December 2009 09:39:33 pm Ruby N. wrote:

Is ruby’s threads safe enough?

What do you mean by “safe”?

Once I used perl for programming a lot, but perl’s threads is not so
nice,

What was wrong with them?

and modperl’s threads is not safe.

What wasn’t safe about them?

I’m really not sure how to answer this question.

On the one hand, I can say that Ruby threads are probably about as “safe” as Perl threads, or Python threads, or any threads, in ANY language. It depends on whether a specific library or application is threadsafe – for example, you mentioned mod_perl. That is really more mod_perl’s fault, not so much Perl’s fault.

On the other hand, no threads are safe, really – in ANY language. Look into other concurrency models, like actors, processes, or events.

rubynut · December 5, 2009, 6:15pm

On Fri, Dec 4, 2009 at 8:39 PM, Ruby N. wrote:

Is ruby’s threads safe enough?

Threads as a concurrency primitive are about as “safe” as C’s pointers were at managing memory.

rubynut · December 6, 2009, 5:01am

Well, I asked this because the Perl thread documentation warns that multithreading should not be used in production systems.
And Perl’s threads have many limitations, as this article says:
http://www.perlmonks.org/index.pl?displaytype=print;node_id=288022

Python’s are better. So I was asking whether Ruby’s threads are also better suited for production use.
Thanks.

2009/12/5 David M. wrote:

rubynut · December 6, 2009, 9:24pm

First, it’s just a preference, but I think most on the list agree with me – please don’t top-post. Start your post after the quote, preferably after the relevant section.

On Saturday 05 December 2009 10:00:14 pm Ruby N. wrote:

Well, I asked this because Perl thread documentation warns that
multithreading should not be used in production systems.

I don’t know of any similar limitation in Ruby, but I will say that you probably shouldn’t use Ruby threads in production systems. That doesn’t have anything to do with Ruby; it has to do with the concept of threads in general.

And Perl’s threads has many limitations, as this article said:
http://www.perlmonks.org/index.pl?displaytype=print;node_id=288022

This lists one limitation and one weird design feature.

The weird design feature is that apparently Perl threads don’t share variables – this is like fork(), and you may as well use fork anyway. The one thing they give you that fork doesn’t is “shared variables” – so you can explicitly share variables between threads.

Ruby doesn’t do this. In Ruby, all variables are essentially shared between threads, and I’m pretty sure there aren’t massive data structures copied between threads, so they’re much lighter weight than Perl threads. But this means that any time you change a variable that might be visible elsewhere in your program, you have to make sure you synchronize access (with locks and such). So Ruby threads will probably be faster, but much more dangerous than Perl threads unless you know what you’re doing.

Python’s is better.

I would guess Ruby threads are similar to Python.

All that said, you are probably asking the wrong question:

harmful.html&Itemid=29

The question you are asking is, “Are Ruby threads at least as good as Python threads?” The answer to that is probably yes, and better if you use JRuby.

The question you should be asking is, “What’s the best way to handle concurrency in Ruby?” The answer is, it depends what you’re doing, but it’s probably not threads.

rubynut · December 7, 2009, 4:20pm

David M. wrote:

The question you should be asking is, “What’s the best way to handle concurrency in Ruby?” The answer is, it depends what you’re doing, but it’s probably not threads.

I agree that it depends on what you’re doing - but Ruby threads are
often useful, especially when used in a coarse-grained way.

For example, suppose you have a bunch of objects and each is opening an HTTP connection to some remote server and pulling down content, and you want this to happen concurrently. Each object is essentially self-contained. Having these do concurrent downloads within threads is straightforward to program and pretty robust.
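A minimal sketch of that coarse-grained style: one thread per self-contained download. To keep it runnable offline, `fake_fetch` is a hypothetical stand-in that sleeps instead of making a real request; in practice it might wrap something like `Net::HTTP.get`.

```ruby
# One thread per download; each thread only touches its own URL, so no locking is needed.
# fake_fetch is hypothetical: it simulates network latency instead of doing real HTTP.
def fake_fetch(url)
  sleep 0.1
  "body of #{url}"
end

urls = %w[http://example.com/a http://example.com/b http://example.com/c]

started = Time.now
threads = urls.map { |url| Thread.new(url) { |u| fake_fetch(u) } }
bodies = threads.map(&:value) # Thread#value joins the thread and returns its result
elapsed = Time.now - started

puts bodies
puts format("took %.2fs", elapsed) # about 0.1s rather than 0.3s, because the sleeps overlap
```

Because `Thread#value` re-raises any exception raised inside the thread, a failed download surfaces in the main thread instead of disappearing silently.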

The alternatives aren’t pretty: rewrite your application in an event-driven way (so you have to find an HTTP client library which works this way too), or fork off separate processes (which then have to communicate back to the central one with the results, which might mean select’ing across the children, or using the filesystem as a temporary data store).
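For comparison, here is a rough sketch of the fork-and-report-back alternative, using `IO.pipe`, `Marshal` and `IO.select`. It is Unix-only (`fork` is unavailable on Windows), and squaring a number is a placeholder for real work such as a download.

```ruby
# Fork one child per job; each child sends its result back on its own pipe,
# and the parent uses IO.select to read whichever child finishes first.
jobs = [1, 2, 3]

readers = jobs.map do |n|
  reader, writer = IO.pipe
  fork do
    reader.close
    writer.write(Marshal.dump([n, n * n])) # n * n stands in for real work
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close # the parent keeps only the read end
  reader
end

results = {}
until readers.empty?
  ready, = IO.select(readers) # wait until at least one pipe is readable
  ready.each do |io|
    n, value = Marshal.load(io.read) # read to EOF, then deserialize
    results[n] = value
    io.close
    readers.delete(io)
  end
end
Process.waitall # reap the child processes

p results
```

The parent has to own serialization, selecting and reaping itself, which is the extra plumbing the paragraph above is complaining about.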

Not every application has to be mega-scalable and bombproof; a Sinatra
web app with a handful of concurrent clients might usefully use threads
too.

Of course, the assumption here is that you’re programming in Ruby. If
you want to avoid threads (which I agree is a good thing to do) and
still have concurrency, then it might be better to switch to Erlang
rather than jump through hoops in Ruby.

rubynut · December 7, 2009, 8:18pm

On Monday 07 December 2009 09:20:08 am Brian C. wrote:

David M. wrote:

The question you should be asking is, “What’s the best way to handle concurrency in Ruby?” The answer is, it depends what you’re doing, but it’s probably not threads.

Ruby threads are often useful, especially when used in a coarse-grained way.

I agree. However, I don’t think threads are the best primitive to use for coarse-grained concurrency. I much prefer processes and message-passing.

For example, suppose you have a bunch of objects and each is opening a
HTTP connection to some remote server and pulling down content, and you
want this to happen concurrently. Each object is essentially
self-contained. Having these doing concurrent downloads within threads
is straightforward to program and pretty robust.

I agree, and this is how I do that – I should clarify. I like threads technologically. I think they can be much cleaner than Unix processes. A fork() is nice to prevent one crash from bringing down your entire app – but your app shouldn’t be crashing that badly in the first place.

You mentioned Erlang. It will do some N:M threading – that is, there really will be some OS threads involved. In theory, one crash could bring down your entire app. Also in theory, the Erlang runtime is robust enough that this will Never Happen – and to ensure that, the preferred way to write C extensions is as separate processes which talk to Erlang via RPC. More efficient than fork on Unix, but much more reliable than “threads” in just about any language.

That is: I see threads as being as harmful, and as useful, as Goto. All CPUs essentially implement Goto, but no one in their right mind codes in terms of Goto. We abstract it away and use structured code.

The alternatives aren’t pretty: rewrite your application in an
event-driven way (so you have to find a HTTP client library which works
this way too),

A quick Google turns up Rev::HttpClient, so this probably wasn’t the best example.

or fork off separate processes (which then have to
communicate back to the central one with the results, which might mean
select’ing across the children, or using the filesystem as a temporary
data store)

Or abstracting this away until it’s more manageable. You can do that with threads, too, but in Ruby, more processes means more concurrency, unless you’re using JRuby – and it definitely means something safer.

Of course, the assumption here is that you’re programming in Ruby. If
you want to avoid threads (which I agree is a good thing to do) and
still have concurrency, then it might be better to switch to Erlang
rather than jump through hoops in Ruby.

That’s probably true, if you can manage it – but even in Ruby, there are things that will abstract away threads for you.
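As one example of hiding threads behind a structured interface, here is a sketch of a hypothetical `parallel_map` helper (not an existing library method): callers just pass a block, and a fixed pool of worker threads pulls items from a `Queue`. Results come back through a second queue rather than a shared array, so the workers never write to the same object without synchronization.

```ruby
# Hypothetical helper: map a block over items with a fixed pool of worker threads.
def parallel_map(items, workers: 4, &block)
  jobs = Queue.new
  items.each_with_index { |item, i| jobs << [item, i] }
  done = Queue.new # thread-safe channel for [index, result] pairs

  pool = Array.new(workers) do
    Thread.new do
      loop do
        begin
          item, i = jobs.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break # no work left for this worker
        end
        done << [i, block.call(item)]
      end
    end
  end
  pool.each(&:join)

  # Restore the original order using the index recorded with each result.
  Array.new(done.size) { done.pop }.sort_by(&:first).map(&:last)
end

squares = parallel_map([1, 2, 3, 4, 5]) { |n| n * n }
p squares # => [1, 4, 9, 16, 25]
```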

The biggest problem I have with Erlang is that the syntax is hideous, especially after Ruby. The second biggest problem I have is that while it handles concurrency and binary data very well, Ruby handles just about everything else better – Unicode, string processing, metaprogramming and reflection, DSLs…

This is why I have such high hopes for Reia, and why I’m tempted to dabble in Io – I want something that’s at least as beautiful as Ruby (though I do like prototypal inheritance), but at least as good at concurrency as Erlang.

But in the meantime, I’m going to say that processes are likely to have way fewer surprises for the average newbie, while hypocritically building wrappers around threads for fun.

rubynut · December 7, 2009, 8:30pm

On Tue, Dec 8, 2009 at 12:47 AM, David M. wrote:

The biggest problem I have with Erlang is that the syntax is hideous,
especially after Ruby. The second biggest problem I have is that while it
handles concurrency and binary data very well, Ruby handles just about
everything else better – Unicode, string processing, metaprogramming and
reflection, DSLs…

This is why I have such high hopes for Reia, and why I’m tempted to dabble in
io – I want something that’s at least as beautiful as Ruby (though I do like
prototypal inheritance), but at least as good at concurrency as Erlang.

The other option is to attach Ruby workers to an Erlang backbone.

m.

rubynut · December 7, 2009, 9:39pm

On Mon, Dec 7, 2009 at 12:59 PM, David M. wrote:

Anyway, I should probably go hang out on the Reia list, huh?

Probably.

For what it’s worth Reia is still being worked on. The new branch is
focusing on getting a minimal feature set (what already exists in Erlang
itself) up to production quality without adding new, additional features
until that’s done.

You could have a more Ruby-like language than Reia which still afforded many of these same benefits. For example, there’s no reason process-local state can’t be mutable. However, when a message is sent to another process, it should make a copy of the original state (hopefully using a mechanism like copy-on-write (COW) to keep things sane). The receiver gets a “snapshot” of a given chunk of state at the time it’s sent.
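The copy-on-send idea can be approximated in Ruby with a deep copy taken at send time. This sketch uses `Marshal` for the copy, which is eager rather than copy-on-write, so it shows the semantics (the receiver gets a snapshot) but not the efficiency; `snapshot` and the mailbox setup are illustrative assumptions, not part of any language discussed here.

```ruby
# The sender's state stays mutable; what goes into the mailbox is a deep copy,
# so later changes by the sender are invisible to the receiver.
def snapshot(obj)
  Marshal.load(Marshal.dump(obj)) # eager deep copy (works for plain data, not Procs or IOs)
end

mailbox = Queue.new
state = { count: 1, tags: ["a"] }

mailbox << snapshot(state) # "send" the state as of this moment
state[:count] = 2          # the sender keeps mutating its local state...
state[:tags] << "b"        # ...including nested objects

received = Thread.new { mailbox.pop }.value
p received # still count 1 and tags ["a"]
p state    # count 2 and tags ["a", "b"]
```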

As for what virtual machine such a hypothetical language could run on… dunno.

rubynut · December 7, 2009, 9:02pm

On Monday 07 December 2009 01:30:12 pm Martin DeMello wrote:

concurrency as Erlang.

The other option is to attach ruby workers to an erlang backbone

This would miss out on one of the biggest wins for Erlang, which I’m not sure is compatible with Ruby as a language – the VM and the concept of shared-nothing processes with immutable storage.

Disclaimer: The following is based on assumptions I haven’t bothered to check. However, if the Erlang VM doesn’t behave this way, I’m pretty sure it could.

See, in Erlang, you only need to worry about your messages being too big when they’re going to go over a network. Short of that, you can pass around data structures as big as you want, without much slowdown.

The reason is that in Erlang, all data structures are immutable. Erlang carries this to a perverse level, by making variables assign-once, which really isn’t necessary. But the point is this:

In Ruby, if I pass a hash to another thread, I now have two threads which can see the same hash. Since the hash is mutable, my two threads now have to coordinate on who gets to change it when. The only way to fix this would be to pass a duplicated hash to that thread, wasting time and RAM – after all, the original hash might be about to be GC’d – or to freeze the hash, so that now, if either thread wants to make changes, they each have to duplicate it, meaning possibly two copies of the hash. Not pretty.
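The freeze option looks roughly like this in Ruby. A frozen hash can be read from several threads without a lock, but any thread that wants a changed version has to build its own copy (here with `Hash#merge`). Note that `freeze` is shallow: values nested inside the hash are not frozen unless you freeze them too.

```ruby
# A frozen hash is safe to share for reading; "changing" it means making a copy.
config = { retries: 3, timeout: 30 }.freeze

reader = Thread.new { config[:timeout] } # reading a frozen object needs no lock

writer = Thread.new do
  begin
    config[:timeout] = 60 # in-place mutation is refused...
    :mutated
  rescue RuntimeError     # FrozenError (Ruby 2.5+) is a subclass of RuntimeError
    config.merge(timeout: 60) # ...so this thread builds its own modified copy
  end
end

p reader.value
p writer.value
p config # the shared original is untouched
```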

In fact, the most Erlang-like way to do this is separate worker processes, as you describe. Great, now messages have to be serialized as strings, sent over a pipe, and then parsed – even more of a performance hit.

In Erlang, since data structures are immutable, they can simply be passed by reference – the other thread can’t change them, so why not let them be shared? What’s more, if I need to create a slightly modified data structure, the most natural way to do that results in Rope-like structures – so even within a single process, it’s probably more efficient – but it also means message-passing is almost free.
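The sharing described above is usually called structural sharing. A minimal Ruby illustration, using a hypothetical frozen `Node` struct rather than anything built into Ruby: "changing" a list means building a new front that points at the old, unchanged tail, so nothing is copied and the old version stays valid for anyone else holding it.

```ruby
# An immutable singly linked list: prepending reuses the existing list as the tail.
# Node is a hypothetical type for illustration; Ruby has no built-in persistent list.
Node = Struct.new(:head, :tail) do
  # Collect the elements by walking the chain of tails.
  def elements
    out = []
    node = self
    while node
      out << node.head
      node = node.tail
    end
    out
  end
end

def cons(head, tail)
  Node.new(head, tail).freeze # frozen, so a shared node can never be modified
end

base = cons(2, cons(3, nil))
longer = cons(1, base) # a "modified" list that shares all of base

p base.elements            # => [2, 3]
p longer.elements          # => [1, 2, 3]
p longer.tail.equal?(base) # => true (the same object, not a copy)
```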

That’s the big draw of Erlang for me – I get to program with hundreds, even thousands of loosely-coupled processes that are at least as safe as separate Unix processes, yet the performance penalty is far less than even tens of Ruby processes trying to do the same thing. Ruby threads would be less safe and less concurrent, whereas Erlang will automatically scale to multiple cores.

It might be a bit weird to hear me arguing for the performance win, given that I’m not ashamed to suggest throwing more hardware at a problem, or repeat “premature optimization…” when people criticize Ruby for being slow.

But this is a bit like Git. One of the main reasons I use Git is the performance improvement – up to a certain point, the extra performance buys you nothing. Past that point, you realize that branches, merges, and commits are essentially free, and it liberates you to work and collaborate in ways that are technically possible with other DVCSes (even with SVN), but that you’re much more likely to adopt in Git, where ‘git checkout -b’ is instantaneous and ‘git merge’ is seconds at most.

Erlang processes are like that. Up to a point, it’s just a nice performance improvement, and you’re still manually twiddling the balance between threads, processes, and event models, possibly using some monstrous combination of all three. Each choice might give you more or less concurrency, more or less performance, and more or less weird edge cases – and in the case of traditional threads, no matter how efficient they get, you’re going to be wary about adding more of them, and having to lock more things, and by the time you lock everything as you should, much of the performance is gone.

By removing that barrier of performance, and by making processes easy to spawn and manage, you can suddenly stop worrying about it. You can easily spawn one process per connection – to anything. You can spawn processes whose entire job is to keep track of a single counter. You can spawn processes like you don’t care, like it’s going out of style – much the same way you’d spawn objects in Ruby, only more so.
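To show the shape of a process whose whole job is a counter, here is a Ruby approximation with a `Thread` standing in for the process and `Queue`s as mailboxes. `counter_actor` is a hypothetical name, and since Ruby threads are far heavier than Erlang processes, this copies the programming model, not the cost model.

```ruby
# The counter's state lives only inside its thread; everyone else sends messages.
def counter_actor
  inbox = Queue.new
  Thread.new do
    count = 0 # private state: only this thread ever touches it
    loop do
      msg, reply_to = inbox.pop # block until a message arrives
      case msg
      when :increment then count += 1
      when :get       then reply_to << count
      when :stop      then break
      end
    end
  end
  inbox # callers only ever see the mailbox, never the state
end

counter = counter_actor
5.times { counter << [:increment] }

reply = Queue.new
counter << [:get, reply]
value = reply.pop # messages are handled in order, so all five increments come first
p value # => 5
counter << [:stop]
```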

It’s the kind of performance improvement that’s not just squeezing a few more percent out of the hardware you’ve got, or shaving a few milliseconds off a task that was already fast enough. It’s the kind of performance improvement that fundamentally changes the way you work.

It’s the difference between ‘git merge’ taking a few seconds and ‘svn merge’ taking half an hour. (And no, I’m not making that up. It routinely did so, when I was using it at work. People switched to git-svn for that reason alone.)

If I’m just going to have a bunch of Ruby workers anyway, I’d actually save some RAM by getting myself a COW-friendly Ruby and using fork directly to create workers, instead of running them from Erlang. In fact, if that’s what I was going to do, I’m not entirely sure why I’d want the Erlang backbone anyway. But Ezra is a smart guy, so I figure there must be some reason he wrote Nanite that way.

Anyway, I should probably go hang out on the Reia list, huh?

rubynut · December 8, 2009, 1:39am

Brian C. wrote:

The alternatives aren’t pretty: rewrite your application in an
event-driven way (so you have to find a HTTP client library which works
this way too), or fork off separate processes (which then have to
communicate back to the central one with the results, which might mean
select’ing across the children, or using the filesystem as a temporary
data store)

An additional alternative that is pretty neat is combining an
event-driven library with Fibers in ruby 1.9.x.

Assuming, of course, the existence of an HTTP library in that idiom.

But I have a homegrown RPC library implemented using EventMachine, which I recently adapted to use Fibers. So far, I am really pleased with the result. It’s like the best of both worlds. My app is still single-threaded, so I don’t need any mutex / synchronization / locking. But I still get linear-style method call semantics, in separate “parallel” execution Fibers. (In essence the same sort of thing people have previously done with continuation-based event libraries, only without the overhead of continuations.)

So in any given fiber I can write pretty normal-looking code, like:

paths = @catalog.search("caption" => cap, "filename" => fname)
unless paths.empty?
  title = "Search: #{str}"
  doc_id = @window_svc.new_document_with_search_results(title, paths)
end

…and even though @catalog.search and @window_svc.new_document_with_search_results may be making RPC calls to a remote host, only the current Fiber will block waiting for the result.

I haven’t used this technique long enough to have discovered whether there may be any pitfalls. But, so far, it’s like the convenience of threaded programming without the synchronization issues.
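A self-contained toy version of this pattern, using only core `Fiber`: each task reads like straight-line code, `rpc` suspends just the calling fiber, and a single-threaded loop answers requests and resumes whoever asked. `rpc`, `fake_server` and the loop are hypothetical stand-ins; in a real program the request would go to an event library such as EventMachine and the fiber would be resumed from its callback.

```ruby
# rpc suspends the calling fiber and hands the request to the loop;
# the loop later resumes the fiber with the reply, which rpc returns.
def rpc(request)
  Fiber.yield(request)
end

# Hypothetical stand-in for the remote host.
def fake_server(request)
  op, arg = request
  case op
  when :search then ["#{arg}1.jpg", "#{arg}2.jpg"]
  when :open   then "doc:#{arg.first}"
  end
end

log = []
tasks = %w[cat dog].map do |term|
  Fiber.new do
    paths = rpc([:search, term]) # reads like a blocking call...
    doc_id = rpc([:open, paths]) # ...but only this fiber waits
    log << [term, doc_id]
    nil # a finished fiber hands back nil instead of a request
  end
end

# Single-threaded loop: start each fiber, then answer requests in FIFO order
# and resume the fiber that made each one.
served = []
pending = tasks.map { |fiber| [fiber, fiber.resume] }
until pending.empty?
  fiber, request = pending.shift
  served << request.first
  next_request = fiber.resume(fake_server(request))
  pending << [fiber, next_request] unless next_request.nil?
end

p served # both searches go out before either open: the fibers interleave
p log
```

No locks are needed because only one fiber runs at a time and each switch happens at an explicit `Fiber.yield`.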

Regards,

Bill

rubynut · December 8, 2009, 4:04am

On Mon, Dec 7, 2009 at 5:39 PM, Bill K. wrote:

An additional alternative that is pretty neat is combining an
event-driven library with Fibers in ruby 1.9.x.

Assuming of course, the existence of an HTTP library in that
idiom.

Revactor provides exactly this with its HTTP client. Revactor models actors as fibers and wraps an underlying evented client (the aforementioned Rev::HttpClient) with them:

github.com/tarcieri/revactor/blob/master/lib/revactor/http_client.rb

#--
# Copyright (C)2007-10 Tony Arcieri
# You can redistribute this under the terms of the Ruby license
# See file LICENSE for details
#++

require 'uri'
  
module Revactor
  # Thrown for all HTTP-specific errors
  class HttpClientError < StandardError; end
  
  # A high performance HTTP client which wraps the asynchronous client in Rev
  class HttpClient < Coolio::HttpClient
    # Default timeout for HTTP requests (until the response header is received)
    REQUEST_TIMEOUT = 60
    
    # Read timeout for responses from the server
    READ_TIMEOUT = 30

(This file has been truncated.)

Here’s a concurrent HTTP fetcher written around this HTTP client:

github.com/tarcieri/revactor/blob/master/lib/revactor/http_fetcher.rb

#--
# Copyright (C)2007 Tony Arcieri
# You can redistribute this under the terms of the Ruby license
# See file LICENSE for details
#++

require 'zlib'
require 'stringio'

module Revactor
  # A concurrent HTTP fetcher, implemented using a central dispatcher which
  # scatters requests to a worker pool.
  #
  # The HttpFetcher class is callback-driven and intended for subclassing.
  # When a request completes successfully, the on_success callback is called.
  # An on_failure callback represents non-200 HTTP responses, and on_error
  # delivers any exceptions which occurred during the fetch.
  class HttpFetcher
    def initialize(nworkers = 8)
      @_nworkers = nworkers

(This file has been truncated.)

rubynut · December 8, 2009, 10:46am

David M. wrote:

You mentioned Erlang. It will do some N:M threading – that is, there really will be some OS threads involved. In theory, one crash could bring down your entire app. Also in theory, the Erlang runtime is robust enough that this will Never Happen – and to ensure that, the preferred way to write C extensions is as separate processes which talk to Erlang via RPC. More efficient than fork on Unix, but much more reliable than “threads” in just about any language.

That is: I see threads as both as harmful and as useful as Goto.

Absolutely all true.

Of course, threads are going to be necessary at some level (just as goto is necessary at a low level), because that’s how SMP hardware actually works.

The benefit of Erlang is that it uses threads on your behalf but provides you with a much better message-passing abstraction in their place.

(It’s possible to have message-passing at the hardware level. The Transputer is an example of that. Writing in Occam for the Transputer is like writing in C for a regular CPU – one step above machine code.)

The biggest problem I have with Erlang is that the syntax is hideous,
especially after Ruby.

When I get a few spare cycles I’m trying to hack together a
ruby-flavoured erlang: just a front end which emits either the erlang
abstract syntax form, or regular erlang source.

This is why I have such high hopes for Reia, and why I’m tempted to dabble in io – I want something that’s at least as beautiful as Ruby (though I do like prototypal inheritance), but at least as good at concurrency as Erlang.

After Erlang, when I look at OCaml again it starts to make a lot more sense (for example, all the pattern-matching stuff). And OCaml can compile directly to machine code too.

rubynut · December 8, 2009, 7:36pm

On Tue, Dec 8, 2009 at 2:46 AM, Brian C. wrote:

When I get a few spare cycles I’m trying to hack together a
ruby-flavoured erlang: just a front end which emits either the erlang
abstract syntax form, or regular erlang source.

For what it’s worth, that’s what Reia is already.

rubynut · December 8, 2009, 9:27pm

Tony A. wrote:

On Tue, Dec 8, 2009 at 2:46 AM, Brian C. wrote:

When I get a few spare cycles I’m trying to hack together a
ruby-flavoured erlang: just a front end which emits either the erlang
abstract syntax form, or regular erlang source.

For what it’s worth, that’s what Reia is already.

No, I don’t think so. Reia will run on the Erlang VM but will be a substantially different language to Erlang: it will have destructive assignment, and be a lot more dynamic. Being able to compile AOT to .beam files is not necessarily going to be provided. Furthermore, it will have different function call semantics, which you’ll have to take account of if calling Reia from Erlang or vice versa.
http://groups.google.com/group/reia/browse_thread/thread/668e6b302bba98b6

What I’m toying with is just a different syntax for standard Erlang,
which you’d compile to .beam and would be indistinguishable from Erlang
at that level. Not sure I’m going to have the time, but I’ve just
started hacking erlang’s existing yecc grammar which looks like the path
of least resistance.

rubynut · December 8, 2009, 10:42pm

So, psst, I’m the author of Reia, so you can take my responses as authoritative.

On Tue, Dec 8, 2009 at 1:27 PM, Brian C. wrote:

No, I don’t think so. Reia will run on the Erlang VM but will be a
substantially different language to Erlang: it will have destructive
assignment, and be a lot more dynamic.

Yes, it has destructive assignment and late(r) binding. However, these are the only major departures from Erlang, at least for the initial version.

Being able to compile AOT to .beam files is not necessarily going to be
provided.

I’ve had so many requests for this feature I’m certainly going to support it.

Furthermore it will have different function call semantics, which you’ll have to take account of if calling reia from erlang or vice versa.
http://groups.google.com/group/reia/browse_thread/thread/668e6b302bba98b6

This is true; however, those call semantics are what enable blocks. I guess you don’t plan on having blocks.

What I’m toying with is just a different syntax for standard Erlang,
which you’d compile to .beam and would be indistinguishable from Erlang
at that level. Not sure I’m going to have the time, but I’ve just
started hacking erlang’s existing yecc grammar which looks like the path
of least resistance.

Well, good luck. I’m not sure how wild people are going to be about a
language with a Ruby-like grammar and single assignment.

rubynut · December 8, 2009, 10:58pm

On Tuesday 08 December 2009 02:27:31 pm Brian C. wrote:

Reia will run on the Erlang VM but will be a
substantially different language to Erlang: it will have destructive
assignment, and be a lot more dynamic. Being able to compile AOT to
.beam files is not necessarily going to be provided. Furthermore it will
have different function call semantics,

For what it’s worth, just about all of these sound like improvements to me.

The “destructive assignment” seems to come in two forms – private variables can be destructively assigned, but that’s a purely syntactical treatment, as those will (I assume) be compiled into singly-assigned Erlang variables.

Instance variables can also be destructively assigned, and that’s the biggest difference. But they also cannot be shared between processes (that I know of), so they don’t break the advantages of Erlang.

What I’m toying with is just a different syntax for standard Erlang,
which you’d compile to .beam and would be indistinguishable from Erlang
at that level.

I should clarify, then – while I complain about Erlang’s syntax, that’s not the only problem I have with it. I like object-oriented semantics, and I think they’d map well onto actors, which is exactly the approach Reia is taking.

Think about the features that make Ruby shine. A few of those are purely syntactical. A few of them are based on the core idea that objects don’t have to inherently be any particular type – they’re just entities you send messages to, and you don’t know what response you’ll get until you actually send the message.

This is both the core of duck typing and a fair description of Erlang processes. Essentially, an Erlang function that expects a process doesn’t care what kind of process you give it, so long as that process can handle the messages it wants to send.

So while I can see the usefulness of pattern-matching for handling incoming requests, that’s a bit like method_missing – or maybe the opposite, putting something in front of the method calls. Still, most messages make sense mapping directly to method calls, and I think doing so would work well as a convention-over-configuration approach – for the same reason that it makes sense to have URLs correspond to methods on a controller, by convention.
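The convention described here, where an incoming message is just a method name plus arguments, can be sketched in Ruby with `public_send`. The `Actor` class below is hypothetical (it is not the actor library mentioned in this post): it runs an ordinary object in its own thread, and any object that responds to the message will do, which is the duck-typing point.

```ruby
# Hypothetical Actor: runs a plain object in its own thread and turns each
# [method_name, args] message into a public_send on that object.
class Actor
  def initialize(target)
    @inbox = Queue.new
    @thread = Thread.new do
      loop do
        name, args, reply_to = @inbox.pop # block until a message arrives
        break if name == :__stop__
        begin
          reply_to << [:ok, target.public_send(name, *args)]
        rescue StandardError => e
          reply_to << [:error, e] # hand failures back instead of killing the actor
        end
      end
    end
  end

  # Synchronous send: enqueue the message, then wait on a private reply queue.
  def call(name, *args)
    reply = Queue.new
    @inbox << [name, args, reply]
    status, value = reply.pop
    raise value if status == :error
    value
  end

  def stop
    @inbox << [:__stop__]
    @thread.join
  end
end

class Greeter
  def greet(name)
    "hello, #{name}"
  end
end

greeter = Actor.new(Greeter.new)
p greeter.call(:greet, "Reia") # => "hello, Reia"
begin
  greeter.call(:quack)
rescue NoMethodError => e
  puts "not that kind of duck: #{e.message}"
end
greeter.stop
```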

I guess what I’m saying is, I want to have my cake and eat it. Ruby is already built in such a way that it would map very naturally onto the actor model. Unfortunately, since that’s never been tried, any attempt to do so is probably doomed – there are entirely too many assumptions that code is executed linearly.

I’m actually trying anyway, for fun. I’ve been (on and off for about a year) writing an actor library for Ruby that actually does exactly what I’m describing. But it will be slow (it uses Threads and Queues) and only safe if you know what you’re doing.

rubynut · December 8, 2009, 11:29pm

Tony A. wrote:

So, psst, I’m the author of Reia, so you can take my responses as
authoritative

D’oh! I can’t believe I missed that. My sincere apologies.

This is true, however those call semantics are what enable blocks. I guess you don’t plan on having blocks.

I was going to do them in a rather trivial way, just by passing the
block as the first argument:

That’s on the observation that lists:map, lists:foldl and lists:filter
all seem to take a function as the first argument.

I’m not sure how wild people are going to be about a
language with a Ruby-like grammar and single assignment.

You’re probably right. It’s just a toy.

The first things I plan to do are:

- atoms in Ruby syntax: :foo
  - hence :foo() is a regular function call
  - ditto a = :foo; a()
- barewords which are not seen in the LHS of match expressions are also implicitly function calls:
  - foo -> foo()
  - foo "bar" -> foo("bar")
  - io.write -> io:write()
- def…rescue…ensure…end for function definitions (wrapping in a ‘try’ automatically if necessary)
- || -> orelse, && -> andalso

then see what it looks like.

I’d also like to use the "..." syntax for binaries rather than lists, but that complicates matters when you have to write

a = "hello"
b = " world"
c = <<a/binary, b/binary>>