Multithreading and DB access in Rails

scottd72 · June 15, 2006, 2:47am

I just tried writing some controllers, etc. that would allow me to start
and monitor background tasks running in new (Ruby) threads, with the
idea that I’d eventually manage long-running indexing processes that
way. I can kick off such threads OK (by using Thread.new in a routine
called by a controller), but it seems like Rails gets huffy if those
background tasks and the ordinary controller thread try to access the
database at the same time…I wind up with one or more of them losing
their connection to MySQL and throwing exceptions like this:

#<ActiveRecord::StatementInvalid:
c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.2/lib/active_record/connection_adapters/abstract_adapter.rb:120:in
`log': Mysql::Error: Lost connection to MySQL server during query: SELECT * FROM managed_threads WHERE (managed_threads.`name` = 'test1' ) LIMIT 1>

This is on a WEBrick setup, if that makes any difference.

Obvious questions:

Why the lost connection? Is there some simple way to make it stop
happening?
What’s the recommended way of managing background tasks that need DB
access in a Rails setup, assuming the answer to the previous question
is “no”?

Thanks,

– Scott

scottd72 · June 15, 2006, 3:46am

Hello Scott,

On 2006/6/14, Scott D. wrote:

Obvious questions:

Why the lost connection? Is there some simple way to make it stop
happening?

What’s the recommended way of managing background tasks in a Rails
setup, assuming they need DB access, assuming the answer to the previous
question is “no”?

I’d advise looking at BackgroundDRb, a plugin by Ezra Z.:
http://www.brainspl.at/articles/2006/05/15/backgoundrb-initial-release

There has been an updated version with better access to ActiveRecord
objects, so it should fit exactly what you want.

Hope that helps !

scottd72 · June 15, 2006, 4:08am

François Beausoleil wrote:

Hello Scott,

I’d advise looking at BackgroundDRb, a plugin by Ezra Z.:
http://www.brainspl.at/articles/2006/05/15/backgoundrb-initial-release

There has been an updated version with better access to ActiveRecord
objects, so it should fit exactly what you want.

Hope that helps !

Oooh…that does indeed look useful. (Looks like it works around the
problems I was having by just putting the background tasks in a separate
Ruby process altogether and then having them communicate with Rails via
drb.) Hmmm…looks like various bits are Unix/Linux-specific, and I’m
currently developing on Windows. D’oh. Guess it was time to install
Linux on one of my old machines anyways…

Thanks!

– Scott

scottd72 · June 15, 2006, 10:05am

Hi

Yeah, the backgroundrb plugin is a little Unix-centric, but it does
run on Windows. When I get time or someone sends a patch (hint, hint)
I will code up a Windows service version of the server. Right now you
can still run on Windows, but it requires you to leave a command
window open while it runs. It won’t be too hard to create a service
version for Windows, but I don’t have any Windows boxes to try it out
on.

Thanks Ezra for your work!

Until the Windows service version is available, one can use a service
wrapper such as duodata.de as a workaround (it’s able to run console
applications as a service; I’ve been using it to host a little pimki
wiki, for instance).

hope this helps

cheers

Thibaut

scottd72 · June 15, 2006, 4:38am

On Jun 14, 2006, at 7:08 PM, Scott D. wrote:

Hope that helps !
Thanks!

– Scott

–
Posted via http://www.ruby-forum.com/.

Rails mailing list
[email protected]
http://lists.rubyonrails.org/mailman/listinfo/rails

Hey Scott-

Yeah, the backgroundrb plugin is a little Unix-centric, but it does
run on Windows. When I get time or someone sends a patch (hint, hint)
I will code up a Windows service version of the server. Right now you
can still run on Windows, but it requires you to leave a command
window open while it runs. It won’t be too hard to create a service
version for Windows, but I don’t have any Windows boxes to try it out
on.

Anyway, if you want to use it on Windows, you just have to start it
with the script instead of the rake task. For example, from your Rails
app root:

ruby script\backgroundrb\start

Cheers-
-Ezra

scottd72 · June 15, 2006, 1:06pm

On Thursday 15 June 2006 01:45, Francois B. wrote:

I’d advise looking at BackgroundDRb, a plugin by Ezra Z.:
http://www.brainspl.at/articles/2006/05/15/backgoundrb-initial-release

There has been an updated version with better access to ActiveRecord
objects, so it should fit exactly what you want.

I’m not the original poster, but I’m also struggling with the lack of
thread safety in rails.

From the description there it seems that BackgrounDRb is a kind
of “generic” drb server that takes any pieces of code to run (the
workers) and provides a nice access wrapper on the client side. If that
is correct, I can’t really use it – my background tasks have to run
as ‘root’ (system admin tool). Running a generic “please run my code”
server is a big no-no for that.

I’m also a bit confused about what exactly the problem with threading
is. BackgrounDRb seems to use multithreaded ActiveRecord just fine, so
why does Rails have problems with it?

Then there’s Zed’s comment in another thread:

supposedly core PHP is thread-safe. What benefits are there to turning
on concurrency? Somewhat better performance?

I may list out the things that can go horribly wrong in the near future,
but this is kind of a catch-22. I know of things that go wrong, but
they are only in my testing and when I’ve examined the code. They’re
not really bugs in rails, just usage patterns that people have which
don’t work in rails (like firing up popen).

… which seems to say that Rails actually is basically thread-safe,
which would indicate that BackgrounDRb is unnecessary if the Rails app
is coded with threading in mind. But then again, basically all parts of
ActiveRecord access (which seems to be the major problem area) that
could be problematic in a threaded environment are handled
automatically by Rails or AR itself.

Summary: I’m thoroughly confused by now 8+?

scottd72 · June 16, 2006, 11:43am

On Thursday 15 June 2006 16:10, Ezra Z. wrote:

won’t run any code except what you tell it to run in your worker
classes. So it’s every bit as safe as you want it to be.

But the worker classes are defined in the client, right? At least this
excerpt suggests that:

class FooWorker
  include DRbUndumped

  …

  def start_working
    # Work loop goes inside a new thread so it doesn't block
    # rails while it works. A neat way to do progress bars in

    …

  end
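The pattern in that excerpt (start the work loop in its own thread, return right away, let the caller poll for progress) can be sketched in plain Ruby. This is not BackgrounDRb's worker API: `ProgressWorker`, `progress`, `wait` and the `process` stub are hypothetical names, and the Mutex around the shared counter is an assumption about what a safe version needs.

```ruby
# Hypothetical worker: runs its loop in a background thread and exposes
# progress that another thread (e.g. a DRb client) can poll.
class ProgressWorker
  def initialize(items)
    @items  = items
    @done   = 0
    @lock   = Mutex.new   # guards @done, which two threads touch
    @thread = nil
  end

  # Returns immediately; the work happens in a separate thread.
  def start_working
    @thread = Thread.new do
      @items.each do |item|
        process(item)
        @lock.synchronize { @done += 1 }
      end
    end
    self
  end

  # Percentage complete; safe to call while the worker is running.
  def progress
    return 100 if @items.empty?
    @lock.synchronize { (@done * 100) / @items.size }
  end

  # Block until the work loop finishes (handy in tests).
  def wait
    @thread&.join
    self
  end

  private

  # Stand-in for the real long-running task.
  def process(item)
    sleep 0.001
    item
  end
end

worker = ProgressWorker.new((1..50).to_a).start_working
puts "progress: #{worker.progress}%"  # somewhere between 0 and 100
worker.wait
puts "progress: #{worker.progress}%"  # 100 once the loop is done
```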

And if that’s the case, what prevents another process from connecting to
the BackgrounDRb server and submitting its own workers?

I am using it in a few places where the drb server runs as root because
it needs root to do some sysadmin tasks. It’s a lot safer to run your
root code in a separate process than it is to run your rails app as root.

That’s actually basically what I’m trying to do: running the frontend
as a normal Rails app (unprivileged) and the backend (privileged)
separately. But I liked the idea of exposing the backend API via SOAP,
which would have been easiest by making it a Rails app as well (with
AWS, i.e. ActionWebService).

Well, ActiveRecord itself seems to work very well in threaded mode by
itself. It’s mainly the combination of ActionController and
ActiveRecord that has threading problems. People do all kinds of
weird stuff in their rails controllers, like using popen to run tasks,
and other things that make it hate itself.

Hmm, first I wanted to say here that I’m IMHO pretty good at not doing
weird stuff in my code, but then I had a look at the popen docs again –
and I don’t understand why it’s dangerous.

ActiveRecord stores its database connection in Thread.current, which
causes the “MySQL server has gone away” error when two threads step on
each other’s database connections.

Maybe I should take the time to see if I can patch it to do proper
connection pooling. Storing connections in Thread.current also seems
(concurrency issues aside) to invite resource leakage (i.e. how and when
are these connections closed?)
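A minimal sketch of the pooling idea, assuming a fixed number of connections created up front and handed out through Ruby's thread-safe Queue. `ConnectionPool`, `with_connection` and `shutdown` are hypothetical names, not ActiveRecord methods; the `ensure` clause is one answer to "when are these connections returned?", and `shutdown` to "when are they closed?".

```ruby
# Hypothetical fixed-size pool: a thread checks a connection out, uses it
# exclusively, and always hands it back.
class ConnectionPool
  def initialize(size, &factory)
    @all = Array.new(size) { factory.call }
    @available = Queue.new              # Queue is safe to share between threads
    @all.each { |conn| @available << conn }
  end

  # Blocks until a connection is free, yields it, and returns it to the
  # pool even if the block raises.
  def with_connection
    conn = @available.pop
    begin
      yield conn
    ensure
      @available << conn
    end
  end

  # Close every connection explicitly instead of relying on thread exit.
  def shutdown
    @all.each { |conn| conn.close if conn.respond_to?(:close) }
  end

  def size
    @all.size
  end
end

# Usage: 20 threads share 3 fake connections; check none is used twice at once.
pool   = ConnectionPool.new(3) { Object.new }
in_use = {}
guard  = Mutex.new
clash  = false

threads = 20.times.map do
  Thread.new do
    pool.with_connection do |conn|
      guard.synchronize { clash = true if in_use[conn]; in_use[conn] = true }
      sleep 0.001                        # pretend to run a query
      guard.synchronize { in_use.delete(conn) }
    end
  end
end
threads.each(&:join)
puts clash ? "a connection was shared" : "no connection was shared"
```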

For me it was much easier to create my own environment outside of
rails with threading in mind from the start and push these tasks to
another ruby instance. This was a much easier task than trying to
make rails itself thread-safe. Rails handles what it is good at very
well, and that is handling web requests as fast as it can. But
long-running tasks just clash with the whole HTTP request/response
cycle, and that’s why I saw the need for this plugin.

Maybe I’ll skip BackgrounDRb then and just expose my entire backend API
via DRb directly. Not as nice as the SOAP variant (and harder to protect
against injection attacks), but well…

Thanks for the info

scottd72 · June 15, 2006, 6:13pm

On Jun 15, 2006, at 6:03 AM, Christian R. wrote:

From the description there it seems that BackgrounDRb is a kind
of “generic” drb server that takes any pieces of code to run (the
workers) and provides a nice access wrapper on the client side. If
that is correct, I can’t really use it – my background tasks have to
run as ‘root’ (system admin tool). Running a generic “please run my
code” server is a big no-no for that.

You pretty much have the right idea here. But the drb server won’t
run any code except what you tell it to run in your worker classes,
so it’s every bit as safe as you want it to be. I am using it in a few
places where the drb server runs as root because it needs root to do
some sysadmin tasks. It’s a lot safer to run your root code in a
separate process than it is to run your rails app as root.
Backgroundrb uses DRb ACL lists, so you can specify the IP addresses
that can access the server. By default it only allows access from
localhost, but you can get as fine-grained as you want here.
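For comparison, this is roughly what an ACL looks like with Ruby's own `drb/acl` library used directly. It is a sketch, not BackgrounDRb's configuration format; `Backend`, `disk_usage` and the loopback-only rule are illustrative assumptions. With ACL's default deny/allow order, an address matching an `allow` entry is admitted and everything else falls through to `deny all`.

```ruby
require 'drb/drb'
require 'drb/acl'

# Hypothetical backend object exposing a deliberately narrow API.
class Backend
  def disk_usage(path)
    "usage of #{path}: n/a"  # stand-in for a privileged operation
  end
end

# Refuse everyone except clients connecting from the loopback address.
ACL_RULES = ACL.new(%w[deny all allow 127.0.0.1])
DRb.install_acl(ACL_RULES)

# Port 0 lets the OS choose a free port; DRb.uri reports the real URI.
DRb.start_service("druby://127.0.0.1:0", Backend.new)
puts "serving on #{DRb.uri}"

client = DRbObject.new_with_uri(DRb.uri)
reply  = client.disk_usage("/var")
puts reply                      # usage of /var: n/a

DRb.stop_service
```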

I’m also a bit confused about what exactly the problem with threading
is. BackgrounDRb seems to use multithreaded ActiveRecord just fine, so
why does Rails have problems with it?

Well, ActiveRecord itself seems to work very well in threaded mode by
itself. It’s mainly the combination of ActionController and
ActiveRecord that has threading problems. People do all kinds of
weird stuff in their rails controllers, like using popen to run tasks,
and other things that make it hate itself. Rails is a pretty big code
base, and the combination of all its parts is not thread-safe. And it
would take a lot of work, and probably some performance hits, to make
it safe.

In BackgrounDRb I use mutexes in the key spots to make sure that the
threads don’t stomp on each other. But I hear you: I haven’t ever
been able to get a really straight answer as to why rails is not
thread-safe in general. I think it’s just that it is a large framework
and it’s designed for shared nothing, where requests are run in
separate fcgi or mongrel backends and are serialized in these
backends, so only one request per backend runs at once.
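"Mutexes in the key spots" boils down to wrapping every read-modify-write of shared state in `Mutex#synchronize`. A minimal sketch with a hypothetical `SharedStats` object; it illustrates the technique, not BackgrounDRb's internals.

```ruby
# Hypothetical shared object touched by many worker threads.
class SharedStats
  def initialize
    @lock = Mutex.new
    @processed = 0
    @errors = []
  end

  # += is a read-modify-write; without the lock two threads could read
  # the same old value and one increment would be lost.
  def record_success
    @lock.synchronize { @processed += 1 }
  end

  def record_error(message)
    @lock.synchronize { @errors << message }
  end

  # Take a consistent snapshot of both fields under the same lock.
  def snapshot
    @lock.synchronize { { processed: @processed, errors: @errors.dup } }
  end
end

stats = SharedStats.new
threads = 100.times.map do |i|
  Thread.new do
    1_000.times { stats.record_success }
    stats.record_error("worker #{i} hiccup") if i % 25 == 0
  end
end
threads.each(&:join)
p stats.snapshot[:processed]    # => 100000
p stats.snapshot[:errors].size  # => 4
```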

I have tested BackgrounDRb with ActiveRecord reads and writes in a
tight loop, with 100 workers grabbing stuff from AR, changing the
records and saving them, and it works fine. Of course multithreaded
programming is hard to do ‘right’, so I am sure there are edge cases
that I have not foreseen yet, but I will handle those as they come.

not really bugs in rails, just usage patterns that people have which

Summary: I’m thoroughly confused by now 8+?

Even if rails were totally thread-safe, I would still not recommend
just spinning off threads at will from your controllers during web
requests. It’s just bound to get messy. ActiveRecord stores its
database connection in Thread.current, which causes the “MySQL server
has gone away” error when two threads step on each other’s database
connections.
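Roughly what keying a connection on `Thread.current` means: each thread lazily gets its own object, so threads only collide when a single connection ends up shared between them. A sketch with a hypothetical `FakeConnection` class and `current_connection` helper; it shows the storage mechanism, not ActiveRecord's actual code.

```ruby
# Hypothetical stand-in for a database connection; remembers which
# thread created it.
class FakeConnection
  attr_reader :owner

  def initialize(owner)
    @owner = owner
  end
end

# Lazily create one connection per thread and keep it in a slot on
# Thread.current. (Since Ruby 1.9, Thread.current[] is fiber-local;
# Thread#thread_variable_get/set is the strictly thread-local variant.)
def current_connection
  Thread.current[:fake_db_connection] ||= FakeConnection.new(Thread.current)
end

main_conn  = current_connection
other_conn = Thread.new { current_connection }.value

puts main_conn.equal?(current_connection)     # true: same thread, same object
puts main_conn.equal?(other_conn)             # false: the other thread got its own
puts other_conn.owner.equal?(Thread.current)  # false: created by the other thread
```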

For me it was much easier to create my own environment outside of
rails with threading in mind from the start and push these tasks to
another ruby instance. This was a much easier task than trying to
make rails itself thread-safe. Rails handles what it is good at very
well, and that is handling web requests as fast as it can. But
long-running tasks just clash with the whole HTTP request/response
cycle, and that’s why I saw the need for this plugin.
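One way to keep long-running work out of the request/response cycle: the request only enqueues a job and gets an id back, a worker thread does the work, and later requests poll the status. For brevity this sketch keeps everything in one process, whereas in the setup described above the queue and worker would live in the separate Ruby instance. `JobRunner`, `submit` and `status` are hypothetical names.

```ruby
# Hypothetical job runner: callers enqueue and return immediately;
# one worker thread drains the queue and records each job's status.
class JobRunner
  def initialize
    @queue   = Queue.new
    @status  = {}
    @lock    = Mutex.new
    @next_id = 0
    @worker  = Thread.new { run }
  end

  # What a controller action would call: cheap, never waits on the work.
  def submit(&work)
    id = @lock.synchronize { @next_id += 1 }
    set_status(id, :queued)
    @queue << [id, work]
    id
  end

  def status(id)
    @lock.synchronize { @status[id] }
  end

  # Stop accepting work and wait for already-queued jobs to finish.
  def shutdown
    @queue << :stop
    @worker.join
  end

  private

  def set_status(id, value)
    @lock.synchronize { @status[id] = value }
  end

  def run
    loop do
      job = @queue.pop
      break if job == :stop
      id, work = job
      set_status(id, :running)
      begin
        work.call
        set_status(id, :done)
      rescue StandardError => e
        set_status(id, [:failed, e.message])
      end
    end
  end
end

runner = JobRunner.new
ok  = runner.submit { sleep 0.01 }               # pretend indexing run
bad = runner.submit { raise "index corrupted" }
runner.shutdown
p runner.status(ok)   # => :done
p runner.status(bad)  # => [:failed, "index corrupted"]
```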

Cheers-
-Ezra

scottd72 · June 16, 2006, 6:07pm

On Jun 16, 2006, at 4:39 AM, Christian R. wrote:

Maybe I’ll skip BackgrounDRb then and just expose my entire backend
API via DRb directly. Not as nice as the SOAP variant (and harder to
protect against injection attacks), but well…

Thanks for the info

Christian-

You’re welcome. And I think drb is a very nice way to solve the
problem you are working on. I am not opposed to adding some kind of
optional $SAFE mode if that would make you more comfortable, but I
think that might be a hassle to deal with. A better idea would be to
add some kind of token-based auth to backgroundrb. I am open to
adding this if you feel it’s needed. If you want, please join the
mailing list and I will work on it with you to add some more security
to the system.
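A minimal sketch of what token-based auth could look like on the server side, assuming a shared secret handed to the front object at startup. `GuardedBackend`, `restart_service` and the token value are hypothetical; this is not an existing backgroundrb feature. A per-method check only guards the methods that perform it: public methods every Ruby object has, such as `instance_eval`, are not covered by it.

```ruby
# Hypothetical front object for a DRb server: each privileged method
# demands a shared-secret token before doing anything.
class GuardedBackend
  class AuthError < StandardError; end

  def initialize(token)
    @token = token.dup.freeze
  end

  def restart_service(token, name)
    authenticate!(token)
    "restarting #{name}"  # stand-in for the real privileged work
  end

  private

  def authenticate!(token)
    unless token.is_a?(String) && secure_compare(token, @token)
      raise AuthError, "invalid token"
    end
  end

  # Compare without stopping at the first differing byte, so response
  # timing leaks less about the secret.
  def secure_compare(a, b)
    return false unless a.bytesize == b.bytesize
    diff = 0
    a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
    diff.zero?
  end
end

backend = GuardedBackend.new("s3cret-token")
puts backend.restart_service("s3cret-token", "indexer")  # restarting indexer
begin
  backend.restart_service("guess", "indexer")
rescue GuardedBackend::AuthError => e
  puts "rejected: #{e.message}"                          # rejected: invalid token
end
```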

Cheers-
-Ezra

scottd72 · June 16, 2006, 1:22pm

On Friday 16 June 2006 11:39, Christian R. wrote:

def start_working
# Work loop goes inside a new thread so it doesn’t block
# rails while it works. A neat way to do progress bars in

…

end

Of course I got it wrong here.
Nevertheless, unless the DRb server runs with $SAFE >= 1, code injection
is possible. Unless I got it wrong again, of course.

scottd72 · June 16, 2006, 10:29pm

On Friday 16 June 2006 16:02, Ezra Z. wrote:

You’re welcome. And I think drb is a very nice way to solve the
problem you are working on. I am not opposed to adding some kind of
optional $SAFE mode if that would make you more comfortable. But I
think that might be a hassle to deal with. A better idea would be to
add some kind of token based auth to backgroundrb. I am open to
adding this if you feel it’s needed. If you want, please join the
mailing list and I will work on it with you to add some more security
to the system.

Well, I’m more worried about this (from the DRb apidocs):

# !!! UNSAFE CODE !!!
ro = DRbObject::new_with_uri("druby://your.server.com:8989")
class << ro
  undef :instance_eval  # force call to be passed to remote object
end
ro.instance_eval("`rm -rf *`")

I don’t think it’s possible to authenticate a drb client before actual
drb requests are handled by the server, so the injection exploit above
could be used on the authentication code itself :).

An elevated $SAFE would be a real PITA when working with the database,
since AFAIS everything coming from it would be tainted by default. Maybe
running the DRb server (serving a proxy to the real API) in a separate
thread with $SAFE >= 1 (2 should be fine), but then I also need a way to
pass the requests on to a proper “unsafe” thread. Maybe:

client -> [DRb “proxy” w/ $SAFE=2] -> [UNIX socket, only root accessible] -> [DRb server w/ $SAFE=0]

But then maybe I’m paranoid. Or the real solution is much simpler. Or
DRb is just the wrong tool for this particular job. I have to give it
some more thought.
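The last hop of that chain, a DRb server that only root can reach, maps onto DRb's UNIX-socket transport, which accepts a `UNIXFileMode` option applied to the socket file after it is created. A sketch assuming a Unix system (on Linux, connecting requires write permission on the socket file); the socket path, `PrivilegedApi` and `uptime` are hypothetical.

```ruby
require 'drb/drb'
require 'drb/unix'
require 'fileutils'
require 'tmpdir'

# Hypothetical privileged API, reachable only through a UNIX socket.
class PrivilegedApi
  def uptime
    "up"  # stand-in for a real root-only operation
  end
end

dir = Dir.mktmpdir
socket_path = File.join(dir, "backend.sock")
uri = "drbunix:#{socket_path}"

# 0600 lets only the socket's owning user (e.g. root) connect.
DRb.start_service(uri, PrivilegedApi.new, UNIXFileMode: 0600)

mode = File.stat(socket_path).mode & 0o777
puts format("socket mode: %o", mode)  # socket mode: 600

client = DRbObject.new_with_uri(uri)
answer = client.uptime
puts answer                           # up

DRb.stop_service
FileUtils.remove_entry(dir)
```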

scottd72 · June 17, 2006, 12:03am

On Friday 16 June 2006 20:58, Ezra Z. wrote:

because I control the server. Do you have others who have access to
your server? Is that why you are worried they will connect with their
own drb clients and do malicious things?

I’m mainly worried about someone getting in as any user and then using
the system to get root privileges. It’s unlikely that it’ll ever happen,
but I’m a perfectionist.

scottd72 · June 16, 2006, 11:00pm

On Jun 16, 2006, at 3:28 PM, Christian R. wrote:

I don’t think it’s possible to authenticate a drb client before
way to
pass the requests on to a proper “unsafe” thread. Maybe:

client -> [DRb “proxy” w/ $SAFE=2] -> [UNIX socket, only root accessible] -> [DRb server w/ $SAFE=0]

But then maybe I’m paranoid. Or the real solution is much simpler. Or
DRb is just the wrong tool for this particular job. I have to give it
some more thought.

Yeah, I think $SAFE mode would be a PITA to work with. But you could
use drbssl and store the client cert, password-protected, in your
Rails database. Then when you start the server you could have it
prompt for a password, authenticate you, and then pull the cert out
of the db and use that to connect. That way no one else could connect
to the drb server at all without the cert. But maybe you don’t need
all of this and a web service is the way to go. I’m happy with my
setup because I control the server. Do you have others who have access
to your server? Is that why you are worried they will connect with
their own drb clients and do malicious things?

-Ezra