Is threading the right option? Or changing my Apache setup?

I have a Rails App that is hitting a fairly big database (several
million rows of data).

The app runs really well, as I have enough RAM on the servers to keep
enough mongrels running and enough fully primed instances of Postgres
up (with several of the more commonly used tables in RAM), but I hit
one performance brick wall that I am not sure how to get around.

If a user requests one of our larger queries (which can take up to 2-3
minutes to run), that mongrel is blocked while Rails chugs away with
Postgres getting the resulting data set. I have tuned the query (it
now takes 2-3 minutes instead of 8-15) and have the correct indices on
the tables etc. I am sure I can do more here, but the speed returns
are diminishing.

The problem is not so much the response time for the user who is doing
the query, as they know that this query will take time and expect it.
It is done via an AJAX call and they get some progress information.

The problem is that the Apache server takes the next incoming requests
and sends them off to the mongrels in turn; it wraps around all the
mongrels, tries to serve to the mongrel that is doing the long query
again, and so the second user gets blocked waiting for the first query
to finish.

One option would be a multi-threaded Rails app, but I am sure there is
a better one.

I tried setting the Apache balancer to max=1 but this didn’t seem to
solve it.

How is anyone else handling this? Running it in backgroundrb doesn’t
seem to be an option because I am producing an interactive list, not a
static page / pdf or report.

So, how do I get apache to ignore this blocked mongrel and skip on to
the next one?

Regards

Mikel

Mikel L. wrote:

So, how do I get apache to ignore this blocked mongrel and skip on to
the next one?

I had the same problem with nginx and used a patched version discussed
here with other solutions :
http://www.ruby-forum.com/topic/140023

In your case, using haproxy behind Apache might be the simplest.

If you are interested by nginx, the link to the patched nginx doesn’t
work for me now. I can provide the 0.6.24 sources with the fair balancer
module I use in production if needed. The diff between the official
version and mine should be small enough for a quick audit (I did just
that some weeks ago).

Lionel

Mikel L. wrote:

Running it in backgroundrb doesn’t seem to be an option because I am
producing an interactive list, not a static page / pdf or report.

Why couldn’t you pass it off to backgroundrb? It could then stuff the
results into a temporary table or memcache and you could look for a
finished result there. That would free up mongrel to do its thing, as
you’d only be querying “are you done yet?” over and over till it was.

Maybe your data won’t let you do that due to its size requirements
though.

Question though Phillip, how would Memcache help in this situation of
long running SQL queries?

Never mind, I went and read the memcached website :slight_smile:

thanks for the good pointer, it looks like a good idea!

Regards

Mikel

On Tue, Feb 12, 2008 at 9:36 AM, Mikel L. [email protected]
wrote:

Question though Phillip, how would Memcache help in this situation of
long running SQL queries?

Never mind, I went and read the memcached website :slight_smile:

thanks for the good pointer, it looks like a good idea!

In fact, you can use BackgrounDRb to store results in memcache, so the
result is available across all the mongrels in the cluster.
In a nutshell, you pass the query to a BackgrounDRb worker and the
worker stores the result back in memcache with a session identifier.
You poll BackgrounDRb with ask_status, and when the query is finished
ask_status will return the final result.

You don’t even need to use memcache directly; bdrb has a configuration
option where you can specify whether you want to use memcache for
worker result storage.
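Not BackgrounDRb itself, but the pattern hemant describes can be sketched in plain Ruby, with an in-process hash standing in for memcache and a thread standing in for the bdrb worker. All names here (JOB_RESULTS, run_big_query, and this ask_status) are made up for illustration; check the bdrb docs for the real worker API.

```ruby
require 'thread'

JOB_RESULTS = {}                 # memcache stand-in, keyed by session/job id
JOB_LOCK    = Mutex.new

# "Worker": runs the slow query off the request thread, stores the result.
def run_big_query(job_id)
  Thread.new do
    result = (1..3).map { |i| "row #{i}" }  # pretend this takes 2-3 minutes
    JOB_LOCK.synchronize { JOB_RESULTS[job_id] = result }
  end
end

# What the AJAX poll hits: nil until the worker has stored the result.
def ask_status(job_id)
  JOB_LOCK.synchronize { JOB_RESULTS[job_id] }
end

worker = run_big_query("session-42")
worker.join   # the real poll loop would just retry instead of joining
```

The point is that the mongrel only ever answers the cheap ask_status call; the expensive work never blocks a request thread.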


Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://gnufied.org

On Feb 12, 2008 2:31 PM, Philip H. [email protected] wrote:

I have a Rails App that is hitting a fairly big database (several
million rows of data).
How is anyone else handling this? Running it in backgroundrb doesn’t
seem to be an option because I am producing an interactive list, not a
static page / pdf or report.
Why couldn’t you pass it off to backgroundrb? It could then stuff the
results into a temporary table or memcache and you could look for a
finished result there. That would free up mongrel to do it’s thing as
you’d only be querying “are you done yet?” over and over till it was.

I had thought about using the temp table approach; it has some
benefits, #1 being it allows the user to get onto something else while
the list is generating… but I need a solution now, so I think I’ll hit
that in a future version. Good idea though.

I ended up putting the following in my balancer group:

<Proxy balancer://…>
BalancerMember http://127.0.0.1:4000 max=1 acquire=100
(repeat for each mongrel)
</Proxy>

And that seems to have handled it, the Apache server skips over the
blocked Mongrel.
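For reference, a fuller version of that balancer block might look like the following. The balancer name and member ports here are hypothetical, and the `max`/`acquire` semantics are worth double-checking against the mod_proxy docs for your Apache version:

```apache
# max=1 caps each member at one in-flight request; acquire=100 tells the
# balancer to wait at most 100ms for a free connection before trying the
# next member instead of queueing behind a blocked mongrel.
<Proxy balancer://mongrel_cluster>
  BalancerMember http://127.0.0.1:4000 max=1 acquire=100
  BalancerMember http://127.0.0.1:4001 max=1 acquire=100
  BalancerMember http://127.0.0.1:4002 max=1 acquire=100
</Proxy>

ProxyPass / balancer://mongrel_cluster/
ProxyPassReverse / balancer://mongrel_cluster/
```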

I’ll have a look at HA Proxy or Nginx per Lionel’s post on the next
performance iteration.

Question though Phillip, how would Memcache help in this situation of
long running SQL queries?

I can see how, with BackgrounDRb and a temp table, you would have an
AJAX auto-requester on the page polling the mongrel asking “are we
done yet?” and, when the task is finished, popping the result out. I
guess you would get the mongrel pack to poll a database table to see
if job XYZ is finished yet, retrieve the temp table name to read from
once the job is finished, and then send the data back to the client.

That actually sounds like a good solution now that I think of it. But
I don’t know enough about memcache to know how this would fit in.

Regards

Mikel

On Tue, Feb 12, 2008 at 8:46 PM, Piyush R. [email protected]
wrote:

One more way (I do it this way) would be to write a mongrel handler
for that particular request. That solves it, as mongrel itself can
handle multiple requests simultaneously.

I have solved the multiple request (somewhat) by having apache skip
over busy mongrels, but this solution sounds interesting.

Any pointers on where to start on that? That sounds like a good gem :smiley:

Mikel
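A rough sketch of Piyush’s suggestion: a bare Mongrel handler that bypasses the Rails dispatcher (and the lock that serializes Rails requests) for the one slow endpoint. This assumes the mongrel gem; a stub stands in below if it is missing, just so the sketch loads. The /long_query path and run_long_query body are made up.

```ruby
begin
  require 'mongrel'
rescue LoadError
  module Mongrel; class HttpHandler; end; end  # stub for illustration only
end

class LongQueryHandler < Mongrel::HttpHandler
  # Placeholder for the 2-3 minute Postgres query.
  def run_long_query
    (1..5).map { |i| "row #{i}" }.join("\n")
  end

  # Mongrel runs each request in its own thread, so a slow response here
  # does not block the other requests the way the Rails dispatcher does.
  def process(request, response)
    response.start(200) do |head, out|
      head["Content-Type"] = "text/plain"
      out.write(run_long_query)
    end
  end
end

# Registration would look something like:
#   server = Mongrel::HttpServer.new("127.0.0.1", 4000)
#   server.register("/long_query", LongQueryHandler.new)
#   server.run.join
```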
