Sharing a mongrel cluster in a multi-url environment


#1

I have a 2.2.2 ROR website package that I sell to my clients. After
deploying a few instances, I realized that the only difference
between the accounts was the client’s URL and the database. It seemed
wasteful to have many copies of the same software on my web server,
not to mention a separate mongrel cluster for each client, most of
which were running at very low load levels.

So I had the idea of combining all these separate websites into one
‘shared’ system. Having no idea how to do that, I had to wing it.
Here’s what I’m using:

  1. Apache as the front end server, running mod_proxy
  2. A single mongrel cluster (4 mongrels)
  3. A single instance of my Rails application

I had each Apache virtual host configuration insert a unique
client_id header in all requests, like so:

# a unique id for each domain
RequestHeader set X_CLIENT_ID '7563TY7732UUW9'

The Rails app then reads this header and knows which client database
to use.

Thus, a request comes in for the url example1.com, Apache adds the
unique X_CLIENT_ID header for that URL, the request gets forwarded
through Mongrel, Rails reads the client_id, connects to the correct
database, and then renders the page as it would if this were not a
multi-domain system.
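
A minimal sketch of the lookup step in that flow, in plain Ruby. It assumes the header reaches the application in the CGI/Rack-style request environment under the key HTTP_X_CLIENT_ID. The client ids are the ones that appear in the log at the bottom of this post; the CLIENT_DATABASES table, the database names, and the database_for helper are hypothetical.

```ruby
# Hypothetical lookup table: client id (as set by Apache) => database name.
CLIENT_DATABASES = {
  '90873721' => 'example1_production',
  '48218343' => 'example2_production'
}.freeze

# Raised when a request carries no client id, or one we do not know.
class UnknownClient < StandardError; end

# env is the CGI/Rack-style request environment. A request header named
# X_CLIENT_ID is assumed to show up under the key HTTP_X_CLIENT_ID.
def database_for(env)
  client_id = env['HTTP_X_CLIENT_ID'].to_s.strip
  CLIENT_DATABASES.fetch(client_id) do
    raise UnknownClient, "no database configured for client #{client_id.inspect}"
  end
end
```

Failing loudly on an unknown id, rather than falling back to a default database, seems the safer choice here, since a silent fallback could show one client another client's data.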

This all works very well except for one serious problem: mongrel
seems to mess up cookie handling in a multi-domain situation.

The problem is illustrated in the log snippet at the bottom of this
email:

The 1st request is from User 1, already logged in and working on
example1.com.

The 2nd request is from User 2, for the example2.com home page.

The 3rd request is from User 1, for another page in the application.
The HTTP request included the original cookie, but note that the
cookie value, the session id that Rails sees, has changed!

The 4th request shows User 1 being redirected to the login page,
because to Rails it looks like User 1 is a new, un-authenticated
user, who does not have the rights to access that page.

If no one ever accesses example2.com, then the users of example1.com
have no problem. That is, their cookies are not changed. However, as
soon as anyone hits example2.com, anyone subsequently hitting
example1.com will have their session_id cookie changed. The same is
true in the other direction… hits on example1 also change the
cookies of users on example2.

I can solve the problem by not sharing mongrels. That is, by keeping
everything else the same, but dedicating one or more mongrel
instances to each URL. As soon as mongrel stops handling multiple
URLs, the cookie problem vanishes.

However, this gets me back to having one or more mongrels for each
domain, which seems wasteful. A single mongrel cluster of 4 or 6 can
easily handle the load, if it weren’t for this cookie problem.

Can anyone explain this problem to me? I don’t actually understand
why mongrel doesn’t just pass the cookie in the request straight
through to Rails. Why does it change it?

And can anyone suggest a work-around?

Or perhaps I’ve got the wrong end of the stick, and it’s not
Mongrel’s fault? Or maybe I should take an entirely different
approach to this multi-domain problem? As I said, this is straight
out of my head. I have not been able to google any other approach to
what I’m trying to do here.

Any help, much appreciated. Again, this is a fully upgraded Rails
2.2.2 system.

– John

 **1st Request from USER 1 for example1.com**
Processing Admin::CmsController#index (for 75.127.142.66 at 2009-01-27 13:15:27) [GET]
Session ID: 00b9cfb6fd397e5c9934ea58eaef648d
>>> Request for client 90873721, EXAMPLE1.COM
Rendering template within layouts/admin/standard
Rendering admin/cms/list
Completed in 114ms (View: 14, DB: 81) | 200 OK [https://example1.com/admin/cms]

 **2nd Request from User 2 for example2.com**
Processing CmsController#cms_show (for 64.1.215.163 at 2009-01-27 13:16:15) [GET]
Session ID: 4fed1c59001f7484a63fb6280376825a
Parameters: {“alias”=>“home.html”}
>>> Request for client 48218343, EXAMPLE2.COM
### alias: home.html
Rendering template within layouts/two-column
Rendering cms/cms_show
Completed in 23ms (View: 13, DB: 3) | 200 OK [http://example2.com/]

 **3rd Request from User 1 for example1.com -- note session ID changes!!!**
Processing Admin::CmsController#index (for 75.127.142.66 at 2009-01-27 13:16:18) [GET]
Session ID: 85c178aa70ed2bef6a767e844bf6c6d6
>>> Request for client 90873721, EXAMPLE1.COM
####### ‘admin/cms’, ‘index’
Redirected to actionsignincontroller/admin/user
Filter chain halted as [:check_authentication] rendered_or_redirected.
Completed in 4ms | 302 Found [https://example1.com/admin/cms]

 **4th request -- redirected from 3rd request**
Processing Admin::UserController#signin (for 75.127.142.66 at 2009-01-27 13:16:18) [GET]
Session ID: 85c178aa70ed2bef6a767e844bf6c6d6
>>> Request for client 90873721, EXAMPLE1.COM
Rendering template within layouts/admin/standard
Rendering admin/user/signin
Completed in 10ms (View: 6, DB: 0) | 200 OK [https://example1.com/admin/user/signin]


#2

I don’t think this is mongrel’s fault, but it’s interesting nonetheless.

Could you have Apache log the incoming cookie value to shed some more
light on what happens? I believe that the cookie you see is not the
cookie that the client sends, but a new cookie that Rails generates
when it cannot find the cookie from the request in its current db.

My theory (aka wild guess) is that Rails picks up the cookie before you
change/set the database connection. Like this:

User1 requests the login page.
Rails connects to db1 and renders the page.
User1 logs in and is given a cookie, and session data is stored in db1.
User1 does something and all is good.
User2 requests the login page.
Rails connects to db2 and renders.
User1 requests a new page.
Rails is connected to db2 and fails to find the cookie.
Rails connects to db1 and redirects user.
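
The sequence above can be modelled in a few lines of plain Ruby. This is only a simulation of the guess, not Rails code: FakeRailsProcess, its handle method, and the hashes standing in for the sessions tables are all hypothetical. The one behaviour it encodes is that the session lookup runs against whichever database the process was left connected to by the previous request, and only afterwards does the process switch to the database of the current client.

```ruby
require 'securerandom'

# Models a single Rails process holding one process-wide db connection.
class FakeRailsProcess
  def initialize
    @databases = { db1: {}, db2: {} } # each hash plays a sessions table
    @current_db = :db1
  end

  # Handles one request and returns the session id the app ends up using.
  def handle(cookie_session_id, client_db)
    sessions = @databases[@current_db] # lookup uses the PREVIOUS request's db
    session_id = cookie_session_id
    # A missing or unknown cookie value results in a brand-new session.
    session_id = SecureRandom.hex(16) unless session_id && sessions.key?(session_id)
    @current_db = client_db            # only now switch to this client's db
    @databases[@current_db][session_id] ||= {}
    session_id
  end
end
```

Feeding it the same order of requests (user 1 on db1 twice, then user 2 on db2, then user 1 again) keeps user 1's session id stable until the request from user 2 has gone through, after which user 1 is handed a fresh id, which is the symptom in the log.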

Now, this isn’t exactly what’s happening in your logs, but add the fact
that the different requests are being handled by different mongrels,
each with their Rails stacks and db connections, and you have a fine
mess.

Could you try running just one mongrel for a short while to minimise the
number of moving parts?

HTH,

/David


#3
>   1. Apache as the front end server, running mod_proxy
>   2. A single mongrel cluster (4 mongrels)
>   3. A single instance of my Rails application

My question is, which one of the four mongrels is handling each of the
requests? Can it be that your application isn’t thread-safe, and the same
mongrel instance is getting sent a new request before the previous
request completes?

> I had each Apache virtual host configuration insert a unique
> client_id header in all requests, like so:
>
> # a unique id for each domain
> RequestHeader set X_CLIENT_ID '7563TY7732UUW9'

Also, just curious: was there a specific reason you added another layer
to the request routing with the use of X_CLIENT_ID, rather than just use
the value of “request.host” in Rails? It seems there’s a one-to-one
mapping between virtual hostnames and X_CLIENT_ID.
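
A rough sketch of what that could look like, in plain Ruby. The host names and client ids are taken from the log in the first post; the CLIENT_IDS_BY_HOST table and both helpers are hypothetical, and treating "www." as equivalent to the bare domain is an assumption, not something stated in the thread.

```ruby
# Hypothetical table: normalized host name => client id.
CLIENT_IDS_BY_HOST = {
  'example1.com' => '90873721',
  'example2.com' => '48218343'
}.freeze

# Lower-cases the host and drops an optional ":port" suffix and a
# leading "www." so that variants of one domain map to one client.
def normalize_host(host)
  host.to_s.downcase.sub(/:\d+\z/, '').sub(/\Awww\./, '')
end

# Returns the client id for a host, or nil when the host is unknown.
def client_id_for_host(host)
  CLIENT_IDS_BY_HOST[normalize_host(host)]
end
```

In a controller the argument would be the value of request.host; the Apache RequestHeader line then becomes unnecessary.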

-Ripta


#4

On Jan 28, 2009, at 5:45 PM, David V. wrote:

> My theory (aka wild guess) is that Rails picks up the cookie before
> you change/set the database connection. Like this:
> [...]
> Rails connects to db1 and redirects user.

> Now, this isn’t exactly what’s happening in your logs, but add the
> fact that the different requests are being handled by different
> mongrels, each with their Rails stacks and db connections, and you
> have a fine mess.

H’mmm. This is an interesting idea. I will do some testing to see if
this is the problem.

> Could you try running just one mongrel for a short while to
> minimise the number of moving parts?

Yes, I tried that and it made no difference. In fact, the log trace I
sent was with one mongrel in the mongrel cluster.

I will get back with results of the db testing later today.

Thanks for the idea.

– John


#5

On Jan 29, 2009, at 9:32 AM, John A. wrote:

Okay, here is what I did.

  1. I retested the configuration that used a separate mongrel instance
    (fired up from the command line) for each virtual host. No problem.
    [2 virtual hosts, 2 mongrels, 1 rails app, 2 databases]

  2. I connected the two Apache virtual hosts back to the
    mongrel_cluster (of 1 mongrel). The problem re-appeared. [2 virtual
    hosts, 1 mongrel cluster, 1 rails app, 2 databases]

  3. I changed the X_CLIENT_ID header in example2.com so it matched the
    client_id in example1.com. This means that both domains connected to
    the same database, without changing anything else. The problem
    disappeared. [2 virtual hosts, 1 mongrel cluster, 1 rails app, 1
    database]

This test seems to confirm that the problem is in the database
switching.

But scenario #1 also had the database switching and worked fine. This
makes me think that a mongrel instance has some sort of persistent
connection to the rails application, and the database that is ‘active’.

Is this true? Is the problem seen in scenario #2 caused by this
persistent connection being forced over to a different URL and a
different database?

– John


#6

On Jan 28, 2009, at 7:23 PM, Ripta Pasay wrote:

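> Can it be that your application isn’t thread-safe, and the same
> mongrel instance is getting sent a new request before the previous
> request completes?
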
No, sorry, I should have mentioned that I had already cut down the
number of mongrels in the cluster to 1. I would have, but in my mind
I had already ruled that out, so didn’t think to mention it.

> Also, just curious: was there a specific reason you added another
> layer to the request routing with the use of X_CLIENT_ID, rather than
> just use the value of “request.host” in Rails? It seems there’s a
> one-to-one mapping between virtual hostnames and X_CLIENT_ID.

Yes, I just didn’t realize that at the outset. This is the first time
I’m doing something like this, so I’m sort of making it up as I go
along.

– John


#7

I’m not sure that instance-specific connections are supported by
ActiveRecord (you can, however, use class-specific connections). From
the ActiveRecord documentation,
http://api.rubyonrails.org/classes/ActiveRecord/Base.html, the database
connection information is tied to your model at the class level. I don’t
think you can accomplish what you have described without a separate
Mongrel cluster for each virtual host.

Here’s an alternate idea:

Assuming you’re deploying to a *nix server, what you might try instead
is having your rails app live in one location on the file system. Then,
since the only thing that is different is the configuration, for each
domain, symlink in all but the config directory. The config directory
will then be unique for each app, but all the code will be shared. When
you update the application code, simply restart all the mongrel
clusters.
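
A small Ruby sketch of that symlink layout. The helper name link_shared_app and its two path arguments are hypothetical; it links every top-level entry of the shared checkout into a per-domain directory and leaves config as a real directory for that domain's own settings.

```ruby
require 'fileutils'

# Links every top-level entry of shared_root into instance_root, except
# "config", which is created as a real, instance-specific directory.
def link_shared_app(shared_root, instance_root)
  FileUtils.mkdir_p(File.join(instance_root, 'config'))
  Dir.children(shared_root).sort.each do |entry|
    next if entry == 'config'
    target = File.join(instance_root, entry)
    # Skip entries that were linked on an earlier run.
    next if File.exist?(target) || File.symlink?(target)
    FileUtils.ln_s(File.expand_path(entry, shared_root), target)
  end
end
```

Directories that hold per-process state, such as log and tmp, would probably need to be excluded in the same way as config so the clusters do not write into each other's files; that is an assumption on top of the suggestion above.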

Another idea: you may want to reconsider your application architecture.
Do you really need a separate database for each host? Could you instead
introduce a ClientAccount class (that has the value from X_CLIENT_ID as
the Primary Key), and tie it into the other models with has_many
relationships and a before_filter on the ApplicationController? This
way, you don’t need separate configs, you could point each VirtualHost
at the same Mongrel cluster, AND you could manage the Client Account
within rails.
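
To make the single-database idea concrete, here is a plain-Ruby stand-in for the scoping it implies. It does not use ActiveRecord: CmsPage, the pages method, and the in-memory array are hypothetical placeholders for a table with a client_id column and a has_many association on ClientAccount.

```ruby
# One row of a shared table; client_id says which client owns the row.
CmsPage = Struct.new(:client_id, :title)

class ClientAccount
  attr_reader :id

  def initialize(id, all_pages)
    @id = id
    @all_pages = all_pages # stand-in for the shared cms_pages table
  end

  # Stand-in for a has_many association: only this client's rows.
  def pages
    @all_pages.select { |page| page.client_id == id }
  end
end
```

A before_filter would look up the ClientAccount for the current request once, and every query in the action would then go through it (account.pages rather than the whole table), so one client never sees another client's rows.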

==
Will G.


#8

On Jan 29, 2009, at 10:17 PM, Will G. wrote:

> I don’t think you can accomplish what you have described without a
> separate Mongrel cluster for each virtual host.

I’m hoping I can. What I’m going to try next is to use a common
database (the one defined in environment/production.rb) for the
sessions table. Then when I switch the database to the client’s db,
the session table won’t be affected.

That’s the theory, anyway.

> The config directory will then be unique for each app, but all the
> code will be shared. When you update the application code, simply
> restart all the mongrel clusters.

Well, if I can get what I need working, I’ll only have one shared
mongrel cluster. That is the goal – to eliminate the need to have
dedicated mongrels for every client. I really don’t want to have
hundreds of mongrels cluttering up the system.

BTW, I don’t think it is a problem to have multiple mongrel clusters
on one directory. I haven’t tried this, but don’t see why it wouldn’t
work. Just use a different pid for each cluster.

> This way, you don’t need separate configs, you could point each
> VirtualHost at the same Mongrel cluster, AND you could manage the
> Client Account within rails.

I’ve thought about that, but each client potentially has thousands of
records in his/her database. I believe the system will get too slow
if all records are in one db, and all clients will be impacted by the
one client who has a million records.

Plus each and every table would have to have client_id in it. That
just smells bad to me. Much simpler to just give each client his own
db and switch on the request. That’s what I’m hoping for, anyway.

– John


#9

I’m suspicious of your idea that having each client have their own db is
going to be more efficient. Most rdbms are set up to support very large
tables, if you have the proper indexes created, but not necessarily set
up to support thousands of separate databases within an rdbms instance.

At least, AR is written assuming you are NOT going to do what you’re
trying to do, so you’re definitely going to be fighting AR to do this.

But this is really an AR question rather than a mongrel question. You
could try asking on the AR lists.

Jonathan

http://rubyforge.org/mailman/listinfo/mongrel-users


Jonathan R.
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu


#10

For anyone who was interested in this thread, I discovered that the
session loss problem only occurred when:

  1. two sites were using the same mongrel (as would happen in a
    mongrel_cluster),
  2. the app used ActiveRecord storage for sessions, and
  3. the database containing the sessions table was switched

The key was to use a common database that contained the sessions
table (and client table). This common database is used when rails
first fires up. The application can then switch the other tables to
the client database, as selected by the domain name.

As long as you don’t switch the session table database, it works
fine, with no session loss. Having a single client table (that
contains the client domain name, and database parameters) is also
convenient.
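
The arrangement can be illustrated with a plain-Ruby model of class-level connections. This is not ActiveRecord and not the code from my app: FakeModel, ClientModel, CmsPage, the database names, and switch_client_database are hypothetical. It only shows the principle that the sessions class stays on the common database while one branch of model classes is repointed per request.

```ruby
# Stand-in for class-level connections: each class remembers which
# database it talks to, and subclasses inherit the setting unless they
# have one of their own.
class FakeModel
  class << self
    attr_writer :database

    def database
      return @database if @database # a class's own setting wins
      superclass.respond_to?(:database) ? superclass.database : nil
    end
  end
end

class Session < FakeModel; end     # sessions table: common database
class ClientModel < FakeModel; end # base class for per-client tables
class CmsPage < ClientModel; end   # an example per-client table

FakeModel.database = :common       # what the app uses when it starts up

# Per-request switch: only the ClientModel branch is repointed.
def switch_client_database(name)
  ClientModel.database = name
end
```

In ActiveRecord terms this roughly corresponds to calling establish_connection on a base class for the client tables rather than on ActiveRecord::Base, so the session store never sees the switch.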

BTW, there is a good discussion of this topic here:
http://wiki.rubyonrails.org/rails/pages/HowtoUseMultipleDatabases

That article pretty much gave me the solution. Thanks Dave T.!

– John

Websites and Marketing for On-line Collectible Dealers

IDENTRY, LLC
John A. - Managing Partner
(631) 546-5079
www.identry.com