Tomcat cluster behind nginx: avoiding delays while restarting tomcat

We run a tomcat cluster behind nginx with the upstream module. Nginx
fits our requirements well for load balancing and failover, except in
one case.

When starting or restarting tomcat, our web application takes a couple
of minutes to initialize, during which time the tomcat connector is
listening on TCP 8080, but the application isn’t ready to process
requests.

The nginx documentation says that a host undergoing planned downtime
should be marked ‘down’ in the upstream block, and this is a partial
solution to our problem.
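For reference, marking a backend ‘down’ is a one-word change in the
upstream block (the addresses below are made up for illustration):

```nginx
# Hypothetical upstream block; 10.0.0.1/10.0.0.2 are placeholder addresses.
upstream tomcat {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080 down;   # taken out of rotation during deploy
}
```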

Since we’re still small, our developers do the application deployment
themselves. The deploy process is quite informal and is performed
manually right now.

Because our developers are primarily Windows users who spend most of
their time in Eclipse, and because they don’t have a full understanding
of the systems, they tend to make mistakes when editing config files in
UNIX and when restarting/reloading servers. Because of this, I would
like to find the best solution for automating the deploy process,
beginning with this small part.

If the tomcat connector could be told not to start listening on its TCP
port until the app is finished initializing, then I would be tempted to
let the upstream module’s failover mechanism take care of everything
(comments on the wisdom or stupidity of succumbing to this temptation
are welcome). However, I haven’t seen any way to accomplish this.

I also don’t see any mechanism in the upstream module to help with
this; it doesn’t seem to consider a tomcat that is accepting TCP
connections but not answering requests to be failed.

This leads me to think that the best way to automate web app deployment
is to either:

  • Write a script to edit nginx.conf, mark the tomcat node as ‘down’, and
    reload nginx;

  • Or, write a script to run on the tomcat server using iptables to
    REJECT connections to TCP 8080 until the app is finished initializing.

Either of these could be built into an automated deployment process that
would save manual labor and the associated human error.
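To make the iptables idea concrete, here is a rough sketch of what such
a script might look like. Everything specific here is an assumption on
my part: the /healthcheck URL, the init script path, and the timeouts
would all need to match your setup, and it has to run as root.

```shell
#!/bin/sh
# Deploy sketch: turn away connections on 8080, restart tomcat, and only
# accept traffic again once the app answers a health URL. All paths and
# URLs below are placeholders.

wait_for_app() {
    # Poll $1 until it returns HTTP 200, for at most $2 seconds.
    url=$1; timeout=${2:-300}; waited=0
    while [ "$waited" -lt "$timeout" ]; do
        code=$(curl -s -o /dev/null -w '%{http_code}' "$url" 2>/dev/null)
        [ "$code" = "200" ] && return 0
        sleep 5
        waited=$((waited + 5))
    done
    return 1
}

deploy() {
    # REJECT (not DROP) so nginx sees an immediate connection refusal
    # and fails over, instead of waiting out a connect timeout.
    iptables -I INPUT -p tcp --dport 8080 -j REJECT
    /etc/init.d/tomcat restart
    if wait_for_app http://127.0.0.1:8080/healthcheck 300; then
        iptables -D INPUT -p tcp --dport 8080 -j REJECT
    else
        echo "app did not come up; leaving 8080 blocked" >&2
        return 1
    fi
}
```

The choice of REJECT over DROP is the important detail: with REJECT the
upstream module marks the node failed right away and moves on to the
next server.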

I would appreciate hearing how others have solved this problem, whether
the above ideas are reasonable, and whether there is a standard solution
I haven’t heard of. If it seems useful, I’ll be happy to post details
about our solution once it is implemented and tested.

John

On Wed, May 27, 2009 at 12:59 AM, John M. wrote:

> When starting or restarting tomcat, our web application takes a couple of
> minutes to initialize, during which time the tomcat connector is listening
> on TCP 8080, but the application isn’t ready to process requests.

This is one example of “healthchecks” that would make nginx more
robust for doing load balancing. I’ve got it on my wishlist :)

I think most people will say: make a healthcheck script that updates an
nginx include config file with the list of upstreams and then HUP
nginx…

>   • Write a script to edit nginx.conf, mark the tomcat node as ‘down’, and
>     reload nginx;

I’d say make it an include. That way less stuff to go through :P (see
above)
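A sketch of the include layout being suggested (file names and
addresses are made up): the upstream list lives in its own file, which
the script rewrites before reloading nginx, so nginx.conf itself is
never edited.

```nginx
# /etc/nginx/nginx.conf (fragment)
http {
    include /etc/nginx/tomcat_upstream.conf;
    # ... rest of the http config ...
}

# /etc/nginx/tomcat_upstream.conf -- regenerated by the deploy or
# healthcheck script, then applied with a HUP to the nginx master process
upstream tomcat {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080 down;
}
```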

>   • Or, write a script to run on the tomcat server using iptables to REJECT
>     connections to TCP 8080 until the app is finished initializing.

This is an interesting approach, never thought about doing it on the
upstream itself. Kind of cool, actually. “Am I ready for requests?”

On May 27, John M. wrote:


> I would appreciate hearing how others have solved this problem, whether the
> above ideas are reasonable, and whether there is a standard solution I
> haven’t heard of. If it seems useful, I’ll be happy to post details about
> our solution once it is implemented and tested.

When your application is not ready, what HTTP status code does it return?
If it returns, say, a 5xx code, you can instruct nginx to treat that as a
failure.
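Building on that suggestion: max_fails/fail_timeout on the upstream
servers together with proxy_next_upstream control what nginx counts as
a failed attempt. A sketch (the values and addresses are assumptions to
be tuned for your setup):

```nginx
upstream tomcat {
    # after max_fails failed attempts within fail_timeout, the server
    # is considered unavailable for fail_timeout seconds
    server 10.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
}

server {
    location / {
        proxy_pass http://tomcat;
        # treat connection errors, timeouts, and 5xx answers as a
        # failed attempt and retry the next server in the upstream
        proxy_next_upstream error timeout http_500 http_502 http_503;
    }
}
```

With this in place, a tomcat that accepts connections but answers 5xx
while initializing would be retried on the other node instead of
surfacing the error to the client.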