| You said that I could omit unnecessary TCP handshakes and that’s
| exactly what I had in mind. Do you have any idea if this would be a
| big performance gain? It’s going to happen in a 1Gb LAN.
|
From my experience (FWIW), you shouldn't worry about the TCP handshakes,
especially when all this chatter takes place on a LAN. You should be
more concerned with the average turnaround time you get per HTTP
request when using keep-alive versus non-keep-alive connections. (see below)
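To get a feel for that turnaround difference yourself, here is a rough,
self-contained sketch (Python stdlib only) that times a reused
(keep-alive) connection against a fresh connection per request. The
local server is just a stand-in; numbers on your actual 1Gb LAN will
differ, so treat it as a measurement harness, not a benchmark result.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 + Content-Length => keep-alive
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # silence per-request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
N = 50

# keep-alive: one TCP connection reused for all N requests
t0 = time.perf_counter()
conn = http.client.HTTPConnection("127.0.0.1", port)
for _ in range(N):
    conn.request("GET", "/")
    conn.getresponse().read()
conn.close()
keepalive = time.perf_counter() - t0

# non-keep-alive: a fresh connection (and handshake) per request
t0 = time.perf_counter()
for _ in range(N):
    c = http.client.HTTPConnection("127.0.0.1", port)
    c.request("GET", "/")
    c.getresponse().read()
    c.close()
nonkeepalive = time.perf_counter() - t0

server.shutdown()
server.server_close()
print(f"keep-alive:    {keepalive:.4f}s for {N} requests")
print(f"non-keep-alive: {nonkeepalive:.4f}s for {N} requests")
```

Point it at one of your real backends (host, port) and you'll see what
the handshake actually costs you per request on your network.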
| Another issue is that my backend servers (i.e. the ones that my app
| server connects to) will be Nginx servers with mod_wsgi. Therefore I’m
| losing lots of asynchronousness, so I’d like to avoid a situation
| where all my app servers connect to the same worker in the backend
| server, because that would create a bottleneck.
|
Ok, so if mod_wsgi is what I think it is (allowing Python code to run in
an nginx worker process), then the WSGI code is under no obligation to
relinquish the processor once it starts executing. By contrast,
traditional nginx module development forces a module developer to avoid
blocking network calls and to set up event handlers for socket
read()/write()s.
On the other hand, if the nginx worker process runs the Python code in
its own process context (think embedded interpreter), then until the
Python code finishes executing, the nginx worker process cannot
interrupt it to go and serve some other pending HTTP request.
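Just to make the embedded-interpreter concern concrete, here's a toy
WSGI app (do_heavy_work is a hypothetical stand-in for your application
logic). Calling it twice in one process shows the point: the second
"request" cannot start until the first returns, so the total time is
the sum of the two.

```python
import time

def do_heavy_work():
    time.sleep(0.1)  # stands in for slow computation or blocking I/O
    return b"done"

def app(environ, start_response):
    body = do_heavy_work()  # an embedded interpreter blocks right here
    start_response("200 OK", [("Content-Length", str(len(body)))])
    return [body]

# Two sequential calls in the same process context: no overlap possible.
statuses = []
t0 = time.perf_counter()
for _ in range(2):
    b"".join(app({}, lambda status, headers: statuses.append(status)))
elapsed = time.perf_counter() - t0
print(f"2 requests took {elapsed:.2f}s (>= 0.2s: fully serialized)")
```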
Therefore, in this case, you would really be better off using
non-keep-alive connections: if the Python code runs synchronously, you
are practically guaranteed that some other nginx worker process will
pick up your next HTTP request.
However, you should also be worried about the serializability of your
HTTP requests from the app server's standpoint.
For instance, if you have requests { R1, R2, R3, … Rn } that must be
executed, where the input of one request depends on the output of
another, then you will have to wait for R1 to finish anyway before you
can issue R2. In this case, you are better off using keep-alive for that
subset of mutually dependent requests.
For requests that can be issued even when the previous request has not
finished, you can use non-keep-alive connections.
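The split above can be sketched like this. `fetch` is a hypothetical
stand-in for an HTTP call: the dependent chain runs in order (ideally
over one keep-alive connection), while independent requests fan out in
parallel over separate connections, so each can land on a different
worker.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(request, previous=None):
    # hypothetical: issue `request`, possibly feeding it the
    # output of the `previous` request
    return f"result-of-{request}" + (f"-after-{previous}" if previous else "")

# Dependent chain R1 -> R2 -> R3: each needs the prior result,
# so serialize them (one keep-alive connection fits naturally).
out = None
for r in ["R1", "R2", "R3"]:
    out = fetch(r, previous=out)

# Independent requests: no ordering constraint, so issue them
# concurrently over separate (non-keep-alive) connections.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(fetch, ["R4", "R5", "R6"]))
```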
Hope I didn’t confuse you. Of course, I may be completely off on a
tangent here: if mod_wsgi uses interprocess communication with the
nginx worker process (with a separate process executing the application
code), then you can win by using keep-alives.
M
| Thanks for all the input,
|
| Mike
|