Request_time much slower than upstream_response_time

I’ve noticed a periodic performance issue with our nginx 0.6.31 box
running on Fedora Core 8.

We run nginx primarily as the front-end load balancer for a web
service that serves mostly identical requests over and over to
different web clients.

We are logging both $request_time and $upstream_response_time. For
the majority of requests these values are logged identically: for
instance 0.011 s for $request_time and the same 0.011 s for
$upstream_response_time. These times certainly fluctuate from request
to request (ranging from roughly 0.005 s to 0.020 s), but whatever the
value is, it is typically logged the same for both variables.
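
For reference, the logging is done with a log_format along these lines
(the format name "timing" and the access_log path below are placeholders
rather than the exact lines from our config):

      log_format  timing  '$remote_addr [$time_local] "$request" $status '
                          'req=$request_time  upstream=$upstream_response_time';

      access_log  /var/log/nginx/access.log  timing;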

However, we have some requests where the $request_time is MUCH higher.
For instance $request_time will be logged at something like 0.635 s with
an $upstream_response_time of only 0.005 s.

Any ideas on what can be causing this?

A few supporting notes that I hope will be helpful in your assessment:

  • The server is not under load. At most it is handling about
    3 req/sec, and I’ve seen this disparity between $request_time and
    $upstream_response_time even when it was handling roughly one request
    every few seconds.

  • The server is set up to handle both SSL and non-SSL requests. I
    have seen this issue with both, although it is a LOT more prominent
    with the SSL requests: out of 2000 cases where request_time was much
    longer, about 80% were SSL requests. That said, I still see this
    happen on some non-SSL requests.

  • Normally I see a slight disparity with SSL pages. For instance, for
    SSL I’ll typically see a request_time of 0.076 s with an upstream
    time of 0.012 s. I expect this and am NOT counting it as the basis
    for the performance issue I am reporting.

  • I have a rather simple configuration file. It has maybe 10
    location blocks at most. Some have nginx serving static content,
    but the most frequent logic path, and the one involved in this
    performance disparity, simply uses proxy_pass to send the request to
    an upstream server (a rough sketch of this layout follows the list).

  • I track stub_status over time - it shows reading and writing
    typically around 1, and waiting typically around 40-60.

  • My SSL config looks like this (in case this has anything to do with
    it, but I doubt it, since the problem happens on non-SSL requests
    too, even when the server is near idle):

      ssl                  on;
      ssl_certificate      mydomain.com.crt;
      ssl_certificate_key  mydomain.com.key;

      ssl_session_cache    builtin:20480;
      ssl_session_timeout  3m;

      ssl_protocols        SSLv2 SSLv3 TLSv1;
      ssl_ciphers          ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
      ssl_prefer_server_ciphers  on;

  • I have worker_connections 4096; with a single worker process
    (worker_processes 1;) and keepalive_timeout 30;
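
As mentioned above, here is a rough sketch of the proxying layout (the
upstream address, port, hostname, and paths are placeholders, not our
real values; stub_status of course needs the stub_status module):

      upstream backend {
          server 10.0.0.2:8080;            # the upstream web service (placeholder address)
      }

      server {
          listen       80;
          server_name  www.oursite.com;    # placeholder name

          location /nginx_status {
              stub_status  on;             # source of the reading/writing/waiting counters
              access_log   off;
          }

          location /static/ {
              root  /var/www/oursite;      # nginx serving some static content directly
          }

          location / {
              proxy_pass  http://backend;  # the hot path that shows the disparity
          }
      }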

In summary, I definitely think something is askew somewhere, since the
problem tends to happen intermittently and not under load.

I would greatly appreciate some tips on what you think may be causing
this and how I can go about troubleshooting this further. Also please
let me know what additional technical details I can collect to help with
assessing this performance matter. Thank you so very much!!!

Some quick additional information I wanted to add on this issue:

  • As I continue to research this more I am seeing even greater
    disparities than I originally reported. For instance in some cases the
    upstream response was just 0.011 sec while the request_time was a
    whopping 1.4 sec.

  • The more I look at this, the more the problem seems specific to SSL
    connections. I am using a wildcard cert, in case that makes any
    difference. Just a guess, but maybe there is a blocking condition,
    like only one SSL connection can be negotiated at a time across all
    incoming requests? I’m only handling an average of 3 req/sec, so it’s
    hard to imagine the load itself is a factor, but perhaps there is
    some concurrency issue here?

  • To be clear about the units: the times above are in seconds (as
    logged by nginx), not milliseconds.

  • As a side note, we are not really passing sensitive information that
    has to be protected by high-end SSL. However, since we are called from
    within our users’ web sites that use SSL, we must answer over SSL when
    called as https://www.oursite.com, because otherwise they would get a
    mixed-content security warning. So if there is a faster SSL algorithm,
    that would suit us; we just need something quick and dirty that makes
    it a valid SSL connection.

Thanks!

On Sat, Jun 21, 2008 at 06:36:15PM -0700, Rt Ibmer wrote:

  • [...] in some cases the upstream response was just 0.011 sec while
    the request_time was a whopping 1.4 sec.

  • The more I look at this the more the problem seems specific to SSL
    connections. [...]

I suspect two causes of the tardiness:

  1. It may be caused by the SSL handshake: there is a 4-packet exchange,
     and while the 3-packet TCP handshake is not accounted for in
     $request_time, the SSL handshake is.

  2. It may be caused by SSL builtin session cache cleanups. Try to use
     a shared-only cache, even if you use a single worker (see the sketch
     just below); that is, replace

       ssl_session_cache  builtin:20480;

     with

       ssl_session_cache  shared:SSL:10m;
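
For context, a hedged sketch of where this might go (the zone name "SSL"
and the 10m size are just examples; the shared cache can be declared at
the http level so that every server block, and every worker process,
reuses the same session store):

      http {
          ssl_session_cache    shared:SSL:10m;   # one cache shared by all workers
          ssl_session_timeout  3m;
          # ... existing http-level config continues here ...
      }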

On Sun, Jun 22, 2008 at 11:36:44AM +0400, Igor S. wrote:

  Try to use shared only cache (even if you use the single worker):

      ssl_session_cache  shared:SSL:10m;

Thanks Igor. I tried switching to the shared session cache as you
suggested. It may well help avoid blocking other requests, so thanks!
However, it did not speed up the slower SSL connections.

If it is being caused by #1 above, am I correct to assume then that the
quality of the user’s connection to the internet could be playing a
factor in this? For instance if they are on a poor dialup connection?

What is strange is that in some cases the request_time and upstream_time
are the same, say about 0.015 seconds. Then in many cases there is what
I would consider an “understandable” difference due to ssl handshake
overhead like request_time of 0.3 - 0.5 sec with upstream time of 0.012
sec.

However we still have a fair number of cases where the request_time is
between 1-3 seconds (maybe like 1 out of every 50 ssl requests) while
the upstream time is still very low like 0.015 sec.

Basically our nginx server is hit in the following way: a user embeds a
tag in their web page that points to our nginx box. Our nginx front end
then goes to an upstream server (which nginx proxies) to get about 10KB
of static content, which nginx then returns to them.

The content we are serving is not in any way sensitive information.
However if the user’s web page is running under https and they try to
pull that small content from us running http, then the browser will
complain. So, we support https transactions simply so that we can
integrate with an ssl page and keep the browser happy.

Also, another interesting element here is that the browser will cache our
content and we are only referenced about 1 to 3 times within a web page.
So on one web page hit the browser may hit our nginx box 1-3 times for
static content on that page, but this same browser will not come back to
nginx at all again during their session. So the SSL need is very
short-lived for that user’s session.

Based on the fact that our SSL-transmitted data is not actually
sensitive (so we don’t care about the strength/quality of the
encryption) and that it is short-lived for each user (1-3 initial hits
to us from a web page from the same user and then not again for the rest
of their session), can you recommend some SSL configuration options I
can use within nginx to keep things as fast and efficient as possible
(as far as what to use for ssl_timeouts and other settings like this)?

Thank you so much!!

You should use keepalive.

I use this currently:
keepalive_timeout 30;

The reason I have it at 30 is that typically only 1-3 requests will
be made from a web site to hit us for static content. So if user A goes
to www.xyz.com and xyz.com has references to our web service, user A
will hit nginx 1-3 times on the first page visit to xyz.com (we send the
content with a one-day Expires header) and then not again at all unless
on a subsequent day.

So my thought is that all 1-3 requests from user A will come to nginx
within 30 seconds, and will not come back on subsequent page hits, so I
was thinking there is no need to keep that SSL connection open for more
than 30 seconds.

However, perhaps I am misunderstanding this. Does having a higher
keepalive help users B, C, D and so forth also on their first 1-3
requests? Or is the kept-alive connection only good for each individual
user?

At any rate what would you recommend I set this at based on my usage
scenario above?

You may also try to use 56-bit and 128-bit ciphers first:

ssl_ciphers DES-CBC-SHA:RC4-MD5:RC4-SHA:AES128-SHA:DES-CBC3-SHA;

Currently I use this:

ssl_ciphers DES-CBC-SHA:RC4-MD5:RC4-SHA:AES128-SHA:DES-CBC3-SHA;

So should I replace that line with what you put above, or just prepend
those settings to my existing line?

Thank you so much!!

On Mon, Jun 23, 2008 at 07:40:21AM -0700, Rt Ibmer wrote:

If it is being caused by #1 above, am I correct to assume then that the
quality of the user’s connection to the internet could be playing a
factor in this? For instance if they are on a poor dialup connection? [...]

Yes, the user’s connection speed is important here. $request_time is the
time between getting the first byte of the request and sending the last
byte of the response to the kernel. As a request usually fits in one
packet, this means that nginx gets the request immediately. And if the
response is small (like your 10K), nginx passes it to the kernel
immediately. So for small plain HTTP responses $request_time does not
show the time spent sending data to the client.

In the SSL case $request_time includes the 4-packet handshake exchange,
i.e., 4 round trip times (RTT) plus possible TCP retransmits, etc.
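
As a rough illustration (the RTT figure here is an assumption, not a
measurement): for a slow client with a round trip time of about 300 ms,
four round trips for the SSL handshake alone would add roughly
4 × 0.3 s ≈ 1.2 s to $request_time while $upstream_response_time stays
near 0.01 s, and a single TCP retransmit timeout can add another second
or more on top of that. That would be consistent with the 1-3 second
outliers reported above.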

[...] Can you recommend some SSL configuration options I can use within
nginx to keep things as fast and efficient as possible (as far as what
to use for ssl_timeouts and other settings like this)?

You should use keepalive. Small ssl_timeouts should not help in your
case.

You may also try to use 56-bit and 128-bit ciphers first:

ssl_ciphers DES-CBC-SHA:RC4-MD5:RC4-SHA:AES128-SHA:DES-CBC3-SHA;
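
Putting the suggestions in this thread together, a hedged sketch of what
the SSL-related directives might end up looking like (the certificate
file names are the ones shown earlier; everything else simply follows
the advice above rather than anything verified independently):

      ssl                  on;
      ssl_certificate      mydomain.com.crt;
      ssl_certificate_key  mydomain.com.key;

      ssl_session_cache    shared:SSL:10m;   # shared cache instead of builtin
      ssl_session_timeout  3m;

      ssl_protocols        SSLv2 SSLv3 TLSv1;
      ssl_ciphers          DES-CBC-SHA:RC4-MD5:RC4-SHA:AES128-SHA:DES-CBC3-SHA;
      ssl_prefer_server_ciphers  on;

      keepalive_timeout    30;               # covers the 1-3 requests per page view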

On Mon, Jun 23, 2008 at 09:30:03AM -0700, Rt Ibmer wrote:

[...] Does having a higher keepalive help users B, C, and D and so forth
also on their first 1-3 requests? Or is the kept-alive connection only
good for each user?

No, keepalive connection is per user.

At any rate what would you recommend I set this at based on my usage scenario above?

30 seconds is enough.

So should I replace that line with what you put above or just prepend those settings to my existing line?
You already have the same ciphers.

On Mon, Jun 23, 2008 at 05:57:04PM -0700, Rt Ibmer wrote:

Sorry I pasted in the wrong line.

ssl_ciphers DES-CBC-SHA:RC4-MD5:RC4-SHA:AES128-SHA:DES-CBC3-SHA:ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;

You should replace.

And I assume I should leave this prefer_server_ciphers as is:
ssl_prefer_server_ciphers on;

You may leave it.

Lastly, can you think of anything else we have not covered that can help speed the ssl handshake process?

No, I do not see anything. The main problem is RTT of SSL handshake.

You already have the same ciphers.

Sorry I pasted in the wrong line.

What I meant to say is that currently I have this:

ssl_ciphers ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;

So should I just replace that with this:

ssl_ciphers DES-CBC-SHA:RC4-MD5:RC4-SHA:AES128-SHA:DES-CBC3-SHA;

Or should I prepend yours to the front of my existing one, to come up
with this:

ssl_ciphers DES-CBC-SHA:RC4-MD5:RC4-SHA:AES128-SHA:DES-CBC3-SHA:ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;

And I assume I should leave this prefer_server_ciphers as is:
ssl_prefer_server_ciphers on;

Lastly, can you think of anything else we have not covered that can help
speed the ssl handshake process?

Thanks again.