Nginx fastcgi_cache performance - disk cache vs. tmpfs cache vs. serving a static file

Hello guys,

First of all, thanks for nginx. It is very good and easy to set up, and it is kind of a joy to learn about it.

Two warnings: this performance thing is addictive: every bit you squeeze, you want more. And English is my second language, so pardon me for any mistakes.

Anyway, I am comparing nginx performance for WordPress websites in different scenarios and something seems weird, so I am here to share it with you guys and maybe adjust my expectations.

Software:

NGINX 1.4.2-1~dotdeb.1
PHP5-CGI 5.4.20-1~dotdeb.1
PHP-FPM 5.4.20-1~dotdeb.1
MySQL Server 5.5.31+dfsg-0+wheezy1
MySQL Tuner 1.2.0-1
APC opcode cache 3.1.13-1

This is an EC2 small instance. All tests were done with siege, 40 concurrent requests for 2 minutes, from localhost to localhost.

Scenario one - a URL cached via fastcgi_cache to tmpfs (memory)

siege -c 40 -b -t120s http://www.joaodedeus.com.br/quero-visitar/abadiania-go

Transactions: 1403 hits
Availability: 100.00 %
Elapsed time: 119.46 secs
Data transferred: 14.80 MB
Response time: 3.36 secs
Transaction rate: 11.74 trans/sec
Throughput: 0.12 MB/sec
Concurrency: 39.42
Successful transactions: 1403
Failed transactions: 0
Longest transaction: 4.43
Shortest transaction: 1.38
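For reference, the only difference between scenarios one and two is where fastcgi_cache_path points. A minimal sketch of the relevant directive (the zone name, sizes and mount points below are illustrative values, not my exact production settings):

    # nginx.conf, http context - sketch only
    # Scenario one: cache directory on a tmpfs (RAM) mount
    fastcgi_cache_path /run/nginx-cache levels=1:2 keys_zone=WORDPRESS:64m
                       inactive=60m max_size=256m;
    # Scenario two points the same directive at the ephemeral disk instead,
    # e.g. /mnt/nginx-cache, with everything else unchanged.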

Scenario two - the same URL cached via fastcgi_cache to disk (EC2 on-instance ephemeral storage)

Transactions: 1407 hits
Availability: 100.00 %
Elapsed time: 119.13 secs
Data transferred: 14.84 MB
Response time: 3.33 secs
Transaction rate: 11.81 trans/sec
Throughput: 0.12 MB/sec
Concurrency: 39.34
Successful transactions: 1407
Failed transactions: 0
Longest transaction: 4.40
Shortest transaction: 0.88

Here is where the first question pops up: I don't see a huge difference between RAM and disk. Is that normal? I mean, there seems to be no big benefit in using a RAM cache.

Scenario three - the same page, saved as .html and served by nginx

Transactions: 1799 hits
Availability: 100.00 %
Elapsed time: 120.00 secs
Data transferred: 25.33 MB
Response time: 2.65 secs
Transaction rate: 14.99 trans/sec
Throughput: 0.21 MB/sec
Concurrency: 39.66
Successful transactions: 1799
Failed transactions: 0
Longest transaction: 5.21
Shortest transaction: 1.30
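For scenario three the page was simply served as a plain file, roughly like this (a sketch; the root and location are illustrative):

    # sketch only - no fastcgi_pass and no cache lookup; nginx serves the file directly
    location = /test.html {
        root /var/www/example;
    }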

Here is the main question: this is a huge difference. As far as I know, serving from cache is supposed to be as fast as serving a static .html file, right? I mean, nginx sees that there is a cache rule for the location, sees that there is a cached version, and serves it. Why so much difference?

The cache is working fine:
35449 -
10835 HIT
1156 MISS
1074 BYPASS
100 EXPIRED
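(For anyone wanting to reproduce counts like these: nginx exposes the per-request cache status in $upstream_cache_status, which can be surfaced as a response header or logged; a sketch, with header and log names of my own choosing:)

    # sketch - expose the cache status per request
    add_header X-Cache-Status $upstream_cache_status;

    # or log it and count afterwards:
    log_format cache_status '$upstream_cache_status $request';
    access_log /var/log/nginx/cache_status.log cache_status;
    # counting:  awk '{print $1}' cache_status.log | sort | uniq -c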

Any help is welcome.
Best regards.


Hello!

On Thu, Oct 03, 2013 at 12:34:20PM -0400, ddutra wrote:

[…]

Successful transactions: 1799
Failed transactions: 0
Longest transaction: 5.21
Shortest transaction: 1.30

Here is the main question: this is a huge difference. As far as I know, serving from cache is supposed to be as fast as serving a static .html file, right? I mean, nginx sees that there is a cache rule for the location, sees that there is a cached version, and serves it. Why so much difference?

The 15 requests per second for a static file looks utterly slow, and first of all you may want to find out what the limiting factor is in this case. This will likely help to answer the question "why the difference".

From what was previously reported here, communication with EC2 via the external IP address may be very slow, and using 127.0.0.1 instead used to help.


Maxim D.
http://nginx.org/en/donation.html

Hello!

On Thu, Oct 03, 2013 at 03:00:51PM -0400, ddutra wrote:


Alright, so you are saying my static HTML serving stats are bad, which means the gap between serving static HTML from disk and serving the cached version (fastcgi_cache) from tmpfs is even bigger?

Yes. The numbers are very low. In a virtual machine on my notebook, the numbers from siege with a 151-byte static file look like this:

$ siege -c 40 -b -t120s http://127.0.0.1:8080/index.html

Lifting the server siege… done.
Transactions: 200685 hits
Availability: 100.00 %
Elapsed time: 119.82 secs
Data transferred: 28.90 MB
Response time: 0.02 secs
Transaction rate: 1674.88 trans/sec
Throughput: 0.24 MB/sec
Concurrency: 39.64
Successful transactions: 200685
Failed transactions: 0
Longest transaction: 0.08
Shortest transaction: 0.01

Which is still very low. Switching off verbose output in the siege config (it is enabled by default) results in:

$ siege -c 40 -b -t120s http://127.0.0.1:8080/index.html
** SIEGE 2.70
** Preparing 40 concurrent users for battle.
The server is now under siege…
Lifting the server siege… done.
Transactions: 523592 hits
Availability: 100.00 %
Elapsed time: 119.73 secs
Data transferred: 75.40 MB
Response time: 0.01 secs
Transaction rate: 4373.23 trans/sec
Throughput: 0.63 MB/sec
Concurrency: 39.80
Successful transactions: 523592
Failed transactions: 0
Longest transaction: 0.02
Shortest transaction: 0.01

That is, almost a 3x speedup. This suggests the limiting factor in the first tests is siege itself. And top suggests the test is CPU bound (idle 0%), with nginx using about 4% of the CPU and about 60% accounted to siege threads. The rest is unaccounted for, likely due to the number of threads siege uses.
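(For reference, siege reads its settings from a resource file, typically ~/.siegerc; the relevant line is roughly this sketch:)

    # ~/.siegerc - sketch, only the relevant setting shown
    verbose = false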

With http_load, the results look like this:

$ echo http://127.0.0.1:8080/index.html > z
$ http_load -parallel 40 -seconds 120 z
696950 fetches, 19 max parallel, 1.05239e+08 bytes, in 120 seconds
151 mean bytes/connection
5807.91 fetches/sec, 876995 bytes/sec
msecs/connect: 0.070619 mean, 7.608 max, 0 min
msecs/first-response: 0.807419 mean, 14.526 max, 0 min
HTTP response codes:
code 200 -- 696950

That is, siege results certainly could be better. The test is again CPU bound, with nginx using about 40% and http_load using about 60%.

From my previous experience, siege requires multiple dedicated servers to run on, due to being CPU hungry.

[…]

Please let me know what you think.

The numbers are still very low, but the difference between the public IP and 127.0.0.1 seems minor. The limiting factor is something else.

It's my first nginx experience. So far it is performing way better than my old setup, but I would like to get the most out of it.

First of all, I would recommend making sure you are benchmarking nginx, not your benchmarking tool.


Maxim D.
http://nginx.org/en/donation.html


Maxim,

Thanks for your help.

Alright, so you are saying my static HTML serving stats are bad, which means the gap between serving static HTML from disk and serving the cached version (fastcgi_cache) from tmpfs is even bigger?

Anyway, I did the same siege run on a very basic (few-line) static .html file and got better transaction rates.

siege -c 40 -b -t120s 'http://127.0.0.1/index.html'

Here are the results

Lifting the server siege… done.

Transactions: 35768 hits

Availability: 97.65 %
Elapsed time: 119.57 secs
Data transferred: 5.42 MB
Response time: 0.13 secs
Transaction rate: 299.14 trans/sec
Throughput: 0.05 MB/sec
Concurrency: 38.02
Successful transactions: 35768
Failed transactions: 859
Longest transaction: 1.41
Shortest transaction: 0.00

Note: the small percentage of failures is because of "socket: 1464063744 address is unavailable.: Cannot assign requested address". I think it is a problem with the Debian / siege configuration (likely local ephemeral port exhaustion).

Same thing, using http://server-public-ip/index.html

Lifting the server siege… done.

Transactions: 32651 hits

Availability: 100.00 %
Elapsed time: 119.75 secs
Data transferred: 4.95 MB
Response time: 0.07 secs
Transaction rate: 272.66 trans/sec
Throughput: 0.04 MB/sec
Concurrency: 19.97
Successful transactions: 32651
Failed transactions: 0
Longest transaction: 0.56
Shortest transaction: 0.00

Note that this is a very basic html file; it just has a couple of lines.

Now the same thing with a more "complex" html file, which is an exact copy of http://www.joaodedeus.com.br/quero-visitar/abadiania-go.

Using 127.0.0.1/test.html

Lifting the server siege… done.

Transactions: 2182 hits

Availability: 100.00 %
Elapsed time: 119.11 secs
Data transferred: 30.56 MB
Response time: 1.08 secs
Transaction rate: 18.32 trans/sec
Throughput: 0.26 MB/sec
Concurrency: 19.87
Successful transactions: 2182
Failed transactions: 0
Longest transaction: 2.68
Shortest transaction: 0.02

Using the public IP

Lifting the server siege… done.

Transactions: 1913 hits

Availability: 100.00 %
Elapsed time: 119.80 secs
Data transferred: 26.79 MB
Response time: 1.25 secs
Transaction rate: 15.97 trans/sec
Throughput: 0.22 MB/sec
Concurrency: 19.94
Successful transactions: 1913
Failed transactions: 0
Longest transaction: 4.33
Shortest transaction: 0.19

Same slow transaction rate.

Please let me know what you think. It's my first nginx experience. So far it is performing way better than my old setup, but I would like to get the most out of it.


Hello Maxim,
Thanks again for your considerations and help.

My first siege tests against the EC2 m1.small production server were done using a Dell T410 with 4 CPUs x 2.4 GHz (Xeon E5620). It was after your comments about 127.0.0.1 that I ran siege from the same server that is running nginx (production).

The Debian machine I am using for the tests has 4 vCPUs and runs nothing else. Other virtual machines run on this host, but nothing too heavy. So I am "sieging" from a server that has way more power than the one running nginx, and I am sieging a static html file on the production server that is 44.2 KB.

Let's run the tests again. This time I'll keep an eye on siege CPU usage and overall server load using htop and the VMware vSphere client.

siege -c40 -b -t120s -i 'http://177.71.188.137/test.html' (against production)

Transactions: 2010 hits
Availability: 100.00 %
Elapsed time: 119.95 secs
Data transferred: 28.12 MB
Response time: 2.36 secs
Transaction rate: 16.76 trans/sec
Throughput: 0.23 MB/sec
Concurrency: 39.59
Successful transactions: 2010
Failed transactions: 0
Longest transaction: 5.81
Shortest transaction: 0.01

Siege CPU usage was around 1-2% during the entire 120s. On the other hand, the EC2 m1.small (production nginx) was at 100% the entire time, all of it nginx.

Again, with more concurrent users:
siege -c80 -b -t120s -i 'http://177.71.188.137/test.html'

Lifting the server siege… done.

Transactions: 2029 hits
Availability: 100.00 %
Elapsed time: 119.65 secs
Data transferred: 28.41 MB
Response time: 4.60 secs
Transaction rate: 16.96 trans/sec
Throughput: 0.24 MB/sec
Concurrency: 78.00
Successful transactions: 2029
Failed transactions: 0
Longest transaction: 9.63
Shortest transaction: 0.19

I can't get past 17 trans/sec per CPU.

This time siege CPU usage on my Dell server was around 2-3% the entire time (htop). The vSphere graphs don't even show a change from idle.

So I think we can rule out the possibility of a siege CPU limitation.


Now for another test: running siege and nginx on the same machine, with exactly the same nginx.conf as the production server, changing only one thing:

worker_processes 1; >> worker_processes 4;

because m1.small (AWS EC2) has only 1 vCPU and my VMware dev server has 4.
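(As a side note, on this nginx version the worker count can also be derived automatically instead of hard-coding it; a sketch:)

    # nginx.conf, top level
    worker_processes auto;   # sizes the worker pool to the number of CPUs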

siege -c40 -b -t120s -i 'http://127.0.0.1/test.html'

Results: siege using about 1% CPU, all 4 vCPUs jumping between 30 and 90% usage, I would say 50% on average.

I don't see a lot of improvement either; results below:

Transactions: 13935 hits
Availability: 100.00 %
Elapsed time: 119.25 secs
Data transferred: 195.14 MB
Response time: 0.34 secs
Transaction rate: 116.86 trans/sec
Throughput: 1.64 MB/sec
Concurrency: 39.85
Successful transactions: 13935
Failed transactions: 0
Longest transaction: 1.06
Shortest transaction: 0.02

siege -c50 -b -t240s -i 'http://127.0.0.1/test.html'
Transactions: 27790 hits
Availability: 100.00 %
Elapsed time: 239.93 secs
Data transferred: 389.16 MB
Response time: 0.43 secs
Transaction rate: 115.83 trans/sec
Throughput: 1.62 MB/sec
Concurrency: 49.95
Successful transactions: 27790
Failed transactions: 0
Longest transaction: 1.78
Shortest transaction: 0.01

I believe the machine I just ran this test on is more powerful than our notebooks. Average CPU during the tests was 75%, 99% of it consumed by nginx. So it can only be something in the nginx config file.

Here is my nginx.conf
http://ddutra.s3.amazonaws.com/nginx/nginx.conf

And here is the virtual host file from which I am fetching this test.html page; it is the default virtual host and the same one I use for status consoles etc.
http://ddutra.s3.amazonaws.com/nginx/default

If you could please take a look - there is a huge difference between your results and mine. I am sure I am doing something wrong here.

Best regards.


Hello!

On Fri, Oct 04, 2013 at 09:43:05AM -0400, ddutra wrote:

I am "sieging" from a server that has way more power than the one running nginx, and I am sieging a static html file on the production server that is 44.2 KB.

Transactions: 2010 hits
Availability: 100.00 %
Elapsed time: 119.95 secs
Data transferred: 28.12 MB
Response time: 2.36 secs
Transaction rate: 16.76 trans/sec
Throughput: 0.23 MB/sec
Concurrency: 39.59
Successful transactions: 2010
Failed transactions: 0
Longest transaction: 5.81
Shortest transaction: 0.01

If this was a 44k file, this likely means you have the gzip filter enabled, as 28.12M / 2010 hits == 14k.

Having gzip enabled might indeed result in relatively high CPU usage, and may result in such numbers in CPU-constrained cases.

For static html files, consider using gzip_static; see the ngx_http_gzip_static_module documentation. Also consider tuning gzip_comp_level to a lower level if you've changed it from the default (1).
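A minimal sketch of that suggestion (the path is illustrative; gzip_static requires nginx built with --with-http_gzip_static_module):

    location / {
        root /var/www/example;   # illustrative path
        gzip_static on;          # serve a pre-compressed .gz sibling if it exists
        gzip_comp_level 1;       # the default; cheap for anything gzipped on the fly
    }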

And, BTW, I've also tried to grab your exact test file from the above link, and it asks for a password. Please note that checking passwords is an expensive operation, and can be very expensive depending on the password hash algorithm you use. If you test against a password-protected file, it may be another source of slowness.

Just for reference, here are results from my virtual machine, a
45k file, with gzip enabled:

Transactions: 107105 hits
Availability: 100.00 %
Elapsed time: 119.30 secs
Data transferred: 1254.22 MB
Response time: 0.04 secs
Transaction rate: 897.80 trans/sec
Throughput: 10.51 MB/sec
Concurrency: 39.91
Successful transactions: 107105
Failed transactions: 0
Longest transaction: 0.08
Shortest transaction: 0.01

Siege CPU usage was around 1-2% during the entire 120s.

Please note that the CPU percentage printed for siege might be incorrect and/or confusing for various reasons. Make sure to look at the idle time on the server.

On the other hand, the EC2 m1.small (production nginx) was at 100% the entire time, all of it nginx.

Ok, so you are CPU-bound, which is good. And see above
for possible reasons.

[…]

If you could please take a look - there is a huge difference between your results and mine. I am sure I am doing something wrong here.

The "gzip_comp_level 6;" in your config mostly explains things. With gzip_comp_level set to 6 I get about 450 r/s on my notebook, which is a bit closer to your results. There is no need to compress pages that hard: there is almost no difference in the resulting document size, but there is a huge difference in the CPU time required for compression.

Pagespeed is also likely to consume lots of CPU power, and switching it off should be helpful.
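In config terms the two changes amount to roughly this (assuming ngx_pagespeed is loaded, as in the posted config):

    gzip_comp_level 1;   # back to the default; level 6 burns far more CPU for
                         # almost the same output size
    pagespeed off;       # rule out on-the-fly rewriting while benchmarking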


Maxim D.
http://nginx.org/en/donation.html

Maxim,
Thank you again.

About my tests, FYI, I had HTTP auth turned off.

I think you nailed the problem. This is some new information for me.

So for production I have a standard website, which is PHP being cached by fastcgi_cache. All static assets are served by nginx, so gzip_static will do the trick if I pre-compress them, and it will save a bunch of CPU. What about the cached .php page? Is there any way of saving the gzipped version to the cache?

Another question: most static assets are being processed in some way by ngx_pagespeed, and the optimized assets are cached - that includes .js, .css and images. How does gzip work in this case? Does nginx gzip the asset every time it gets hit? Does ngx_pagespeed cache gzipped content? I am confused.

Maybe it would be better to drop ngx_pagespeed, bulk-optimize every image at the source, minify all .js and .css, and let it all run on nginx without the ngx_pagespeed cache. Can you share your experience with that?

And one last question: is there any way to output $gzip_ratio in the response headers in order to do easy debugging?

Later I'll do some more sieging with gzip comp level at 1 and off, and I'll post it here.

Best regards.


As promised, here are my stats on VMware with 4 vCPUs, gzip off, pagespeed off:
siege -c50 -b -t240s -i 'http://127.0.0.1/test.html'

Transactions: 898633 hits
Availability: 100.00 %
Elapsed time: 239.55 secs
Data transferred: 39087.92 MB
Response time: 0.01 secs
Transaction rate: 3751.34 trans/sec
Throughput: 163.17 MB/sec
Concurrency: 49.83
Successful transactions: 898633
Failed transactions: 0
Longest transaction: 0.03
Shortest transaction: 0.00

If you want to post your considerations on my Stack Overflow question, I'll pick it as the correct answer.

Now I'll set up static gzip serving and use compression level 1 for dynamic content.

Thanks a lot. Best regards


Well, I just looked at the results again and it seems my throughput (MB per second) is not very far from yours. My bad.

So the results are not that bad, right? What do you think?

Best regards.


Hello!

On Fri, Oct 04, 2013 at 12:52:28PM -0400, ddutra wrote:

So for production I have a standard website, which is PHP being cached by fastcgi_cache. All static assets are served by nginx, so gzip_static will do the trick if I pre-compress them, and it will save a bunch of CPU. What about the cached .php page? Is there any way of saving the gzipped version to the cache?

Yes, but it's not something trivial to configure. The best approach would likely be to unconditionally return a gzipped version to the cache, and use gunzip to uncompress it if needed; see the ngx_http_gunzip_module documentation.
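A rough sketch of that approach (assumptions: the PHP backend compresses its own output, e.g. zlib.output_compression = On, so the gzipped body is what ends up in fastcgi_cache; nginx is built with --with-http_gunzip_module; the socket path and cache zone name are illustrative):

    location ~ \.php$ {
        include        fastcgi_params;
        fastcgi_pass   unix:/var/run/php5-fpm.sock;   # illustrative socket path

        fastcgi_cache        WORDPRESS;               # illustrative zone name
        fastcgi_cache_valid  200 10m;

        gunzip on;   # decompress the cached gzipped body for clients that
                     # do not send "Accept-Encoding: gzip"
    }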

Another question: most static assets are being processed in some way by ngx_pagespeed, and the optimized assets are cached - that includes .js, .css and images. How does gzip work in this case? Does nginx gzip the asset every time it gets hit? Does ngx_pagespeed cache gzipped content? I am confused.

I haven't looked at what the pagespeed folks did in their module, but likely they don't cache anything gzip-related and the response is gzipped every time (much like with normal files). It might also conflict with gzip_static, as pagespeed likely won't be able to dig into a gzipped response.

Maybe it would be better to drop ngx_pagespeed, bulk-optimize every image at the source, minify all .js and .css, and let it all run on nginx without the ngx_pagespeed cache. Can you share your experience with that?

In my experience, any dynamic processing should be avoided to maximize performance. Static files should be optimized (minified, pre-gzipped) somewhere during the deployment process; this achieves the smallest resource sizes while maintaining the best performance.
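For example, the .gz siblings that gzip_static looks for can be produced at deploy time with something like this (a sketch; paths and file types are illustrative, and level 9 is fine here because it runs once, offline):

    find /var/www/static -type f \( -name '*.css' -o -name '*.js' -o -name '*.html' \) \
        -exec sh -c 'gzip -9 -c "$1" > "$1.gz"' _ {} \;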

And one last question: is there any way to output $gzip_ratio in the response headers in order to do easy debugging?

No, as $gzip_ratio isn't yet known when the response headers are sent. Use the logs instead. Or, if you just want to see how files compress at different compression levels, just use gzip(1) for tests.
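A sketch of logging it (the format and file names are my own):

    log_format gzip_debug '$remote_addr "$request" $status $body_bytes_sent '
                          'gzip_ratio=$gzip_ratio';
    access_log /var/log/nginx/gzip_debug.log gzip_debug;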


Maxim D.
http://nginx.org/en/donation.html