Hello,
I’ve been playing around with writing an nginx module and trying to
configure it to run at high load (testing with the curl-loader tool). My
benchmark is the http-static-module; that is, I want to run at least as
much load on my module as the static module can without errors. I’m also
(for now) keeping the default number of worker processes (1) and worker
connections (1024), but more on that later.
Currently, using curl-loader, I can send requests to and receive responses from
the ngx_http_static_module at a rate of 5000-7000 requests per second
over a period of several minutes, all with 200 OK responses.
With my module, I usually manage to get up to about 1000 requests per
second with all 200 OK responses.
Then, past that threshold, I start to see “Err Connection Time Out”
problems cropping up in the curl-loader log. Usually there will be long
blocks (maybe 20 or so) of them, then I’ll go back to 200
OK’s. The rates are still good, maybe 200 time outs out of 100,000
connections, but I’m wondering why they aren’t perfect like the
http-static-module.
The only real difference I can see between my module and the static
module is the time it takes to generate the response (I’ve set up the test
so that both return the same amount of data, about 5 KB, although my module
also does additional memory allocations during processing).
I used gettimeofday() to measure, with microsecond resolution, how long it
takes to generate a response.
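For reference, here is a minimal sketch of this kind of measurement in plain C, outside of nginx. `build_response()` is a hypothetical stand-in for the module's response-generation code, and the 5000-byte buffer is only an assumption matching the ~5K responses mentioned above. One caveat: gettimeofday() follows the wall clock, so if the system time is adjusted during a measurement the difference can jump or even go negative. clock_gettime(CLOCK_MONOTONIC) is the usual way to avoid that.

```c
#define _DEFAULT_SOURCE      /* expose gettimeofday() under strict -std= flags */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Difference between two gettimeofday() samples, in microseconds. */
static int64_t elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000
         + (int64_t)(end->tv_usec - start->tv_usec);
}

/* Hypothetical stand-in for the module's response-generation work:
 * fills buf and returns a checksum so the loop is not optimized away. */
static unsigned long build_response(unsigned char *buf, size_t len)
{
    unsigned long sum = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        buf[i] = (unsigned char)(i * 31u);
        sum += buf[i];
    }
    return sum;
}

/* Time a single build_response() call; the checksum is passed back out. */
static int64_t time_response_us(unsigned char *buf, size_t len,
                                unsigned long *checksum)
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    *checksum = build_response(buf, len);
    gettimeofday(&end, NULL);

    return elapsed_us(&start, &end);
}
```

Averaging many samples, rather than relying on a single call, can smooth out occasional very large values caused by things like scheduling.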
With the static module, I see about 20-50 microseconds on average to
generate a response. My module, which has to do more processing, takes
on average 60-260 microseconds to generate its response. The pattern
seems to start on the lower side, get larger, then go back to the lower
side, but this isn’t exact. Note that in both cases I occasionally see
random, much higher times (around 15000 microseconds); however, this
doesn’t correspond to the number of timeouts I see in curl-loader (indeed,
I see it even with the static module, which doesn’t time out).
So I tried simply adding a delay with usleep into the static module, and
sure enough, I started seeing timeout errors cropping up with the static
module as well. So it seems the number of timeouts is (roughly)
proportional to the time it takes to generate the response.
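A standalone sketch of that experiment might look like the following, assuming the delay is injected with a plain usleep() call inside the handler. `handler_with_delay()` and `measure_handler_us()` are hypothetical names, and the real static module is not involved; the sketch only shows that the measured time per response tracks the injected delay. Since usleep() suspends the calling thread, inside an nginx worker the sleep presumably also holds up the other connections that worker is serving, not just the current request.

```c
#define _DEFAULT_SOURCE      /* expose usleep() and gettimeofday() */
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical handler body with an artificial delay injected,
 * mimicking the usleep() experiment. usleep() suspends the calling
 * thread for at least delay_us microseconds. */
static void handler_with_delay(unsigned int delay_us)
{
    assert(delay_us < 1000000);  /* POSIX only requires values below 1 s */
    if (delay_us > 0) {
        usleep(delay_us);
    }
}

/* Wall-clock duration of one handler call, in microseconds. */
static int64_t measure_handler_us(unsigned int delay_us)
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    handler_with_delay(delay_us);
    gettimeofday(&end, NULL);

    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000
         + (int64_t)(end.tv_usec - start.tv_usec);
}
```

For example, measure_handler_us(2000) should return roughly 2000 or a little more.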
But I’m still not clear on why nginx is timing out at all. That is, if it
takes longer to generate the response, shouldn’t it just take longer to
send the response? Is there a configurable value somewhere that’s causing
nginx to time out? What effect do the number of worker processes and
worker connections have? I have curl-loader set to have no limit on
completion time (which I believe is the default), so I don’t think that’s
what’s causing the timeouts, but I’m not sure (there is nothing in nginx’s
error.log when I get a timeout).
I can indeed increase the number of worker processes/connections to get
better throughput with my module, but it takes more dramatic increases
than I would expect. E.g., 40 processes and 4000 or so connections let me
run 1400 connections/second on my module without errors. This helped
bring the processing time down to about 60-140 microseconds. But it
seems there should be a better way to achieve this throughput without
using that many resources.
Any advice you might have would be helpful. One specific thing I’m
wondering is if I’m being too liberal with my use of ngx_palloc/calloc,
and that might be slowing things down? I.e., would explicitly freeing the
memory when it’s done help? But any other ideas would be great too.
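For context on the ngx_palloc()/ngx_pcalloc() question, here is a heavily simplified model of region ("pool") allocation in C. It is not nginx's ngx_pool_t (the real pool chains multiple blocks and tracks large allocations separately), and `region_t`, `region_alloc()` and the other names are hypothetical. What it illustrates: a small allocation is just a pointer bump inside an already-allocated block, and everything is released at once when the region is destroyed. In nginx, memory taken from a request's pool is likewise released when that pool is destroyed, and ngx_pfree() only releases large allocations. Explicitly freeing small pool allocations would therefore likely not change much.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified region ("pool") allocator: one fixed
 * block, bump-pointer allocation, no per-object free. Not ngx_pool_t. */
typedef struct {
    unsigned char *base;   /* start of the single backing block */
    size_t         size;   /* capacity of the block in bytes */
    size_t         used;   /* bytes handed out so far, including padding */
} region_t;

/* Allocate the backing block once; returns 1 on success, 0 on failure. */
static int region_init(region_t *r, size_t size)
{
    r->base = malloc(size);
    r->size = (r->base != NULL) ? size : 0;
    r->used = 0;
    return r->base != NULL;
}

/* Carve n bytes out of the block, aligned to sizeof(void *) (an
 * assumption for this sketch). Returns NULL when the block is full. */
static void *region_alloc(region_t *r, size_t n)
{
    const size_t align = sizeof(void *);
    size_t start;

    assert((align & (align - 1)) == 0);  /* mask trick needs a power of two */
    if (r->base == NULL) {
        return NULL;
    }
    start = (r->used + align - 1) & ~(align - 1);
    if (start > r->size || n > r->size - start) {
        return NULL;  /* a fuller allocator would chain a new block here */
    }
    r->used = start + n;
    return r->base + start;
}

/* Release every allocation at once by freeing the backing block. */
static void region_destroy(region_t *r)
{
    free(r->base);
    r->base = NULL;
    r->size = 0;
    r->used = 0;
}
```

The design choice here is the same trade-off pools make in general: allocation is cheap and there is no per-object bookkeeping, at the cost of memory staying in use until the whole region goes away.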
Thanks, and have a good day!
Posted at Nginx Forum: