Event-driven handler

Hi,

I’m trying to develop a handler for Nginx. I want to do this:

(1) Request comes in to Nginx
(2) This is passed on to handler
(3) Handler sends request to other code (either a new thread or some
event-driven system) and tells Nginx to wait until output has been
generated
(4) Output is generated in ‘background’ whilst Nginx is handling other
requests
(5) Message is sent to Nginx that content has been generated through
some event
(e.g. with a semaphore)
(6) Nginx retrieves content and sends to client

I already have a handler that does 1,2, 4 (not in background though) and
6 -
it’s the 3, 4 (in background) and 5 that I don’t know how to do yet. My
current
implementation is a blocking one, which I don’t want.

Could anyone please give me a simple example of how to do this, or at
least tell
me which functions I need to use?

Thanks.

I’m trying to develop a handler for Nginx. I want to do this:

(1) Request comes in to Nginx
(2) This is passed on to handler
(3) Handler sends request to other code (either a new thread or some
event-driven system) and tells Nginx to wait until output has been generated
(4) Output is generated in ‘background’ whilst Nginx is handling other requests
(5) Message is sent to Nginx that content has been generated through some event
(e.g. with a semaphore)
(6) Nginx retrieves content and sends to client

I think what you’re describing is exactly what the upstream modules do.
You shouldn’t need to change Nginx.

Ryan Dahl <ry@…> writes:

I think what you’re describing is exactly what the upstream modules do.
You shouldn’t need to change Nginx.

Ryan, thanks for replying. You’re right to mention upstream modules, and
that is
indeed what they do. However, I’m not passing off the request via a
socket -
I’m running the code inside Nginx itself (it’s written in C).

I have the code running in a custom-compiled Nginx binary, but
currently the
code is blocking (i.e. no other requests can be handled whilst my new
code is
being processed). I’m looking to create a non-blocking version of my
code,
either through spawning off threads and handling each request in an
isolated
thread, or through some event-driven process. I’ve not decided on that
part of
it yet, but I need to first get the code hooked into the event
architecture of
Nginx.

I think that the solution to what I’m doing will use some of the
functions that
the upstream handling module(s) use internally, I’m just not certain yet
which
ones - though I’m in the process of trying to work it out from looking
at the
source code.

It sounds like you should make it into its own process and communicate
with it via an upstream module. You’re not gaining
anything by compiling something like that into Nginx.

Ryan Dahl <ry@…> writes:

it yet, but I need to first get the code hooked into the event
architecture of Nginx.

It sounds like you should make it into its own process and communicate
with it via an upstream module. You’re not gaining
anything by compiling something like that into Nginx.

Maybe, but I’m not convinced that’s correct. You are gaining, because
you don’t
then have to deal with the overhead of opening and sending information
over
sockets. I’m writing my code in C exactly because it’s aimed towards
very
high-load servers, and I’d rather avoid any extra overhead if possible.

The architecture is there for it to work just fine within Nginx, I just
don’t
know how yet. Also, there’s a lot of code within Nginx that I’d have
to
reproduce in order to run it in its own binary.

If ultimately my code ends up destabilizing Nginx, then I would move it
out into
its own process, but I want to try to get it working without doing that
first.

I have similar needs, and would also like to have my module install a
callback that gets executed once per second (like the “trigger”
callback for lighttpd modules).

I think src/event/ngx_event.h has some things that may be useful
(ngx_add_event, etc).

-dave

Hi Dave,

I have similar needs, and would also like to have my module install a
callback that gets executed once per second (like the “trigger”
callback for lighttpd modules).

If you want a callback that executes regularly, you could start a new
thread during the startup process. One example might be to create a
post-configuration ‘init’ function, which is called after the
configurations have been merged (in your ngx_http_<module>_module_ctx
structure, it would be the second entry in the list).

In that function, spawn off a new thread, which itself has some kind of
infinite loop. You’d want either some kind of sleep call in the loop
(e.g. every 1 sec as you suggest), or perhaps a semaphore so that your
code is only woken up when needed.

Although I’ve not written it yet, this is what I’ll probably be doing.
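
Something roughly like this is what I have in mind (completely untested,
and all of the example names below are just placeholders):

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>
#include <pthread.h>
#include <unistd.h>

/* Background thread: wakes up roughly once per second.  It must not
   touch any Nginx structures, since Nginx is not thread safe. */
static void *
example_thread(void *data)
{
    for ( ;; ) {
        /* ... periodic work goes here ... */
        sleep(1);
    }
    return NULL;
}

/* Post-configuration hook - the second field of ngx_http_module_t.
   Caveat: this runs before the worker processes are forked, so if you
   want one thread per worker, the module's init_process hook is
   probably the better place to call pthread_create. */
static ngx_int_t
ngx_http_example_init(ngx_conf_t *cf)
{
    pthread_t  tid;

    if (pthread_create(&tid, NULL, example_thread, NULL) != 0) {
        return NGX_ERROR;
    }

    return NGX_OK;
}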

I’m not sure if this is what the ‘trigger’ callback does in Lighty,
because I’m
not familiar with the internal workings of it.

I think src/event/ngx_event.h has some things that may be useful
(ngx_add_event, etc).

Thanks. It’s on my list of places in the code to investigate. :)

I’ll let you know if I work out how to do it.

Cheers,

Marcus.

Marcus C. wrote:

Hi,

I’m trying to develop a handler for Nginx. I want to do this:

(1) Request comes in to Nginx
(2) This is passed on to handler
(3) Handler sends request to other code (either a new thread or some
event-driven system)

Be aware that Nginx does not support threads, and it is not thread safe.
Of course you can create and manage your threads, but that’s not easy.

What do you mean by “some event-driven system”?

and tells Nginx to wait until output has been generated

This is easy.
You just have to return NGX_DONE from the handler, and call
ngx_http_finalize_request when done.

You may be interested in my mod_wsgi code (ngx_http_wsgi_handler.c) for
more details:
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/src/ngx_http_wsgi_handler.c
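
To give the rough shape of that (untested, and the example names are
invented):

/* Content handler: hand the work off and tell Nginx the request is not
   finished yet. */
static ngx_int_t
ngx_http_example_handler(ngx_http_request_t *r)
{
    /* ... queue the request for background processing ... */

    /* Note: in newer Nginx releases you would typically also do
       r->main->count++ here, to keep the request alive. */

    return NGX_DONE;
}

/* Called later, from Nginx's own event loop (never from a foreign
   thread), once the content is ready. */
static void
ngx_http_example_content_ready(ngx_http_request_t *r, ngx_chain_t *out)
{
    ngx_int_t  rc;

    rc = ngx_http_send_header(r);

    if (rc == NGX_ERROR || rc > NGX_HTTP_SPECIAL_RESPONSE) {
        ngx_http_finalize_request(r, rc);
        return;
    }

    rc = ngx_http_output_filter(r, out);

    ngx_http_finalize_request(r, rc);
}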

(4) Output is generated in ‘background’ whilst Nginx is handling other requests

This, again, may be easy (if you don’t use threads, of course).

(5) Message is sent to Nginx that content has been generated through some event
(e.g. with a semaphore)

No need for this, if you avoid threads.
And if you use threads, this is not that simple.

(6) Nginx retrieves content and sends to client

I already have a handler that does 1,2, 4 (not in background though) and 6 -
it’s the 3, 4 (in background) and 5 that I don’t know how to do yet. My current
implementation is a blocking one, which I don’t want.

Could anyone please give me a simple example of how to do this, or at least tell
me which functions I need to use?

If you can refactor your code, I suggest executing the computation in
steps.

Set a timer in Nginx, say to 0.001 seconds, so that at each event loop
cycle your code is executed.

The code should execute some piece of computation quickly (storing state
in an Nginx request context), and return.

Of course if the code involves some IO, this may be tricky.
But you can use the Nginx event module to receive IO notifications.
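
Roughly like this (untested; ngx_http_example_module and the context
type are placeholders):

/* Per-request state kept in the module's request context. */
typedef struct {
    ngx_uint_t   step;
    ngx_event_t  ev;
} ngx_http_example_ctx_t;

static void
ngx_http_example_step(ngx_event_t *ev)
{
    ngx_http_request_t      *r = ev->data;
    ngx_http_example_ctx_t  *ctx;

    ctx = ngx_http_get_module_ctx(r, ngx_http_example_module);

    /* ... do one small slice of the computation, update ctx->step ... */

    if (ctx->step < 100) {                /* more work left */
        ngx_add_timer(ev, 1);             /* run again in ~1 ms */
        return;
    }

    /* finished: send the response (not shown) and finalize */
    ngx_http_finalize_request(r, NGX_OK);
}

static ngx_int_t
ngx_http_example_handler(ngx_http_request_t *r)
{
    ngx_http_example_ctx_t  *ctx;

    ctx = ngx_pcalloc(r->pool, sizeof(ngx_http_example_ctx_t));
    if (ctx == NULL) {
        return NGX_HTTP_INTERNAL_SERVER_ERROR;
    }

    ngx_http_set_ctx(r, ctx, ngx_http_example_module);

    ctx->ev.handler = ngx_http_example_step;
    ctx->ev.data = r;
    ctx->ev.log = r->connection->log;

    ngx_add_timer(&ctx->ev, 1);           /* first slice on the next cycle */

    return NGX_DONE;                      /* request stays open meanwhile */
}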


Regards Manlio P.

Be aware that Nginx does not support threads, and it is not thread safe.
Of course you can create and manage your threads, but that’s not easy.

Thanks, yes I am aware - and I’ll be handling the thread stuff myself.

What do you mean by “some event-driven system”?

I’m talking about how I’ll generate my content here - it’s irrelevant to
my question about Nginx.

and tells Nginx to wait until output has been generated

This is easy.
You just have to return NGX_DONE from the handler, and call
ngx_http_finalize_request when done.

This may be what I need to know. If I return NGX_DONE, is the response
then not automatically sent to the client? In order to do that, you need
to call ngx_http_finalize_request. Is that right?

What about if I want to use filters after my content has been generated,
e.g. gzip? At the moment, my code calls ngx_http_output_filter in my
handler and returns the result of that.

Should I instead do something like:

[my handler]

  • sends request to my output-generating code (which may be in its own
    new thread)
  • immediately returns NGX_DONE (which then sends no response to the
    client)

[my output-generating code]

  • generates output (e.g. in its own thread)
  • returns ngx_http_output_filter(…) (if wanting to continue using
    filters)
  • returns ngx_http_finalize_request(…) (if I don’t want to use any
    filters)

Does this sound right, or have I misunderstood what you wrote?

You may be interested in my mod_wsgi code (ngx_http_wsgi_handler.c) for
more details:
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/src/ngx_http_wsgi_handler.c

Thanks, I’ll check it out.

(4) Output is generated in ‘background’ whilst Nginx is handling
other requests

This, again, may be easy (if you don’t use threads, of course).

(5) Message is sent to Nginx that content has been generated through
some event
(e.g. with a semaphore)

If I’ve understood your comments above, then this notification actually
isn’t necessary - you just call ngx_http_finalize_request (or
ngx_http_output_filter?) when you’re done.

I was thinking that a response was automatically sent when you returned
from your handler.

If you can refactor your code, I suggest executing the computation in
steps.

That’s not really practical for what I’m doing - there’s no guarantee
that the code can be executed quickly.

Of course if the code involves some IO, this may be tricky.
But you can use the Nginx event module to receive IO notifications.

Could you possibly give me an example?

Thanks,

Marcus.

Eugaia wrote:

[…]

You just have to return NGX_DONE from the handler, and call
ngx_http_finalize_request when done.
This may be what I need to know. If I return NGX_DONE, is the response
then not automatically sent to the client?

Nginx handles the response body in chunks.
Each time you call ngx_http_output_filter, your chain buffer is sent to
the Nginx filters and may be sent to the OS socket buffer.
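
For example, sending one chunk of the body might look roughly like this
(illustrative only):

static ngx_int_t
ngx_http_example_send_chunk(ngx_http_request_t *r, u_char *data,
    size_t len, ngx_uint_t last)
{
    ngx_buf_t    *b;
    ngx_chain_t   out;

    b = ngx_calloc_buf(r->pool);
    if (b == NULL) {
        return NGX_ERROR;
    }

    b->pos = data;
    b->last = data + len;
    b->memory = 1;              /* data is in memory that we won't modify */
    b->last_buf = last;         /* set only on the final chunk */

    out.buf = b;
    out.next = NULL;

    /* passes the chain through the filter chain; may return NGX_AGAIN
       if not everything could be pushed towards the socket yet */
    return ngx_http_output_filter(r, &out);
}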

In order to do that, you need to call
ngx_http_finalize_request. Is that right?

Calling ngx_http_finalize_request simply, as the name suggests, instructs
Nginx to finalize the current request.

What about if I want to use filters after my content has been generated,
e.g. gzip? At the moment, my code calls ngx_http_output_filter in my
handler and returns the result of that.

See my previous response.
You need to call ngx_http_output_filter as usual, but there are some
things you have to check:

  1. If an error is returned, then you need to call
    ngx_http_finalize_request with that error code.

  2. If NGX_AGAIN is returned, then you need to set up the code
    so that your handler is called again when Nginx knows that the socket
    buffer is empty again (see the rough sketch after this list).

    This is one of the most “complex” parts.
    Again, the source code from mod_wsgi may help
    (although I cannot guarantee it is correct).
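
A rough sketch of those two checks (untested; ngx_http_example_writer is
an invented name for your own write handler):

static void
ngx_http_example_send(ngx_http_request_t *r, ngx_chain_t *out)
{
    ngx_int_t  rc;

    rc = ngx_http_output_filter(r, out);

    if (rc == NGX_ERROR || rc > NGX_HTTP_SPECIAL_RESPONSE) {
        /* 1. an error: finalize the request with that code */
        ngx_http_finalize_request(r, rc);
        return;
    }

    if (rc == NGX_AGAIN) {
        /* 2. the socket buffer is full: arrange to be called again when
           the connection becomes writable */
        r->write_event_handler = ngx_http_example_writer;

        if (ngx_handle_write_event(r->connection->write, 0) != NGX_OK) {
            ngx_http_finalize_request(r, NGX_ERROR);
        }

        return;
    }

    /* everything was accepted (assuming this was the last chunk) */
    ngx_http_finalize_request(r, NGX_OK);
}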

Should I instead do something like:

[my handler]

  • sends request to my output-generating code (which may be in its own
    new thread)

ok

  • immediately returns NGX_DONE (which then sends no response to the client)

ok

[my output-generating code]

  • generates output (e.g. in its own thread)

ok

  • returns ngx_http_output_filter(…) (if wanting to continue using
    filters)
  • returns ngx_http_finalize_request(…) (if I don’t want to use any filters)

WARNING!

As I have said, Nginx is not thread safe.
You MUST not call ngx_http_output_filter, ngx_http_finalize_request,
or any other Nginx function (with some exceptions) from your thread.

[…]

(5) Message is sent to Nginx that content has been generated through
some event
(e.g. with a semaphore)
If I’ve understood your comments above, then this notification actually
isn’t necessary -
you just call ngx_http_finalize_request (or ngx_http_output_filter?)
when you’re done.

No, don’t do that if you use a separate thread.

I was thinking that a response was automatically sent when you returned
from your handler.

No.
When you call ngx_http_output_filter, content may be sent to the client.
But control must return to Nginx as soon as possible (Nginx uses
so-called cooperative multitasking).

If you can refactor your code, I suggest executing the computation in
steps.

That’s not really practical for what I’m doing - there’s no guarantee
that the code can be executed quickly.

What do you need to do?

Of course if the code involves some IO, this may be tricky.
But you can use the Nginx event module to receive IO notifications.

Could you possibly give me an example?

Again, see the code of mod_wsgi:
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/src/ngx_wsgi.c

In particular, look at the State_register function.

As I said before, this is rather complex.
You should also read the Nginx source code to understand how things work.


Manlio

[handler] (as mentioned before)

  • passes off request to content generator
  • returns immediately to Nginx with NGX_DONE

[content generator]

  • generates content
  • in threadsafe fashion, puts the output content in a queue of responses to
    be returned

what is the content generator? are you writing it from scratch? if it
connects to external sockets and doesn’t use nginx’s event loop or is
blocking - it should be an external process and you should use an
upstream module. if you are writing from scratch (that is all of the
socket code too) you are possibly not on the wrong path (unlikely)

[in Nginx]

  • loop that checks every e.g. 0.001 secs to see if there are any responses

this is going to fail spectacularly.

(headers and/or content)

  • checks for responses in threadsafe fashion, and if there are, calls
    ngx_http_output_filter etc

You should be talking via pipes/sockets to your content generator and
getting callbacks on those via the event loop. And if you’re
talking to another thread via pipes already you might as well save the
trouble and move it into its own process and use the already written
mature upstream modules. Just compiling a bunch of random code into
Nginx is not going to make it fast. You either design it from the
ground up very carefully or use external processes.

What do you need to do?

I’m generating pages based on simple templates which could do all manner
of things, including getting data from databases. I don’t want to use
PHP/Python etc. because the overhead would be very high and not very
practical for what I’m doing.

You should also read the Nginx source code to understand how things
work.

Yep, been doing that. :)

Ok, I understand that I shouldn’t call ngx_http_output_filter etc from
within my thread.

You mentioned setting up a loop that checked every 0.001 secs. How
about a solution like this:

[handler] (as mentioned before)

  • passes off request to content generator
  • returns immediately to Nginx with NGX_DONE

[content generator]

  • generates content
  • in threadsafe fashion, puts the output content in a queue of responses
    to be returned

[in Nginx]

  • loop that checks every e.g. 0.001 secs to see if there are any
    responses (headers and/or content)
  • checks for responses in threadsafe fashion, and if there are, calls
    ngx_http_output_filter etc

This last loop is integrated into the co-operative multi-tasking
architecture of Nginx, so we don’t need
to worry about calls to ngx_http_output_filter not being threadsafe.

Is it possible to set up such a loop in Nginx? How would I set it up?

Alternatively, I would prefer to have an event-driven system whereby I
wouldn’t need a loop to check whether
my responses were ready. Is it possible to trigger non-IO events in
Nginx, such that once triggered, Nginx
would handle them properly (even if they were called from within a
different thread)?

I’ve looked at your WSGI code a bit as well as the Nginx source. I will
be looking at it some more too.

Thanks,

Marcus.

[in Nginx]

  • loop that checks every e.g. 0.001 secs to see if there are any responses

this is going to fail spectacularly.

This was my gut instinct (it sounds like a horrible idea) and is something
I wanted to avoid at all costs. It was more of a desperate attempt to
avoid putting things in an external process.

what is the content generator? are you writing it from scratch? if it
connects to external sockets and doesn’t use nginx’s event loop or is
blocking - it should be an external process and you should use an
upstream module. if you are writing from scratch (that is all of the
socket code too) you are possibly not on the wrong path (unlikely)

Yes, as my understanding of the internals of Nginx increases, I think I
agree with you on this one. I am writing the content generator from
scratch.

You should be talking via pipes/sockets to your content generator and
getting callbacks on those via the event loop. And if you’re
talking to another thread via pipes already you might as well save the
trouble and move it into its own process and use the already written
mature upstream modules. Just compiling a bunch of random code into
Nginx is not going to make it fast. You either design it from the
ground up very carefully or use external processes.

I think I’ve come round to the idea that you’re right on this one. I
think what I’m doing
will be too messy to try to get it working inside Nginx, even though I’d
prefer to do it that
way. I’m looking at both FastCGI and a customized format which I’d use
with the upstream
module as possible solutions.

Marcus C. wrote:

What do you need to do?

I’m generating pages based on simple templates which could do all manner
of things, including getting data from databases. I don’t want to use
PHP/Python etc. because the overhead would be very high and not very
practical for what I’m doing.

This is an incorrect generalization!

If your content generation is computationally expensive (both CPU and IO
bound), then the overhead of the Python interpreter is usually much
smaller than the overhead of your computation; and you can always code
your computationally expensive routine as a C extension.

However, for what you want to do, Nginx is the wrong choice.
You should use something else like:

  1. CGI, if your computation is CPU bound, and you don’t need a
    persistent connection to the database
  2. Apache, if you want flexibility
  3. Erlang, if you want a complete environment for writing both CPU and
    IO bound scalable web applications

  • generates content
  • in threadsafe fashion, puts the output content in a queue of responses
    to be returned

[in Nginx]

  • loop that checks every e.g. 0.001 secs to see if there are any
    responses (headers and/or content)
  • checks for responses in threadsafe fashion, and if there are, calls
    ngx_http_output_filter etc

This is called polling, and is generally the wrong solution.

[…]

Alternatively, I would prefer to have an event-driven system whereby I
wouldn’t need a loop to check whether
my responses were ready.

Note that you can’t have a second “loop” in Nginx (unless this is in a
separate thread).

Is it possible to trigger non-IO events in
Nginx, such that once triggered, Nginx
would handle them properly (even if they were called from within a
different thread)?

The Nginx event loop is event based.
It uses epoll or kqueue.

With kqueue or epoll it is possible to receive notification for non IO
events, like signals or filesystem events.

But in your case, you can use a simple pipe.
In the mailing list archive you can find an old discussion between me
and Igor about this topic.
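
Very roughly, the pipe idea looks like this (untested sketch; the names
are invented and most error handling is omitted):

#include <unistd.h>

/* The worker thread writes one byte to notify_fd[1] when a response is
   ready; the Nginx event loop wakes up on notify_fd[0], and the handler
   below runs in the main thread, where it is safe to call
   ngx_http_output_filter / ngx_http_finalize_request. */

static int  notify_fd[2];

static void
ngx_http_example_notify_handler(ngx_event_t *ev)
{
    u_char             buf[1];
    ngx_connection_t  *c = ev->data;

    (void) read(c->fd, buf, 1);   /* drain the wakeup byte */

    /* ... pop the finished response from the (thread safe) queue and
       send it with ngx_http_output_filter / ngx_http_finalize_request ... */
}

/* Called from the module's init_process hook, once per worker. */
static ngx_int_t
ngx_http_example_init_process(ngx_cycle_t *cycle)
{
    ngx_connection_t  *c;

    if (pipe(notify_fd) == -1) {
        return NGX_ERROR;
    }

    c = ngx_get_connection(notify_fd[0], cycle->log);
    if (c == NULL) {
        return NGX_ERROR;
    }

    c->read->handler = ngx_http_example_notify_handler;
    c->read->log = cycle->log;

    if (ngx_add_event(c->read, NGX_READ_EVENT, 0) != NGX_OK) {
        return NGX_ERROR;
    }

    return NGX_OK;
}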

[…]

Manlio

However, for what you want to do, Nginx is the wrong choice.
You should use something else like:

  1. CGI, if your computation is CPU bound, and you don’t need a
    persistent connection to the database
  2. Apache, if you want flexibility
  3. Erlang, if you want a complete environment for writing both CPU and
    IO bound scalable web applications

I’m looking into using libevent’s HTTP interface. It will be much easier
to plug my threaded solution into that (a rough sketch of what I mean is
at the end of this message).

This is called polling, and is generally the wrong solution.

Yes, and I agree. ;)

With kqueue or epoll it is possible to receive notification for non IO
events, like signals or filesystem events.

I’m not very familiar with these yet - though I’m doing reading on it
now (I’ll probably use libevent’s functions).

But in your case, you can use a simple pipe.
In the mailing list archive you can find an old discussion between me
and Igor about this topic.
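
For reference, the bare shape of what I mean with libevent’s evhttp
interface (just an untested sketch, handler names made up):

#include <event2/event.h>
#include <event2/http.h>
#include <event2/buffer.h>

/* Generic callback for every request; in my case this is where the
   request would be handed to a worker thread. */
static void
handle_request(struct evhttp_request *req, void *arg)
{
    struct evbuffer *buf = evbuffer_new();

    evbuffer_add_printf(buf, "generated content\n");
    evhttp_send_reply(req, HTTP_OK, "OK", buf);
    evbuffer_free(buf);
}

int
main(void)
{
    struct event_base *base = event_base_new();
    struct evhttp *http = evhttp_new(base);

    evhttp_bind_socket(http, "127.0.0.1", 8080);
    evhttp_set_gencb(http, handle_request, NULL);

    event_base_dispatch(base);   /* run the event loop */

    return 0;
}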

I’ll check these out.

Thanks for your help.