Parallel subrequests for multi-source long polling?

Here’s a description of what I’m hoping to accomplish:
A request comes in and, based on the content of the GET arguments or
POST data, one or more backends will be selected. The request (with a
bit of modification) will then be proxied to all of the selected
backends. These backends might return data immediately or could hang
waiting for data for an indeterminate amount of time. I’d like to return
the response from whichever backend responds first and then cancel or
ignore the responses from the rest of the pending requests. In the case
of errors (502s, 500s, whatever), an error wouldn’t be returned unless
all of the backends have failed.

I wrote a module a while ago that handles the much simpler single-backend
case using the built-in upstream functionality, but that certainly isn’t
going to work for multiple requests.

As far as I can tell, I should be able to get most of what I need using
subrequests, but after a day of hacking around on it, I haven’t had much
luck getting anything working. Part of the problem is likely that I’m
trying to use them from a handler, rather than a filter, and capturing
the output instead of letting nginx just return it inline. I haven’t
seen any code that looks applicable to what I’m trying to do, so I have
no idea if my approach is appropriate or not.

Anyway, any suggestions are welcome. I’d like to do this using as much
of the higher level nginx pieces as possible, but if that isn’t an
option I can probably just dig in to the connection or event level code
and try to hook in there. That seems horribly ugly, though, and I’d
prefer to do it the right way if there is a right way.

Thanks!
Shaun Lindsay

Posted at Nginx Forum:

Did you look at http://wiki.nginx.org/NginxHttpEchoModule? From the
description it looks like it’s something that you could use (or at
least use as a base for your module).

Best regards,
Piotr S. < [email protected] >

On Wed, Nov 18, 2009 at 2:30 PM, shaun [email protected] wrote:

Here’s a description of what I’m hoping to accomplish:

Here’s a proposal for your consideration:

  1. You need a content handler and an output filter to work together.
  2. Use a content handler to issue all the subrequests (to all your
    upstream backends). We assume the content handler lives in the main
    request.
  3. When starting each subrequest, the content handler feeds a
    “post_subrequest” handler and the main request’s ctx object into the
    ngx_http_subrequest call. Also, the content handler registers a ctx
    object on the subrequest object to inform the subrequest’s output
    filter to buffer the response.
  4. In your output filter, first check if there’s a ctx object
    associated with the current request. If yes, then you’re filtering one
    of the subrequests that your content handler starts. So now simply
    buffer the response chain link in this ctx object (in memory or on
    disk, it’s up to you).
  5. In your “post_subrequest” handler, check the main request’s ctx
    object (it should be the last argument passed in). We assume there’s
    a “success” flag and a “failures” counter in that ctx object (both
    zeroed initially), as well as the total number of subrequests. If the
    “success” flag is not set and the current subrequest succeeds,
    simply return the buffered response contents in your subrequest’s ctx
    object, and set the “success” flag in your main request’s ctx object.
    If the current subrequest fails, then increment the “failures”
    counter. If the “failures” counter reaches the total number of
    subrequests (which means all of them have failed), then return the
    final failure (and set the main request’s status code if appropriate
    and send headers for the main request.)
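The bookkeeping in step 5 is just a flag and a counter; here’s a minimal
standalone C sketch of that decision logic (all names here are
illustrative, not nginx API):

```c
#include <stddef.h>

/* Illustrative sketch of the step-5 bookkeeping: not nginx API,
 * just the first-success-wins / all-failed decision logic. */

typedef enum {
    ACTION_NONE,          /* keep waiting on the other subrequests */
    ACTION_SEND_RESPONSE, /* first success: send its buffered body */
    ACTION_SEND_ERROR     /* every subrequest has failed */
} subreq_action;

typedef struct {
    unsigned success;   /* set once the first subrequest succeeds */
    size_t   failures;  /* failed subrequests so far */
    size_t   total;     /* total subrequests issued */
} subreq_ctx;

/* Called from each subrequest's post_subrequest handler with the
 * main request's ctx and whether this subrequest succeeded. */
static subreq_action
subreq_done(subreq_ctx *ctx, int ok)
{
    if (ok) {
        if (!ctx->success) {
            ctx->success = 1;
            return ACTION_SEND_RESPONSE;
        }
        return ACTION_NONE;     /* a sibling already answered */
    }

    ctx->failures++;
    if (ctx->failures == ctx->total && !ctx->success) {
        return ACTION_SEND_ERROR;
    }
    return ACTION_NONE;
}
```

In the real module the ctx would hang off the main request via
ngx_http_get_module_ctx / ngx_http_set_module_ctx; this only shows the
state machine.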

Your main request, i.e., the content handler should not send headers
or any contents by itself and should return NGX_DONE.

I think this model should work after my various experiments in the
“echo” module, but I could be wrong :wink: Feel free to tell us your
findings on your way :slight_smile:

Good luck!
-agentzh

agentzh Wrote:

[snip]

This seems like a perfectly reasonable way to tackle the problem.

I was trying to do this without using an output filter, which, in
retrospect, doesn’t make much sense. Thanks for the advice! I’ll see
if I can get something working tomorrow.


On Wed, Nov 18, 2009 at 4:24 PM, agentzh [email protected] wrote:

Your main request, i.e., the content handler should not send headers
or any contents by itself and should return NGX_DONE.

I’m not completely sure about this part for the latest nginx versions
>= 0.8.21. See the regression report in this thread:
CPS-chained subrequests with I/O interceptions no longer work in nginx
0.8.21 (Was Re: Custom event + timer regressions caused by the new
release). It may hang mysteriously, or it may not.

I suggest you try 0.8.20 or 0.7.x >= 0.7.46 first. If it works, then
try out the latest 0.8.27 version.

Good luck!
-agentzh

agentzh Wrote:

[snip]

Well, I’ve got everything almost working. Not calling
ngx_http_finalize_request on the parent request cost me a couple of
hours of debugging mysterious hangs when there were multiple
subrequests, but after doing that the hang goes away and the output
looks good. So far, it works perfectly if all the requests return
immediately, if all the requests error out and if all the requests but
one error out. The only remaining issue I’m having is that when there
are 2 or more subrequests, the main request does not return until both
subrequests finish. In my test case, I have one subrequest return after
0.1 seconds and the other after 10 seconds. It returns the output from
the fast request, as expected, but it returns it after 10 seconds. Very
strange.

I’m guessing I need to do something special to cancel the pending
subrequests, or possibly force the completion of the parent request
regardless of the state of the rest of subrequests. I tried calling
ngx_http_finalize_request with various rc values on the pending
subrequests, hoping to find something that would force the completion,
but nothing seemed to work (I can get the parent request to close
immediately, but that doesn’t really help since it hasn’t sent the good
request’s response yet).

Is there a way to cancel pending subrequests or to keep the parent
request from being postponed? Either option would probably work.

Thanks for the help so far – I’ve made a lot of progress. If I can
sort out this last bit, I should have a fully working prototype
tomorrow. (And on our production machines by Friday!) :wink:

–Shaun


On Thu, Nov 19, 2009 at 4:54 PM, shaun [email protected] wrote:

Well, I’ve got everything almost working.

Yay!

Not calling ngx_http_finalize_request on the parent request cost me a couple of hours of debugging mysterious hangs when there were multiple subrequests, but after doing that the hang goes away and the output looks good.

Nice :slight_smile:

So far, it works perfectly if all the requests return immediately, if all the requests error out and if all the requests but one error out. The only remaining issue I’m having is that when there are 2 or more subrequests, the main request does not return until both subrequests finish. In my test case, I have one subrequest return after 0.1 seconds and the other after 10 seconds. It returns the output from the fast request, as expected, but it returns it after 10 seconds. Very strange.

Yes, I was aware of this and wanted to say that you need to cancel the
pending subrequests when one subrequest succeeds.

It’s weird that the client sees the response header and body after the
slowest subrequest finishes. It seems that the response has been
buffered in the last few output filters somehow. Please ensure that
you have set b->flush and b->last_buf in your output chain link. These
flags should defeat buffering in most cases.

And still, we have to cancel the pending subrequests, but closing the
connection seems a bit overkill, especially in the context of HTTP
keepalive.

I’m guessing I need to do something special to cancel the pending subrequests, or possibly force the completion of the parent request regardless of the state of the rest of subrequests.

Indeed.

I tried calling ngx_http_finalize_request with various rc values on the pending subrequests, hoping to find something that would force the completion, but nothing seemed to work

What rc values have you tried? Can you publish your code?

(I can get the parent request to close immediately, but that doesn’t really help since it hasn’t sent the good request’s response yet).

Try forcibly flushing the buffered response headers and body on the
parent request :slight_smile:

Cheers,
-agentzh

On Wed, Nov 18, 2009 at 4:24 PM, agentzh [email protected] wrote:

  1. In your output filter, first check if there’s a ctx object
    associated with the current request. If yes, then you’re filtering one
    of the subrequests that your content handler starts. So now simply
    buffer the response chain link in this ctx object (in memory or on
    disk, it’s up to you).

We should also abort the current (sub)request if the ctx shows
“success”, which means a sibling subrequest has succeeded
already :wink:

If the “failures” counter reaches the total number of subrequests
(which means all of them have failed), then return the final failure
(and set the main request’s status code if appropriate and send
headers for the main request.)

Your main request, i.e., the content handler should not send headers
or any contents by itself and should return NGX_DONE.

I was being a bit hand-wavy here. To be more accurate:

In the “post_subrequest” handler, send headers and contents directly
on “r->parent” rather than “r”, because the latter references the
current subrequest while the former is our “main request”. Also, never
forget to call ngx_http_finalize_request on “r->parent”, or you’ll
observe that “mysterious hang” on nginx 0.8.21+ mentioned in one of
my earlier replies :wink:

I think this model should work after my various experiments in the
“echo” module, but I could be wrong :wink: Feel free to tell us your
findings on your way :slight_smile:

Now I believe it should work :wink:

Thanks for making me think about this issue very hard again :slight_smile:

Best,
-agentzh

On Thu, Nov 19, 2009 at 5:54 PM, agentzh [email protected] wrote:

It’s weird that the client sees the response header and body after the
slowest subrequest finishes. It seems that the response has been
buffered in the last few output filters somehow. Please ensure that
you have set b->flush and b->last_buf in your output chain link. These
flags should defeat buffering in most cases.

Okay, I was wrong here :slight_smile: The output body of the current subrequest
and its parent request will be buffered if the request in question is
not at the current head of the postponed chain. For
example:

location /main {
    echo hello;
    echo_flush;
    echo_location_async '/foo';
    echo_location_async '/bar';
    echo_location_async '/baz';
    echo world;
    echo_flush;
}

location /foo {
    echo_sleep 1;
    echo foo;
    echo_flush;
}

location /bar {
    echo_sleep 2;
    echo bar;
    echo_flush;
}

location /baz {
    echo_sleep 1;
    echo baz;
    echo_flush;
}

Accessing /main using curl will show “hello” immediately, and “foo” 1
sec later, and finally “bar”, “baz”, and “world” together after
another 1 sec. So if the slowest subrequest is issued first, like the
location /foo here, then there’s little hope to get the outputs of
later subrequests like /baz properly flushed without using hacks.

The “world” output of the main request is buffered because it’s at the
end of the postponed chain while “hello” is at the head. When “hello”
gets out, “foo” becomes the head of the postponed chain.
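Following that logic, a variation of the example that issues the slowest
subrequest last should let each output flush as it completes, since every
subrequest reaches the head of the postponed chain by the time it
finishes (an untested sketch, reusing the locations above):

location /main {
    echo_location_async '/foo';   # sleeps 1 sec
    echo_location_async '/baz';   # sleeps 1 sec
    echo_location_async '/bar';   # sleeps 2 sec, slowest last
}

Here curl should show “foo” and “baz” after about 1 sec and “bar” about
1 sec later.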

And still, we have to cancel the pending subrequests, but closing the
connection seems a bit overkill, especially in the context of HTTP
keepalive.

Forcibly canceling subrequests can be dangerous because we have to
ensure all those timers and event handlers get properly cleared. And I
have never done such things myself. Sorry about that. chaoslawful and I
will take a closer look at this issue, but with no promise.

I tried calling ngx_http_finalize_request with various rc values on the pending subrequests, hoping to find something that would force the completion, but nothing seemed to work

Yeah, it won’t work. I’ve tested within my “echo” module by
introducing an “echo_abort_parent” directive. I wonder if we’ll have to
arrange the postponed chain ourselves or bypass the postpone filter
completely. I’m not sure. It’s getting evil already :slight_smile:

Maybe other people on the list can give some advice on canceling a
pending subrequest?

Cheers,
-agentzh

I’ll take a stab at hacking the postpone chains and if that doesn’t seem
safe or sane, I’ll give the second option a try.

If you want to go that route then I believe that you could re-use a lot
of code from ngx_supervisord[1]. Most of the code below line 993 (“nginx
<> supervisord communication” comment) does exactly what you need
(creating a “fake request”, connecting to an upstream outside of an
“upstream {}” config, etc).

One thing you should know upfront is that I needed requests which would
be totally independent from any real connections / requests
(ngx_supervisord communicates with supervisord even when there are no
requests - to shut down idle backends, for example), and that’s why
there is a lot of code to create “fake” connections, events and configs
in ngx_supervisord_init. I’m pretty sure you could get away with
pointing most of them to data from the original request…

Also, responses from supervisord are quite small, so I’m forcing nginx
to read them into memory. You might need to use temp files for bigger
things.

[1] FRiCKLE Labs / nginx / ngx_supervisord

Best regards,
Piotr S. < [email protected] >

Piotr S. Wrote:

[snip]

My original version of this module (long-poll with a single backend,
dynamically determined based on the incoming request), does something
very similar to what you’re doing. For the incoming request, it figures
out the appropriate backend then configures the request’s upstream and
kicks off the proxying. For the multi-backend version, I’ll likely need
to do something similar to what you’ve done (i.e. creating requests
completely separate from the actual, live request). I figure the
trickiest part will be building those requests and getting them into a
state that’s usable by nginx.

I haven’t entirely given up on the subrequests yet, though.

Thanks,
Shaun


I figure the trickiest part will be building those requests and getting
them into a state that’s usable by nginx.

This is already done, so there isn’t much to figure out :wink:

Just create an “independent” request:
r = ngx_supervisord_init(pool, uscf);
and then you can inject it into regular nginx’s flow:
ngx_http_upstream_init(r);

You just need to create your own u->peer.get(), u->peer.free(),
u->create_request() and u->finalize_request() functions.

That’s of course, if you give up on subrequests.

Best regards,
Piotr S. < [email protected] >

agentzh Wrote:

Maybe other people on the list can give some advice on canceling a
pending subrequest?

My plan is to start messing with the postpone chain and see if I can
rearrange it in a way that won’t totally break.

So, as another option, I wonder if it’s possible to do this using
completely separate requests. I have no idea about the feasibility of
this, but it seems like it could work. I could create new requests
where I was using subrequests before and then configure the upstream
manually, attach a context and then collate the results in the output
filter. Everything would work nearly the same except the finalizing of
the main request would have to happen in the child request filter
instead of the subrequest’s postreq handler.

I’ll take a stab at hacking the postpone chains and if that doesn’t seem
safe or sane, I’ll give the second option a try.

Oh, I’ll also try to get the code up on github a bit later today as
well.

–Shaun


Piotr S. Wrote:

[snip]

Awesome, I got it fully working using this method. I gave up my battle
against the subrequests and switched over to creating new requests in
the same manner as your supervisord module. There’s a bit more
trickiness needed, since these new requests will need some of the pieces
of the original request (some of the original request’s configuration,
headers and server variables in my case).

I don’t have the headers working completely correctly yet, but
everything else seems to work perfectly.

I might be able to package this up as a generic subrequest replacement.
At the very least, I’ll clean up the code and post it somewhere for
everyone’s perusal.

Thanks a bunch to both of you (Piotr and Agentzh) for your help.

–Shaun


Awesome, I got it fully working using this method.

Great! I’m glad to hear that :slight_smile:

I don’t have the headers working completely correctly yet, but everything
else seems to work perfectly.

What headers do you mean? From the request or from the response? If you
mean the former, then you should look into u->process_headers().

I might be able to package this up as a generic subrequest replacement.
At the very least, I’ll clean up the code and post it somewhere for
everyone’s perusal.

Actually, I was going to do this myself over the weekend, so you could
use a “clean library”… I thought that you’d give subrequests some more
time :stuck_out_tongue:

Anyway, what do you think would be a better form for such a “library”
(from the developer’s perspective):

  • API module which would expose “fake request” functions to other
    modules
    (but it would require --add-module=/path/to/api
    and --add-module=/path/to/your/module)
    or
  • simple .c & .h files to include in your module and distribution?

Best regards,
Piotr S. < [email protected] >

On Fri, Nov 20, 2009 at 5:42 PM, Piotr S. [email protected]
wrote:

I might be able to package this up as a generic subrequest replacement. At
the very least, I’ll clean up the code and post it somewhere for everyone’s
perusal.

Actually, I was going to do this myself over the weekend, so you could
use a “clean library”… I thought that you’d give subrequests some more
time :stuck_out_tongue:

I’m looking forward to that :smiley:

Anyway, what do you think would be a better form for such a “library”
(from the developer’s perspective):

  • API module which would expose “fake request” functions to other modules
    (but it would require --add-module=/path/to/api and
    --add-module=/path/to/your/module)
    or
  • simple .c & .h files to include in your module and distribution?

I’m afraid the second method may give troubles given the current nginx
build system.

All the addon modules’ .o files are put under a single directory,
namely, objs/addon/src/. If two modules use two different versions of
your ngx_blah_blah_blah.c, then one of the .o files will get overridden
and break that module’s assumptions. This will not be an issue,
however, if version numbers are coded into the .c file names, like
ngx_blah_blah_blah_v1.c :slight_smile:

Personally I like the second method more because it reduces one
dependency on the end-user’s side :slight_smile:

Cheers,
-agentzh

agentzh Wrote:

Personally I like the second method more because it reduces one
dependency on the end-user’s side :slight_smile:

Distributing versioned .c/.h files seems the cleanest.


shaun Wrote:

Distributing versioned .c/.h files seems the cleanest.

Piotr / Agentzh,

I moved all the subrequest-specific stuff into a separate module,
available here:
git://github.com/srlindsay/nginx-independent-subrequest.git

Piotr, this uses some of your fake request code (see
src/fake_request.c). I added your license at the front of that file,
since most of that code is yours.

At the moment, this module has one publicly available function:
ngx_indep_subreq_fetch(). You give it a parsed ngx_url_t and a callback
function and it creates the request, sets up the upstream, kicks off the
proxying, and then calls your callback with the subrequest during the
finalize_request stage.

You can pass in a struct of function pointers that the module will use
at the various upstream callback points. Right now, you’ll definitely
need a function for create_request(), otherwise the request will have no
request buffers to pass along. Having something that sets
r->upstream->request_bufs to “GET / HTTP/1.0\r\n\r\n” is enough to get
it to actually successfully proxy and return data, assuming your fetch
callback sends the upstream->buffer to the main request and finalizes
it.
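The struct-of-function-pointers shape described above can be pictured in
plain C; everything here is a hypothetical sketch (names, signatures),
not the module’s real interface:

```c
/* Hypothetical sketch of the callback struct Shaun describes; names
 * and signatures are illustrative, not the module's actual API (see
 * the repository for the real headers). */

typedef struct {
    /* builds the raw request sent to the backend (analogous to
     * u->create_request filling r->upstream->request_bufs) */
    const char *(*create_request)(void);
    /* called once the upstream exchange finishes */
    void (*finalize_request)(int status);
} indep_subreq_callbacks;

/* Minimal create_request: just enough bytes for the upstream to
 * actually proxy something, as the post above suggests. */
static const char *
minimal_create_request(void)
{
    return "GET / HTTP/1.0\r\n\r\n";
}
```

The real callbacks would of course work with ngx_buf_t chains rather
than plain strings; this only shows the shape of the API.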

Pretty rough right now, but it’s a good start.

–Shaun


I moved all the subrequest specific stuff in to a separate module,
available here:
git://github.com/srlindsay/nginx-independent-subrequest.git

Hmm… This is nice, although it’s on a higher level than I originally
thought it should be. I was thinking more of a src/fake_request.c with
some sane defaults. But it seems that your idea might be even more
developer-friendly than mine was :wink:

Piotr, this uses some of your fake request code (see src/fake_request.c).
I added your license at the front of that file, since most of that code is
yours.

Thanks! You should also add your license to the rest of the files.
People aren’t really allowed to use modules without any license
(legally speaking).

Best regards,
Piotr S. < [email protected] >