CPS-chained subrequests with I/O interceptions no longer work in nginx 0.8.21 (Was Re: Custom event

On Tue, Oct 27, 2009 at 5:41 PM, agentzh [email protected] wrote:

Sorry, the failures should be rephrased a bit:

“All the regression tests for my “echo” module’s echo_sleep directive
that is used in subrequests chained by “post_subrequest” callback
accepted by ngx_http_subrequest.”

Okay, more investigation shows that the real problem has nothing to do
with “sleep”, custom events, nor timers. It’s the chained subrequest
model with “I/O interceptions” that no longer works in nginx 0.8.21.

The problem is much easier to phrase in terms of the “echo” module’s
directives:

location /main {
    echo_location /foo;
    echo_location /sub1;
}
location /sub1 {
    proxy_pass 'http://127.0.0.1:$server_port/foo';
}
location /foo {
    echo $echo_request_uri;
}

This is the minimal test case that I can produce. Basically, the
“post_subrequest” handler for the /foo subrequest issued directly by
/main issues a subrequest to /sub1. Then /sub1 in turn calls the
standard proxy module’s handler to do some I/O operation. Then the
whole connection never terminates properly and hangs forever.

If no I/O operation is involved, then the whole request completes
normally in 0.8.21, no matter it’s “echo_sleep” or “proxy_pass” that
do the actual I/O interception.

The “echo_location” directive and its async friend are defined in the
following .c file (for total 130+ lines of code only):

http://github.com/agentzh/echo-nginx-module/blob/master/src/location.c

Could anyone give me a handle? :slight_smile:

Thanks!
-agentzh

It seems that you are not completely aware how subrequests work.

The Evan M.'s guide to nginx programming might help you:

http://www.evanmiller.org/nginx-modules-guide-advanced.html#subrequests-sequential

Hello!

On Tue, Oct 27, 2009 at 12:43:10PM +0000, Valery K. wrote:

It seems that you are not completely aware how subrequests work.

The Evan M.'s guide to nginx programming might help you:

Emiller’s Advanced Topics In Nginx Module Development – Evan Miller

This section looks, uhm, a bit obsolete for me. Things changed in
0.7.25 and it’s no longer required to check anything but NGX_ERROR
returned from ngx_http_subrequest().

Maxim D.

On Tue, Oct 27, 2009 at 8:43 PM, Valery K.
[email protected] wrote:

The Evan M.'s guide to nginx programming might help you:

Emiller’s Advanced Topics In Nginx Module Development – Evan Miller

Unfortunately it won’t :slight_smile:

I read that guide very carefully one month ago. It does not cover
subrequests issued from content handler and the use of the
“post_subrequest” parameter of ngx_http_subrequest function.

Subrequests issued from output filters are useful but not so useful
if without the capability to be initiated from elsewhere. I would be
very sad if nginx’s subrequest model is that limited. Luckily I’ve
had an enjoyable time in parallel subrequests directly issued from the
content handler. And the more powerful (and in many situations more
useful) chained subrequests issued from content handler intermediated
by “post_subrequest” at least seems to work before 0.8.21. And
that’s what I’m asking here. Is it something just too crazy and too
insane to do in nginx?

Also, as the guide itself has noted, the subrequest model has changed
a lot since then. As far as what has affected me, the model has
changed in the following releases (even without a note it their
ChangeLogs it seems):

0.7.21, 0.7.46, 0.8.4 and possibly more

The list can never be complete I suppose :slight_smile:

Testing every release tarball using my own test suite to find when
things dramatically change and coding up #if/#else directives are
tedious, but maybe it’s worth it :slight_smile:

Thank you for your suggestion anyway :slight_smile:

Cheers,
-agentzh

Hello!

On Tue, Oct 27, 2009 at 06:42:06PM +0800, agentzh wrote:

with “sleep”, custom events, nor timers. It’s the chained subrequest
}
If no I/O operation is involved, then the whole request completes
normally in 0.8.21, no matter it’s “echo_sleep” or “proxy_pass” that
do the actual I/O interception.

The “echo_location” directive and its async friend are defined in the
following .c file (for total 130+ lines of code only):

http://github.com/agentzh/echo-nginx-module/blob/master/src/location.c

Could anyone give me a handle? :slight_smile:

Could you please try to produce reduced code that triggers the issue?

I don’t really have much time to investigate your module code,
but I believe the problem is that you use subrequests and recall
content handlers in mostly arbitrary way. E.g. it looks like you
call content handler again from post_subrequest handler. This
isn’t supposed to work - they are normally called in quite a
different contexts and have very different expectations. Probably
some changes in 0.8.21 rendered your code to be completely
invalid.

Maxim D.

On Tue, Oct 27, 2009 at 9:41 PM, Maxim D. [email protected]
wrote:

Could you please try to produce reduced code that triggers the issue?

Essentially, say, I have two subrequests A and B. And then:

  1. I start subrequest A in my content handler this way:

    ngx_http_post_subrequest_t *psr;

    psr->handler = post_subrequest_handler;
    psr->data = ctx;

    rc = ngx_http_subrequest(r, &location, &query_string, &sr, psr, 0);
    if (rc != NGX_OK) {
    return NGX_ERROR;
    }

    return NGX_OK;

  2. In A’s post_subrequest’s handler, i.e., the
    “post_subrequest_handler” function, issues subrequest B to a location,
    say, /foo, that involves I/O operations, like this:

    location /foo {
    proxy_pass “http://bar.com/baz”;
    }

Then after subrequests A and B are done, the whole connection hangs
forever and will never terminate (at least it seems so).

I don’t really have much time to investigate your module code,
but I believe the problem is that you use subrequests and recall
content handlers in mostly arbitrary way.
E.g. it looks like you
call content handler again from post_subrequest handler. This
isn’t supposed to work - they are normally called in quite a
different contexts and have very different expectations.

I just reused my content handler’s C function to avoid code
duplication in my “echo” module :slight_smile: It could be a separate handler
function given above. The key issue is that whether the content
handler function and the post_subrequest handler has the same
semantics in their ngx_int_t retval, given that they happen to have
exactly the same prototype.

Probably
some changes in 0.8.21 rendered your code to be completely
invalid.

I’ve just tracked down, yes, eventually, that it is the following line
of code newly added to 0.8.21 causes all my issues:

— nginx-0.8.20/src/http/ngx_http_request.c 2009-10-02
19:30:47.000000000 +0800
+++ nginx-0.8.21/src/http/ngx_http_request.c 2009-10-22
17:48:42.000000000 +0800
@@ -2235,6 +2248,8 @@
ngx_log_debug2(NGX_LOG_DEBUG_HTTP, wev->log, 0,
“http writer done: "%V?%V"”, &r->uri, &r->args);

  •  r->write_event_handler = ngx_http_request_empty_handler;
    
  • ngx_http_finalize_request(r, rc);
    
    }

By manually commenting out the write_event_handler assignment line,
ALL my tests are passing with nginx 0.8.21! :smiley:

I have no idea regarding the reason behind that line’s addition given
my limited knowledge of the nginx internals. Is there a way to
work-around it in nginx 0.8.21?

I’m guessing that nobody has done such things before, but I do believe
this is really useful for doing component-style programming in nginx,
because we don’t have to rely on output filters (that are both costly
and very verbose to write) to hack in sequential subrequests.

Hopefully the nginx developers, especially Igor, will keep this trick
working in future versions or even bless it as a standard approach
rather than an evil hack or something.

Thanks!
-agentzh

On Wed, Oct 28, 2009 at 12:26 PM, agentzh [email protected] wrote:

I’m guessing that nobody has done such things before, but I do believe
this is really useful for doing component-style programming in nginx,
because we don’t have to rely on output filters (that are both costly
and very verbose to write) to hack in sequential subrequests.

Okay, folks, I’ve just created a git branch for my “echo” module and
quickly hacked in the “classical” model of sequential subrequests by
means of output filters.

But again, 0.8.21 hangs the tests and passing every test if commenting
out that line. 0.8.20 is still passing as before.

The “classical” model looks like this:

  1. The content handler issues a subrequest A, and registers a specific
    context object to the subrequest A. This context object could be
    recognized by a special output filter, say, F. And the current request
    object (ngx_http_request_t instance) associated with the main content
    handler is put into this context object.

  2. The subrequest A generates outputs, chain by chain. And the F
    filter runs over them, chain by chain as well. The F filter detects if
    there’s a context object attached to the current request object, if
    not, just calls the next body filter; if yes, walks through the
    current chain link to find out the last buf indicating the end of the
    currrent subrequest’s response body stream.

  3. If the last buf is found, the filter F takes the original request
    object that is associated with the original content handler from its
    context object, and initiates subrequests or do other things (actually
    this part is implemented as resuming the continuation of the content
    handler in the “echo” module).

The tricky part is to search the “last buf” in the filter F which runs
over the response body of a subrequest. It seems that buf->last_buf is
already cleared by something else (more investigation needed) and
buf->sync is set instead. So currently I’m cheating a bit and just
checking the buf->sync flag and thinking it’s the right way to do in
at least late 0.8.x’s. Well, it does not work in 0.7.x anyway.
Thoughts?

You can find the relevant source code here:

http://github.com/agentzh/echo-nginx-module/blob/master/src/location.c

and here

http://github.com/agentzh/echo-nginx-module/blob/cps-filter/src/cps-filter.c

where “cps_filter” is the codename for the filter F in the previous
explanation.

Even though the “last buf” searching in subrequest output filter is
still a hack, this experiment has convinced me that it looks more like
a regression in nginx itself rather than misuse of the
“post_subrequest” thingy or something else. In my cps-filter branch,
I’m not using “post_subrequest” anyway.

What do you think? :wink:

Cheers,
-agentzh

Cheers,
-agentzh

On Wed, Oct 28, 2009 at 12:26 PM, agentzh [email protected] wrote:

}

Oops, forget to mention that this subrequest B is issued using the
subrequest A’s parent request object. The “post_subrequest_handler”
function’s implementation looks like this:

static ngx_int_t
post_subrequest_handler(ngx_http_request_t *r) {
    rc = ngx_http_subrequest(r->parent, &location, url_args, &sr, 

psr, 0);
if (rc != NGX_OK) {
return NGX_ERROR;
}
return NGX_OK;
}

And yeah, I know this piece of code looks very tricky and I was so
delighted to find out that it actually worked in those previous
versions :slight_smile:

Cheers,
-agentzh

On Wed, Oct 28, 2009 at 5:10 PM, agentzh [email protected] wrote:

You can find the relevant source code here:

http://github.com/agentzh/echo-nginx-module/blob/master/src/location.c

Sorry, it should be the location.c in the cps-filter branch:

http://github.com/agentzh/echo-nginx-module/blob/cps-filter/src/location.c

Cheers,
-agnetzh

Hello!

On Wed, Oct 28, 2009 at 05:10:10PM +0800, agentzh wrote:

this part is implemented as resuming the continuation of the content
handler in the “echo” module).

The tricky part is to search the “last buf” in the filter F which runs
over the response body of a subrequest. It seems that buf->last_buf is
already cleared by something else (more investigation needed) and
buf->sync is set instead. So currently I’m cheating a bit and just
checking the buf->sync flag and thinking it’s the right way to do in
at least late 0.8.x’s. Well, it does not work in 0.7.x anyway.
Thoughts?

Currently last_buf is only set for really last buffer. I.e.
subrequests shouldn’t set it, if they do - it’s a bug and likely
to cause problems.

This isn’t really natural (and I talked to Igor about this
recently, probably this should be changed to simplify things), but
that’s how it works now.

You can find the relevant source code here:

http://github.com/agentzh/echo-nginx-module/blob/master/src/location.c

and here

http://github.com/agentzh/echo-nginx-module/blob/cps-filter/src/cps-filter.c

where “cps_filter” is the codename for the filter F in the previous explanation.

One obvious error I see is that your code tries to do something
once it passed buffer with last_buf set downstream. This isn’t
going to work.

Even though the “last buf” searching in subrequest output filter is
still a hack, this experiment has convinced me that it looks more like
a regression in nginx itself rather than misuse of the
“post_subrequest” thingy or something else. In my cps-filter branch,
I’m not using “post_subrequest” anyway.

What do you think? :wink:

You was asked to produce reduced code for reason. Please do so.
Your module is huge enough to be hard to analyse, and there are
too many ways to screw things up.

Alternatively you may try to reproduce problem with SSI and/or
addition filter.

Maxim D.

On Wed, Oct 28, 2009 at 6:10 PM, Maxim D. [email protected]
wrote:

This isn’t really natural (and I talked to Igor about this
recently, probably this should be changed to simplify things), but
that’s how it works now.

So how to detect the end of a subrequest’s output stream in a
subrequest’s output filter? Like in the context of the “addition”
module? (The “addition” module explicitly disallowed any uses in a
subrequest.)

One obvious error I see is that your code tries to do something
once it passed buffer with last_buf set downstream. This isn’t
going to work.

Oh, not really.

It passed buffer with last_buf set downstream, but it does bot do
anything with the current request object. Rather, it tries to do
something with its parent request object if its parent request has
not sent “last buf” yet. Well, I know it is not that obvious here :wink:

In other words, it reopens the continuation and resumes the
execution of its parent request’s content handler’s tasks. And that’s
exactly why I use the term “continuation” and “CPS” all over this
thread. It’s a CPS chain that backtracks, rather than a strictly
linear subrequest chain that never goes back. (The latter will be
illustrated using a patched “addition” module below.)

You was asked to produce reduced code for reason. Please do so.
Your module is huge enough to be hard to analyse, and there are
too many ways to screw things up.

I’ll try to produce a standalone module to demonstrate this issue.
It’s nontrivial and will take some time :slight_smile:

Alternatively you may try to reproduce problem with SSI and/or
addition filter.

The “addition” module does not really work in subrequests yet. In
fact, it explicitly checks r != r->main in its header filter. I’ve
patched it to make it work recursively in subrequests and could not
reproduce the hang:

addition_types text/plain;
location /main {
    echo main;
    add_after_body /sub;
}
location /sub {
    echo sub;
    add_after_body /sub2;
}
location /sub2 {
    proxy_pass 'http://127.0.0.1:$server_port/foo';
}
location /foo {
    echo foo;
}

I’m getting the expected response of /main without hanging:

main
sub
foo

So I’m guessing it’s the subsubrequests I issued on the parent request
of the current subrequest that makes a difference.

Could anyone confirm the justification of following modification of
nginx 0.8.21?

— nginx-0.8.20/src/http/ngx_http_request.c 2009-10-02
19:30:47.000000000 +0800
+++ nginx-0.8.21/src/http/ngx_http_request.c 2009-10-22
17:48:42.000000000 +0800
@@ -2235,6 +2248,8 @@
ngx_log_debug2(NGX_LOG_DEBUG_HTTP, wev->log, 0,
“http writer done: "%V?%V"”, &r->uri,
&r->args);

  •       r->write_event_handler = ngx_http_request_empty_handler;
    
  •      ngx_http_finalize_request(r, rc);
     }
    

It causes all these troubles on my side.

Thanks!
-agentzh

On Tue, Oct 27, 2009 at 6:42 PM, agentzh [email protected] wrote:

Okay, more investigation shows that the real problem has nothing to do
with “sleep”, custom events, nor timers. It’s the chained subrequest
model with “I/O interceptions” that no longer works in nginx 0.8.21.

Finally, people, finally…I got this issue perfectly sorted out this
early morning :slight_smile:

I suddenly realized last night that the “last” subrequest’s
“post_subrequest” handle should call ngx_http_finalize_request on its
parent handler when the parent handler’s content handler returns
anything other than NGX_DONE.

After applying this simple fix [1], the “echo” module passes its whole
test suite against 0.8.21 ~ 0.8.27 as well as all those older versions

= 0.7.46. Hooray! :smiley:

I’ve just released “echo” v0.19 which includes this fix:

http://github.com/agentzh/echo-nginx-module/downloads

In retrospect, it was indeed my fault, rather than nginx’s. The newer
versions (0.8.21+) just expose my mistakes in chained subrequest
handling :stuck_out_tongue:

Thanks shaun for bringing this issue to me in the other thread [2].

Special thanks go to Marcus C. who also cares about this problem
and my “echo” module off-list.

Thanks!
-agentzh

P.S. Lessons learned: when request hangs, first think about potential
lack of ngx_http_finalize_request call (and possible incorrect
r->main->count numbers) :slight_smile:

References
[1] You can see the patch here:

[2] Parallel subrequests for multi-source long polling?