Custom event + timer regressions caused by the new release (Was Re: nginx-0.8.21)

2009/10/26 Igor S. [email protected]:

Changes with nginx 0.8.21 26 Oct 2009

Sigh. All the regression tests for my “echo” module’s echo_sleep in
subrequests are hanging with this nginx version.

The tests were passing happily with 0.7.46 ~ 0.7.63 and 0.8.0 ~ 0.8.20.

I’m not sure if it’s a regression or an internal event model change in
the Nginx internals.

I’d like to seek any help from the developers here in the fear of
reading the complicated source diff’s and huge hours of debugging.

Basically I’m implementing non-blocking “sleep” this way:

  1. Put a custom ngx_event_t member named “sleep” in my module’s
    per-request context struct :

    typedef struct {

    ngx_event_t sleep;
    } ngx_http_echo_module_ctx_t;

(I’m not reusing the per-connection r->read or r->write events here
because I’d like my “sleep” work with subrequests.)

  1. Then initialize the event object this way:

    ctx->sleep.handler = ngx_http_echo_sleep_event_handler;
    ctx->sleep.data = r;
    ctx->sleep.log = r->connection->log;

The event handler ngx_http_echo_sleep_event_handler is coded this way
(mostly a clone of nginx’s own function ngx_http_request_handler:

void
ngx_http_echo_sleep_event_handler(ngx_event_t *ev) {
    ngx_connection_t        *c;
    ngx_http_request_t      *r;
    ngx_http_log_ctx_t      *ctx;

    r = ev->data;
    c = r->connection;
    ctx = c->log->data;
    ctx->current_request = r;

    ngx_http_echo_post_sleep(r);

    ngx_http_run_posted_requests(c);  # only enabled for >= 0.7.46
}

The final bit is the ngx_http_echo_post_sleep function:

static void
ngx_http_echo_post_sleep(ngx_http_request_t *r) {
    ngx_http_echo_ctx_t         *ctx;
    ngx_int_t                   rc;

    ctx = ngx_http_get_module_ctx(r, ngx_http_echo_module);
    if ( ! ctx->sleep.timedout ) {
        return;
    }

    ctx->sleep.timedout = 0;

    if (ctx->sleep.timer_set) {
        ngx_del_timer(&ctx->sleep);
    }

    rc = ngx_http_echo_handler(r);  # my main content handler

    if (rc == NGX_OK) {  # only enabled for nginx_version >= 8011
        r->main->count--;   # not completely sure about this though
    }

    ngx_http_finalize_request(r, rc);
}
  1. In the sleep content handler, add a timer based on the “sleep”
    event to start sleeping:

    r->main->count++; # only enabled for nginx_version >= 8011

    ngx_add_timer(&ctx->sleep, (ngx_msec_t) (1000 * delay));

    return NGX_DONE;

In 0.8.21, my “sleep” works in main requests but will hang in
subrequests. What is the key magic that I’m missing here? And a good
explanation of the r->main->count reference counter introduced in
recent 0.8.x releases will be highly appreciated as well :slight_smile:

Thanks in advance!

-agentzh

On Tue, Oct 27, 2009 at 4:27 PM, agentzh [email protected] wrote:

Sigh. All the regression tests for my “echo” module’s echo_sleep in
subrequests are hanging with this nginx version.

Sorry, the failures should be rephrased a bit:

“All the regression tests for my “echo” module’s echo_sleep directive
that is used in subrequests chained by “post_subrequest” callback
accepted by ngx_http_subrequest.”

Basically, it’s a subrequest calling chain:

/main content handler
→ calls ngx_http_subrequest /sub1
→ returns
/sub1 content handler
→ sleeps
→ returns
/sub1 post_subrequest called in /sub’s own ngx_http_finalize_request
call
→ calls back /main content handler
/main content handler resumed
– calls ngx_http_subrequest /sub2
– returns
/sub2 content handler
– sleeps again
– returns
/sub2 post_subrequest called in /sub2’s own ngx_http_finalize_request
call
→ calls back /main content handler
→ returns
/main content handler resumed
– returns

It can be viewed as a calling chain using the Continuation Passing
Style where the “post_subrequest” argument fed into
ngx_http_subrequest is the “continuation” object being passed along.

In “echo” module’s terms, the corresponding configuration looks like
this:

location /main {
    echo_reset_timer;
    echo_subrequest GET /sub1;
    echo_subrequest GET /sub2;
    echo "took $echo_timer_elapsed sec for total.";
}
location /sub1 {
    echo_sleep 0.02;
    echo hello;
}
location /sub2 {
    echo_sleep 0.01;
    echo world;
}

Calling GET /main will hang in nginx 0.8.21 but not for those previous
versions.

In my error.log, I can see the following additional messages when
using 0.8.21 after subrequest /sub2 completes and /main starts to
resume (for the last time):

2009/10/27 17:36:59 [debug] 16885#0: posted events 00000000
2009/10/27 17:36:59 [debug] 16885#0: worker cycle
2009/10/27 17:36:59 [debug] 16885#0: epoll timer: 59969
2009/10/27 17:37:01 [debug] 16885#0: epoll: fd:3 ev:0001 d:088C90C0
2009/10/27 17:37:01 [debug] 16885#0: *1 http run request: “/main?”
2009/10/27 17:37:01 [debug] 16885#0: *1 http test reading
2009/10/27 17:37:01 [info] 16885#0: *1 client closed prematurely
connection, client: 127.0.0.1, server: localhost, request: “GET /main
HTTP/1.1”, host: “localhost:1984”
2009/10/27 17:37:01 [debug] 16885#0: *1 http finalize request: 0,
“/main?” a:1, c:1
2009/10/27 17:37:01 [debug] 16885#0: *1 http terminate request count:1

Sorry for exposing so many details. The manifest is more complicated
than I originally believed. “sleep” works in ordinary “parallel
subrequests” by the way :slight_smile:

Thanks!
-agentzh