XSLT2 - new module, reviews and opinions needed :)

Hi,
We have written new Nginx modules and are publishing them in the hope of
getting feedback. :slight_smile: Generally, these modules are related to XML and XSLT
processing. Information and a short description are available at

I’m especially interested in a review of the XSLT2 module. It’s very similar
to Igor S.’s XSLT module, but allows XSL stylesheets to be declared
directly in the XML document and downloaded from external locations.
Nginx events are used to download the external files, so our filter has to
work asynchronously. This is our first nginx module, and we had a lot of
problems with that asynchronous operation. If anybody is familiar with
developing Nginx modules/filters, I would be grateful for any feedback,
opinions, reviews, notes on errors in the code, etc. This is an early
version; we are still learning. :wink:
I have many doubts about request finalization (the
ngx_http_xslt2_body_filter, ngx_http_xslt2_send and
ngx_http_xslt2_fast_handler functions). It actually works, but I’m not
sure it is the proper way to do it.

Source (svn browser):

Module:
https://sourceforge.net/projects/xxslt/files/0.1.1/source/xslt2.tgz/download

I will be grateful for any comments. :slight_smile:

Posted at Nginx Forum:

Hello!

On Wed, Oct 14, 2009 at 06:46:07AM -0400, piotreek wrote:

> I will be grateful for any comments. :slight_smile:

Some random comments in no particular order:

  1. Posting code with inconsistent style for review may
    dramatically reduce your karma.

  2. Reinventing the wheel isn’t really a good idea, but once you’ve
    decided to reimplement an HTTP client (rather than use the upstream
    module for this), you should know that the gethostbyname() function
    blocks and can’t be used during request processing.

  3. Various *_create_conf handlers should return NULL on errors, not
    NGX_CONF_ERROR.

  4. In 0.8.11, reference counting was introduced for the request
    structure, replacing the previously used “is the connection still
    alive” checks. That’s why just returning NGX_DONE no longer works:
    you have to do r->main->count++ before doing so (if you are
    planning to continue working with the request in question). It’s
    probably a good idea to review this change - the code suggests you
    haven’t absorbed it yet.
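In code, point 4 boils down to a pattern like the following minimal sketch (the filter name and the async helper are hypothetical; only r->main->count and the NGX_DONE return come from the point above):

```c
/* Hedged sketch: keeping a request alive across an async operation
 * in nginx >= 0.8.11.  my_start_async_download() is a hypothetical
 * helper that kicks off non-blocking work. */
static ngx_int_t
ngx_http_my_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
{
    if (my_start_async_download(r) == NGX_AGAIN) {
        /* We will keep working with this request after returning,
         * so take a reference on the main request ... */
        r->main->count++;

        /* ... which a later ngx_http_finalize_request() releases. */
        return NGX_DONE;
    }

    return ngx_http_next_body_filter(r, in);
}
```

Every count increment must eventually be balanced by a call to ngx_http_finalize_request(), or the request is never freed.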

Maxim D.

Many thanks for your comments!

Ad 1. You’re right. We changed the code style and now it should be
consistent with the Nginx style. :wink:

Ad 2. This is interesting. We’ve thought about using an upstream; what
is more, we’ve tried to do it. :wink: However, our module allows downloading
many files simultaneously, while an upstream allows downloading only one
per request. Am I wrong? Is it possible to download several files
simultaneously using an upstream?

When it comes to gethostbyname - of course you’re right again; we will
think about this. :wink: Any suggestions?

Ad 3. Right, corrected.

Ad 4. This is interesting too. I’ve changed the code by adding just
“r->count++;” and removing “r->buffered = 1;”, and it seems to work
regardless of the value returned from the body_filter function. When it
returns NGX_OK it works; when it returns NGX_DONE after the last chain, it
works too… What should be returned?

Furthermore, is that all there is to r->count? Just increment it and forget?
Is reading the Nginx source code the only way to learn that mechanism? Is it
described anywhere? We’ve tried to look for information about the
asynchronous operation of filters, but we haven’t found anything except
existing modules and their source code. :wink:

The new version of the code with r->count is available in the SourceForge SVN:

One more thing – collecting data from incoming buffer chains. I’ve
seen in other modules (for example in xslt) a strange way to collect all
the buffers and process them after getting the last one: you have to copy
the content, set in->buf->pos = in->buf->last (which isn’t intuitive ;))
and return NGX_OK from the body_filter for each buffer chain. Is this the
only way to do it?

This has an important disadvantage – we can’t keep pointers to those
buffers, because their content is changed by nginx later. We’ve tried
setting some flags in the buffers, with no interesting result. :wink:

We would like to collect pointers to the buffers without copying their
content, and in some cases return the original, unmodified buffer chains
after obtaining the last buffer chain (i.e., after reading all the incoming
data). Can we do this without copying all of them?

Posted at Nginx Forum:

OK, we’ve written an asynchronous resolver and it seems to work properly.

However, we have problems with subrequests… We thought about them at
the very beginning and tried to understand their mechanism by reading the
SSI filter source code, but it was really complicated, so we’ve written
our own module to do it.

For example - when I use a subrequest in the simplest way:

ngx_http_myfilter_body_filter(…)
{
    if (r == r->main) {
        (…)
        ngx_http_subrequest(…)
    }

    return ngx_http_next_body_filter(…);
}

the result of that subrequest will be sent to the client - and we don’t
want that.
Of course, I can do something like this:

if (r == r->main) {
    ngx_http_subrequest(…)
    return ngx_http_next_body_filter(r, in);

} else {
    return ngx_http_next_body_filter(r, NULL);
}

but this probably isn’t the best way to do it… It will “kill” every
subrequest, not only ours. In SSI this problem is solved in some way,
but I can’t find how. :wink:

Another problem: reading the result of a subrequest in a registered
callback. In SSI there is a function (registered as a callback for some
subrequests):

static ngx_int_t
ngx_http_ssi_set_variable(ngx_http_request_t *r, void *data, ngx_int_t rc)
{
    ngx_str_t  *value = data;

    if (r->upstream) {
        value->len = r->upstream->buffer.last - r->upstream->buffer.pos;
        value->data = r->upstream->buffer.pos;
    }

    return rc;
}

The result is taken from r->upstream, but why? Why is the upstream module
used? What will happen if the subrequest handles a static file?

And last, but not least - parallel subrequests.
Evan M. says that this is really complicated, etc.:
http://www.evanmiller.org/nginx-modules-guide-advanced.html#subrequests-parallel
The SSI source code isn’t clear to me, but what about this code:

for (…) {
    if (ngx_http_subrequest(…) != NGX_OK) {
        return NGX_ERROR;
    }
}

This should enqueue the subrequests, and nginx should process them “in
parallel” (almost, as it’s one thread). Am I wrong? What more can I do
to simulate parallelism?

Posted at Nginx Forum:

And one thing more…
I’ve read the source of the proxy_pass / memcached modules to check how
they use an upstream. Unfortunately, the upstream servers are defined in
the nginx configuration file. That’s useless for us: we need to parse the
incoming buffers in the filter, find the server URI and download some
content.
Is there another way to configure an upstream - during the work of the
body_filter?

In upstream.h there is only one function which adds an upstream server:
ngx_http_upstream_srv_conf_t *ngx_http_upstream_add(ngx_conf_t *cf,
ngx_url_t *u, ngx_uint_t flags);
It requires ngx_conf_t *cf, so it must be used while creating the
configuration…

Posted at Nginx Forum:

Hello!

On Thu, Oct 15, 2009 at 09:49:45AM -0400, piotreek wrote:

> Many thanks for your comments!
>
> Ad 1. You’re right. We changed the code style and now it should be consistent and similar to Nginx style. :wink:
>
> Ad 2. This is interesting. We’ve thought about using an upstream, what is more, we’ve tried to do it. :wink: However, our module allows to download simultaneously a lot of files, when upstream allows to download only one per one request. Am I wrong? Is it possible to download simultaneously a few files using upstream?

Take a look at the subrequest mechanism. Basically - take a look at the
SSI module to see how it handles blocks.

> When it comes to gethostbyname - ofc you’re right again, we will think about this. :wink: Any suggestions?

Take a look at the upstream module; it has an example of how
nginx’s async resolver may be used.
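The pattern in question looks roughly like the following sketch, modeled on ngx_http_upstream.c (the function names with the my_ prefix are hypothetical, error handling is abbreviated, and field details such as ctx->type may differ between nginx versions):

```c
/* Hedged sketch: using nginx's async resolver instead of the
 * blocking gethostbyname().  Requires a "resolver" directive in
 * the configuration. */
static void my_resolve_handler(ngx_resolver_ctx_t *ctx);

static ngx_int_t
my_start_resolve(ngx_http_request_t *r, ngx_str_t *host)
{
    ngx_resolver_ctx_t        *ctx, temp;
    ngx_http_core_loc_conf_t  *clcf;

    clcf = ngx_http_get_module_loc_conf(r, ngx_http_core_module);

    temp.name = *host;

    ctx = ngx_resolve_start(clcf->resolver, &temp);
    if (ctx == NULL || ctx == NGX_NO_RESOLVER) {
        return NGX_ERROR;
    }

    ctx->name = *host;
    ctx->type = NGX_RESOLVE_A;          /* resolve to IPv4 addresses */
    ctx->handler = my_resolve_handler;  /* invoked when resolving completes */
    ctx->data = r;
    ctx->timeout = clcf->resolver_timeout;

    return ngx_resolve_name(ctx);       /* non-blocking */
}

static void
my_resolve_handler(ngx_resolver_ctx_t *ctx)
{
    ngx_http_request_t  *r = ctx->data;

    if (ctx->state) {
        /* resolving failed; ctx->state holds the error code */
    }

    /* use ctx->addrs[0 .. ctx->naddrs - 1], then release the context */
    ngx_resolve_name_done(ctx);
    (void) r;
}
```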

> Ad 3. Right, corrected.

> Ad 4. This is interesting too. I’ve changed code by adding just “r->count++;” and removing “r->buffered = 1;” and it seems to work. Independently of value returned from body_filter function. When it returns NGX_OK it works, when it returns NGX_DONE after last chain - it works too… What should be returned?

Not really sure. It’s also quite possible that I’m wrong, and from a
filter it’s better to use r->buffered (though it’s limited to
stock modules, as it’s a bitmask… but SSI uses it for quite
similar tasks… it looks to me like this is an obsolete approach,
though) or r->blocked (introduced together with r->count;
AIO uses it).

Probably it’s a good idea to ask Igor…

> Furthermore, is it all about r->count? Just increment it and forget? Only way to learn that mechanism is reading source code of Nginx? Is it described anywhere? We’ve tried to look for information about asynchronous work of filters, but we haven’t found anything except existing modules and their source code. :wink:

Basically - you increment it, and a call to
ngx_http_finalize_request() decrements it.

And yes, documentation is written in C language. :slight_smile:

> New version of code with r->count is available at sourceforge SVN: XXSLT - Nginx module for XSLT processing download | SourceForge.net

> One more thing – collecting data from incoming buffer chains. I’ve seen in other modules (for example in xslt) strange way to collect all buffers and process them after getting the last one. You have to copy the content, set in->buf->pos = in->buf->last (that isn’t intuitive ;)) and return NGX_OK from body_filter for each buffer chain. Is it only way to do it?
>
> This has an important disadvantage – we can’t keep pointers to that buffers, because their content is changed by nginx later. We’ve tried to set some flags in buffers with no interesting result. :wink:
>
> We would like to collect pointers to buffers, don’t copy their content and in some cases – return original, unmodified buffer chains after obtaining the last buffer chain (= after reading whole incoming data). Can we do it without copying all of them?

By setting buf->pos = buf->last you signal upstream modules that the
data in the buffer has already been processed and the buffer may be
reused. If you don’t do this, upstream modules will eventually run out
of free buffers and get stuck.

Basically, you have to process buffers as soon as they arrive, and
if you need some data later - copy it.
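As a sketch, the copy-then-consume pattern looks roughly like this in a body filter (ngx_http_my_ctx_t and its data/len fields are hypothetical module context members; only the buf->pos = buf->last convention comes from the explanation above):

```c
/* Hedged sketch: copy each incoming buffer's data, then mark the
 * buffer consumed so its owner may reuse it. */
static ngx_int_t
ngx_http_my_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
{
    size_t              size;
    ngx_chain_t        *cl;
    ngx_http_my_ctx_t  *ctx;    /* hypothetical per-request context */

    ctx = ngx_http_get_module_ctx(r, ngx_http_my_filter_module);

    for (cl = in; cl; cl = cl->next) {
        size = cl->buf->last - cl->buf->pos;

        if (size) {
            /* copy what we need before the buffer is reused */
            ngx_memcpy(ctx->data + ctx->len, cl->buf->pos, size);
            ctx->len += size;
        }

        /* signal that the buffer's data has been processed */
        cl->buf->pos = cl->buf->last;

        if (cl->buf->last_buf) {
            /* whole body collected - process ctx->data here */
        }
    }

    return NGX_OK;
}
```

(This assumes ctx->data was allocated large enough up front; a real filter would grow the buffer or keep a chain of copies.)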

Maxim D.

Hello!

On Wed, Oct 21, 2009 at 09:39:54AM -0400, piotreek wrote:

> And one thing more…
> I’ve read source of proxy_pass / memcached module to check how they use upstream. Unfortunately upstream servers are defined in nginx configure file. It’s useless for us. We need to parse incoming buffers in filter, find server uri and download some content.
> Is there another way to configure upstream? During work of body_filter?
>
> In upstream.h there is only one function which adds upstream server:
> ngx_http_upstream_srv_conf_t *ngx_http_upstream_add(ngx_conf_t *cf, ngx_url_t *u, ngx_uint_t flags);
> It requires ngx_conf_t *cf, so it must be used during creating configuration…

Looks like you completely missed the variables support in proxy_pass.
Try re-reading the sources; in some cases it helps. :slight_smile:
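The hint about variables can be illustrated with a configuration sketch (the location and variable name are made up; note that a resolver directive is needed when the host in proxy_pass comes from a variable, since it can no longer be resolved at configuration time):

```nginx
location /fetch/ {
    resolver 127.0.0.1;          # needed: the host is not known at config time
    set $backend "example.com";  # in practice, computed per request
    proxy_pass http://$backend$request_uri;
}
```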

Maxim D.

Hello!

On Wed, Oct 21, 2009 at 06:30:17AM -0400, piotreek wrote:

> }
> } else {
>     return ngx_http_next_body_filter(r, NULL);
> }
>
> but this isn’t probably best way to do that… This will “kill” any subrequest, not only ours. In SSI this problem is solved in some way, but I can’t find how. :wink:

In ngx_http_ssi_include():

    if (set) {

        …

        flags |= NGX_HTTP_SUBREQUEST_IN_MEMORY|NGX_HTTP_SUBREQUEST_WAITED;
    }

>     }
>
>     return rc;
> }
>
> Result is taken from r->upstream, but why? Why upstream module is used? What will happen if subrequest handles static file?

It works only with replies from the memcached and/or proxy modules.
This limitation is documented here:

http://sysoev.ru/nginx/docs/http/ngx_http_ssi_module.html#commands

(looks like the relevant page on the wiki isn’t really relevant;
Google Translate to the rescue).

> This should enqueue some subrequest and nginx should process them “in parallel” (almost, as it’s one thread). Am I wrong? What more can I do to simulate parallelism?
No, just calling ngx_http_subrequest() in order is enough.
Everything else will be handled by nginx itself.
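Putting the pieces of this thread together, the loop might be sketched as follows (the my_-prefixed names are hypothetical; the callback shape mirrors the ngx_http_ssi_set_variable() function quoted earlier):

```c
/* Hedged sketch: issuing several subrequests from one request and
 * registering a completion callback for each; nginx schedules the
 * subrequests itself. */
static ngx_int_t
my_subrequest_done(ngx_http_request_t *r, void *data, ngx_int_t rc)
{
    /* data is whatever was stored in psr->data below */
    return rc;
}

static ngx_int_t
my_start_subrequests(ngx_http_request_t *r, ngx_array_t *uris)
{
    ngx_str_t                   *uri = uris->elts;
    ngx_uint_t                   i;
    ngx_http_request_t          *sr;
    ngx_http_post_subrequest_t  *psr;

    for (i = 0; i < uris->nelts; i++) {
        psr = ngx_palloc(r->pool, sizeof(ngx_http_post_subrequest_t));
        if (psr == NULL) {
            return NGX_ERROR;
        }

        psr->handler = my_subrequest_done;  /* called when subrequest finishes */
        psr->data = NULL;

        if (ngx_http_subrequest(r, &uri[i], NULL, &sr, psr,
                                NGX_HTTP_SUBREQUEST_IN_MEMORY)
            != NGX_OK)
        {
            return NGX_ERROR;
        }
    }

    return NGX_OK;
}
```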

Maxim D.