Making http_auth_request_module a first-class citizen?

dubstep · February 15, 2012, 2:57pm

Hello Maxim,

Back in 2010 you wrote that it’s not likely that your
http_auth_request_module would make it into nginx core. I’m curious if
anything has changed over the past two years?

It’s not that compiling this module into nginx is a problem
(especially on FreeBSD), but I think a lot of people are inherently
weary of depending on 3rd-party modules, since there is no guarantee
of continued support.

What do you think about adding your module to the main nginx repository?

Max

xamdam · February 15, 2012, 3:50pm

Hello!

On Wed, Feb 15, 2012 at 08:56:49AM -0500, Maxim K. wrote:

What do you think about adding your module to the main nginx repository?

There are no immediate plans, but this may happen somewhere in the
future.

Maxim D.

xamdam · February 16, 2012, 1:38pm

Max,
please keep the discussion in single thread.

Any comments will be appreciated.

You’re delusional, don’t try to fix things that you don’t understand.

Your whole reasoning is based on a fact that you think that
authorization
subrequest fetches the protected file/page which client wants to access,
but
that’s not the case. Authorization subrequest should access special
authorization endpoint (or database or anything else you can think of)
and
either grant access and let the main request access the protected
file/page
or not.

Best regards,
Piotr S. < [email protected] >

xamdam · February 16, 2012, 5:16am

15 февраля 2012, 18:50 от Maxim D. [email protected]:

It’s not that compiling this module into nginx is a problem
(especially on FreeBSD), but I think a lot of people are inherently
weary of depending on 3rd-party modules, since there is no guarantee
of continued support.

What do you think about adding your module to the main nginx repository?

There are no immediate plans, but this may happen somewhere in the
future.

Hello fellow Maxims and others,

I took a closer look at the auth_request module source code today and
realized that I was partially wrong about auth_request authorization
subrequests causing the entire requested file to be retrieved from the
backend server. I apologize for the confusion my posts may have
caused. Due to sr->header_only being set to 1, the connection to the
backend server is terminated from within
ngx_http_upstream_send_response()
as soon as the HTTP request status code is received.

So the entire file is (usually) not retrieved, but due to the fact that
the connection is prematurely closed, it may take a while for the
backend
server to actually detect that the frontend server has closed the
connection and that it should stop sending.

I have done some testing to see what kind of overhead this kind of
abrupt connection termination causes, and according to my test
results, auth_request module authorization requests always generate
at least 65 kb worth of traffic (including the initial TCP three-way
handshake and the final four-way FIN/ACK connection termination
packets). Under heavy load, the amount of traffic generated mainly
by the backend server while sending the requested file in vain
exceeded 500 kb, so it’s obvious that this might lead to
significant overhead when large files are sent in response
to auth_request authorization subrequests.

Moreover, each file on the backend server that is accessed this
way has to be opened and prepared for sending, which causes disk
overhead and buffer allocation because the backend server
treats those requests just like all the other GET method
requests. Under heavy load, this overhead can be significant.
Another minor problem is that logs on the backend server
tend to get filled up with constant error messages about
connections being closed prematurely, writev() failures etc.

All of these issues can be avoided simply by using HEAD method
requests for authorization subrequests. According to my
test results, HEAD method authorization subrequests generate
no more than 1310 bytes worth of traffic (including the initial
TCP three-way handshake and the final four-way FIN/ACK connection
termination packets). GET method authorization subrequests, on
the other hand, generate at best 50 times and at worst 400 times
more traffic than HEAD method requests, so files smaller than 64 kb
actually do get retrieved twice from the backend server on each
request, as well as files smaller than 200 kb under heavy load.

HEAD method request responses, on the other hand, remain the same
size even under heavy load, and the upstream / backend server always
immediately terminates the connection on its side after sending its
(usually) single-packet response, so there is also no additional disk
overhead, only the size of the requested file is checked, and no
additional
buffer allocation is done, while sr->header_only on the frontend server
can be safely set to 0, which also makes it possible for the
auth_request
module to be extended to make the use of the proxy_cache and proxy_store
directives possible even for authorization requests because completed
upstream requests would no longer cause response-buffering temporary
files to be closed.

Since HEAD method requests also go through the same access phase
checking as the GET method requests, they are also a valid means
of checking whether an actual GET method request would be
allowed, unless different responses are configured for each
method.

The best thing about it all is that you can make the auth_request
module use HEAD method authorization requests by adding the
following directive to the auth_request authorization subrequest
location block:

proxy_method HEAD;

I have also modified the auth_request module to use HEAD method
authorization subrequests by default. This setting can be
overridden in the configuration file by using the proxy_method
directive, of course.

You can find my auth_request module patch here:

https://nginxyzpro.berlios.de/patch-head.ngx_http_auth_request_module.c.20120215.diff

SIZE (patch-head.ngx_http_auth_request_module.c.20120215.diff) = 5196
bytes

SHA256 (patch-head.ngx_http_auth_request_module.c.20120215.diff) =
6d163ec9e11a06bcadd8395042e0a6ef1dc2dfbe0bbfab4cb9d0c4e73e373f75

Any comments will be appreciated.

Max

xamdam · February 16, 2012, 5:07pm

Hello!

On Thu, Feb 16, 2012 at 08:16:03AM +0400, Max wrote:

anything has changed over the past two years?

Hello fellow Maxims and others,

I took a closer look at the auth_request module source code today and
realized that I was partially wrong about auth_request authorization
subrequests causing the entire requested file to be retrieved from the
backend server. I apologize for the confusion my posts may have
caused. Due to sr->header_only being set to 1, the connection to the
backend server is terminated from within ngx_http_upstream_send_response()
as soon as the HTTP request status code is received.

Yes. This is basically a workaround for cases when people
unintentionally return data to auth subrequest, it makes sure that
no unexpected data are sent to client in any case.

[…]

All of these issues can be avoided simply by using HEAD method
requests for authorization subrequests. According to my

Using HEAD is not an option in auth_request itself, as it doesn’t
know how auth subrequest will be handled. E.g. it may be passed to
fastcgi, or even hit static file.

If you handle auth subrequests with proxy_pass, you may use
proxy_set_method to issue HEAD requests to backend. Or you may
use correct auth endpoint which doesn’t return unneeded data.

[…]

I have also modified the auth_request module to use HEAD method
authorization subrequests by default. This setting can be
overridden in the configuration file by using the proxy_method
directive, of course.

You can find my auth_request module patch here:

https://nginxyzpro.berlios.de/patch-head.ngx_http_auth_request_module.c.20120215.diff

The patch is wrong by design, see above. Moreover, it makes it
impossible to correctly pass original request method to auth
endpoint.

Maxim D.

xamdam · February 17, 2012, 9:28am

Hello!

16 февраля 2012, 20:07 от Maxim D. [email protected]:

Hello Maxim,
What do you think about adding your module to the main nginx repository?
caused. Due to sr->header_only being set to 1, the connection to the
requests for authorization subrequests. According to my

The patch is wrong by design, see above. Moreover, it makes it
impossible to correctly pass original request method to auth
endpoint.

Maxim, you haven’t even taken a look at my patch, have you? Because
if you had, you wouldn’t have made such unsubstantiated claims.

First of all, I am referring to subrequests in the context of
subquests created by the ngx_http_subrequest() function.
As you might recall, the auth_request function
ngx_http_auth_request_handler() calls the ngx_http_subrequest()
function to create a subrequest, which inherits most of the values
from the original request, BUT the method values are not
inherited by the subrequest - they are explicitly set to the
GET method:

nginx-1.1.15/src/http/ngx_http_core_module.c:

2361 ngx_http_subrequest(ngx_http_request_t *r,
2362 ngx_str_t *uri, ngx_str_t *args, ngx_http_request_t **psr,
2363 ngx_http_post_subrequest_t *ps, ngx_uint_t flags)
2364 {
…
2417 sr->method = NGX_HTTP_GET;
2434 sr->method_name = ngx_http_core_get_method;
…
2487 }

So your auth_request module NEVER DID pass original request
methods on to subrequests in the first place! By default all
auth_request subrequests using the proxy_pass directive have been,
still are, and will be (until my patch is applied) GET method requests.

My patch changes the default method from GET to HEAD by changing
sr->method, sr->method_name and sr->request_line accordingly.
It does NOT in any way interfere with anything else, you can
use whatever authentication mechanism and endpoint you
want - static files, fastcgi_pass, postgres_pass, etc.

Moreover, you are wrong about my patch making it impossible
to correctly pass the original request method to authentication
endpoints. The original request is fully preserved (along with
the entire original request) and can be accessed through the
$request_method variable inside the subrequest location block.
Feel free to verify this for yourself, if you don’t believe me:

location /private/ {
auth_request /auth;
set $request_method_private $request_method;
}

location = /auth {
set $request_method_auth $request_method;
}

Here’s what you’ll see:

Location Original module (v0.2) Patched module (v20120215)

       GET /private/ HTTP/1.0     GET /private/ HTTP/1.0

/private/ $request_method: GET $request_method: GET
/auth $request_method: GET $request_method: GET (intact)
subrequest method: GET subrequest method: HEAD

       HEAD /private/ HTTP/1.0    HEAD /private/ HTTP/1.0

/private/ $request_method: HEAD $request_method: HEAD
/auth $request_method: HEAD $request_method: HEAD (intact)
subrequest method: GET subrequest method: HEAD

My patch simply makes the proxy_pass directive use the HEAD request
method by default in auth_request subrequest location blocks.

The same can be achieved by using the proxy_method directive
(“proxy_method HEAD;”), but I believe the HEAD method should be used
by default. Why? Because, IMNSHO, it does everything the GET method does
(including Basic access authentication WWW-Authenticate / Authorization
header exchange), but in a much more elegant and efficient way, which
also makes your sr->header_only=1 workaround unnecessary.

I left your workaround the way it is to prevent people from shooting
themselves in the foot by setting the proxy method to GET, but those
who know about the proxy_method directive (BTW, you got the name wrong,
there is no proxy_set_method directive), should know what they are
doing.

BTW, the proxy_method directive is missing from the official
documentation, so you may want to fix that:

http://nginx.org/en/docs/http/ngx_http_proxy_module.html

Most people might be using other authentication methods that this
patch in no way affects, but those who do use the pass_proxy directive
for subrequests would benefit from my patch without losing any of the
functionality. Moreover, if they really need to use the GET method,
or any other method, they still can.

I did some more testing and it turns out that even under moderate load
the backend server keeps sending another 150-200 kb before detecting
the frontend server had closed the connection on its end. Compare
that to 1200-1400 BYTES using the HEAD method even under heavy
load. With large files and per-file access rules on the backend server
that means ten GET method subrequests per second are wasting at least
10 Mbps worth of bandwidth when 100 kbps would have done the job in a
way that would also help reduce the load on the backend server.

So my question to you is: why would you not want to optimize your
module?
I thought nginx was supposed to be about efficiency.

All my claims made here are based on facts and can be verified by
anyone who has the time, the willingness, and the capacity to
understand and apply what I’ve written. I know you understand
what I’ve written, but if you’re too busy or can’t be bothered to deal
with this, just say so, but please don’t jump to conclusions or make
unsubstantiated claims based on what you think I MIGHT have
meant instead of checking the facts because such claims will
only discredit you and alienate potential developers.

Max

xamdam · February 19, 2012, 2:13am

17 февраля 2012, 15:19 от Maxim D. <[email protected]>:
> On Fri, Feb 17, 2012 at 12:28:14PM +0400, Max wrote:
> > Maxim, you haven't even taken a look at my patch, have
you? Because
> > if you had, you wouldn't have made such unsubstantiated
claims.
>
> I have, despite the fact that it was provided as a link only.

I usually send in patches on postcards, but I had run out of postcards.
I'm glad this Internet thing seems to work, although your replies
seem
to indicate your download request had header_only set to 1.

> I stand corrected. Your patch broke only $request variable, not
> the $request_method (which always comes from the main request).

You're wrong, my patch did not break anything. Go ahead and put
any variables in those /private/ and /auth location blocks and
see for yourself. I know it must be difficult to decipher, but
could you try to guess what the following part of my patch does?

64 + /*
65 + * 1) Allocate a new request line string
66 + * (4 extra bytes for future compatibility just in case
67 + * a single letter HTTP request method is introduced).
68 + */
69 +
70 + request_line.data = ngx_pcalloc(r->pool,
sr->request_line.len + 4);
71 + if (request_line.data == NULL) {
72 + return NGX_ERROR;
73 + }

Feel free to verify this for yourself AGAIN:

location /private/ {
auth_request /auth;
set $request_private $request;
set $request_uri_private $request_uri;
set $request_method_private $request_method;
}

location = /auth {
set $request_auth $request;
set $request_uri_auth $request_uri;
set $request_method_auth $request_method;
}

Here's the debug log output:

Inside the /private/ location block:

http script var: "GET /private/test HTTP/1.1"
http script set $request_private

http script var: "/private/test"
http script set $request_uri_private

http script var: "GET"
http script set $request_method_private

Inside the /auth location block:

http script var: "GET /private/test HTTP/1.1"
http script set $request_auth

http script var: "/private/test"
http script set $request_uri_auth

http script var: "GET"
http script set $request_method_auth

Here's the gdb session output:

322 sr->header_only = 1; /* <— Look, it's your old
friend! */
(gdb)
324 ctx->subrequest = sr;
(gdb)
326 ngx_http_set_ctx(r, ctx, ngx_http_auth_request_module);
(gdb)
328 return NGX_AGAIN;

(gdb) print &r->request_line
$1 = (ngx_str_t *) 0x48b06d88 <— Take a good look.
(gdb) print r->request_line
$2 = {len = 26, data = 0x48b67000 "GET /private/test
HTTP/1.1\r\nUser-Agent"}
^^^^^^^^^^- Take a good look at this, too.

(gdb) print &sr->request_line
$3 = (ngx_str_t *) 0x48b96874 <— Does that look like 0x48b06d88 to
you?
(gdb) print sr->request_line
$4 = {len = 27, data = 0x48b96c64 "HEAD /private/test
HTTP/1.1"}
^^^^^^^^^^- Does that look like 0x48b67000 to
you?

(gdb) print sizeof("GET /private/test HTTP/1.1")-1
$5 = 26
(gdb) print sizeof("HEAD /private/test HTTP/1.1")-1
$6 = 27

(gdb) print r->method
$7 = 2
(gdb) print r->method_name
$8 = {len = 3, data = 0x48b67000 "GET /private/test
HTTP/1.1\r\nUser-Agent"}

(gdb) print sr->method
$9 = 4
(gdb) print sr->method_name
$10 = {len = 4, data = 0x81733f2 "HEAD "}

In your previous reply to my post you wrote:

> If you handle auth subrequests with proxy_pass, you may use
> proxy_set_method to issue HEAD requests to backend. Or you may
> use correct auth endpoint which doesn't return unneeded data.

And in your latest reply you wrote:

> Again: the sr->header_only workaround is anyway required, as
> static file, or memcached, or fastcgi may be used as handler for
> auth subrequest (and, actually, even some broken http backends may
> return data to HEAD, not even talking about intended changes of
> proxy_method). Without sr->header_only explicitly set you
> will get response content before headers of the real response:

First you argued that one should use "correct" authentication
endpoints
that do not return unneeded data, and now you're trying to make it
seem
like your workaround is there to allow the auth_request module to
work with "incorrect" authentication endpoints that violate
HTTP itself
and other established protocols, but the main reason your workaround
is there has to do with the fact that you don't know how to handle
the request body without crashing nginx. That is also why you need
to redesign your module if you want it to work with caching.

> And, BTW, as far as I recall your code, it won't set
> sr->header_only in case of HEAD requests. This is wrong, you
> still need it even for HEADs.
>
> > I left your workaround the way it is to prevent people from
shooting
> > themselves in the foot by setting the proxy method to GET, but
those
> > who know about the proxy_method directive (BTW, you got the
name wrong,
> > there is no proxy_set_method directive), should know what they
are doing.

You're wrong, AGAIN. I left the workaround the way it is - do I hear
an echo? Make a claim. Have your claim proven wrong by the next
paragraph you quote. Keep the wrong claim and the quoted paragraph
that proves you wrong in your reply anyway. Priceless.

> Your patch tries to make the workaround a bit more smart, and
> tries to make arbitrary configuration more efficient, but this is
> wrong aproach: instead, it should be made less intrusive.
>
> The major problem with the workaround as of now is that it
> prevents caching. And this should be addressed.

My patch solves the GET initiated flood problem nicely without
breaking anything, but it does not address the broken design
of your module that's preventing caching in the first place.

So instead of trying to prove me wrong and getting yourself
proven wrong again and again (which you might find personally
intrusive and embarrassing), maybe you could address the broken
design of your module?

> On the other hand, setting method/request line increase comlexity
> and overhead in normal case, as well as subject to bugs (at least
> two were identified above).

Are you serious? You're comparing the complexity and overhead of
completely in-memory operations (memory allocation, assignment,
request line scanning and two memcpy() calls) on a few dozen
bytes to reading from disk, processing, buffering and sending
hundreds of kilobytes on a closed connection?

Just to put things in perspective I did some profiling with
nanosecond resolution and it turns out that on the oldest single-core
CPU server I have access to, my patch adds 2550 ns (on average) to the
overall processing time. Using the memchr() function instead of the
ngx_strlchr() function to scan for the space character and scanning
only up to the 8th byte (because currently the longest HTTP method
request name is 7 characters long) gives roughly the same results
(2600 ns on average), while using an optimized for loop directly
shaves off another 200 ns, on average. Even without any
micro-optimization, my patch adds less than 3 microseconds
(0.003 ms) to the overall processing time on a very old server
that you're unlikely to find in any production environment.
3000 nanoseconds. Horrible, isn't it?

As for bugs, please do feel free to point them out.

> Hope my position is clear enough

It's clear that you keep making unsubstantiated claims that keep
getting proven wrong, and that you'd rather spend 10 minutes
writing a post to try to make it look like I broke your beloved
module with my patch instead of taking 5 minutes to apply my patch,
recompile nginx, and step through the patched function in a debugger
to see what my patch really does, because you obviously do not
understand what is going on in there.

You seem to prefer conjecture to verifiable facts, so I see no point in
discussing this further.

Max

xamdam · February 17, 2012, 12:19pm

Hello!

On Fri, Feb 17, 2012 at 12:28:14PM +0400, Max wrote:

Hello!

(especially on FreeBSD), but I think a lot of people are inherently
I took a closer look at the auth_request module source code today and

proxy_set_method to issue HEAD requests to backend. Or you may

https://nginxyzpro.berlios.de/patch-head.ngx_http_auth_request_module.c.20120215.diff

The patch is wrong by design, see above. Moreover, it makes it
impossible to correctly pass original request method to auth
endpoint.

Maxim, you haven’t even taken a look at my patch, have you? Because
if you had, you wouldn’t have made such unsubstantiated claims.

I have, despite the fact that it was provided as a link only.

[…]

Moreover, you are wrong about my patch making it impossible
to correctly pass the original request method to authentication
endpoints. The original request is fully preserved (along with
the entire original request) and can be accessed through the
$request_method variable inside the subrequest location block.
Feel free to verify this for yourself, if you don’t believe me:

I stand corrected. Your patch broke only $request variable, not
the $request_method (which always comes from the main request).

[…]

The same can be achieved by using the proxy_method directive
(“proxy_method HEAD;”), but I believe the HEAD method should be used
by default. Why? Because, IMNSHO, it does everything the GET method does
(including Basic access authentication WWW-Authenticate / Authorization
header exchange), but in a much more elegant and efficient way, which
also makes your sr->header_only=1 workaround unnecessary.

Again: the sr->header_only workaround is anyway required, as
static file, or memcached, or fastcgi may be used as handler for
auth subrequest (and, actually, even some broken http backends may
return data to HEAD, not even talking about intended changes of
proxy_method). Without sr->header_only explicitly set you
will get response content before headers of the real response:

HEAD / HTTP/1.0

BOOM
HTTP/1.1 200 OK
Server: nginx/1.1.15
Date: Fri, 17 Feb 2012 10:11:35 GMT
Content-Type: text/html
Content-Length: 1047
Connection: close
Last-Modified: Mon, 13 Feb 2012 01:20:52 GMT
Accept-Ranges: bytes

The “BOOM” string is from static file used as auth_request
handler. (Obviously the sr->header_only workaround was removed
for testing.)

If the question was about proxy only, I wouldn’t added the
workaround in the first place, but recommended proxy_method in
docs instead.

And, BTW, as far as I recall your code, it won’t set
sr->header_only in case of HEAD requests. This is wrong, you
still need it even for HEADs.

I left your workaround the way it is to prevent people from shooting
themselves in the foot by setting the proxy method to GET, but those
who know about the proxy_method directive (BTW, you got the name wrong,
there is no proxy_set_method directive), should know what they are doing.

As I already said before, the whole sr->header_only thing is a
workaround to prevent people from unintentionally breaking
protocol.

Your patch tries to make the workaround a bit more smart, and
tries to make arbitrary configuration more efficient, but this is
wrong aproach: instead, it should be made less intrusive.

The major problem with the workaround as of now is that it
prevents caching. And this should be addressed.

BTW, the proxy_method directive is missing from the official
documentation, so you may want to fix that:

Module ngx_http_proxy_module

This will be addressed in near future.

[…]

I thought nginx was supposed to be about efficiency.
Both using proxy_method and using auth endpoint which doesn’t
return data do the same if you are talking about efficiency.

On the other hand, setting method/request line increase comlexity
and overhead in normal case, as well as subject to bugs (at least
two were identified above).

Hope my position is clear enough.

[…]

Maxim D.

xamdam · February 19, 2012, 4:08am

Hello!

On Sun, Feb 19, 2012 at 05:12:17AM +0400, Max wrote:

to indicate your download request had header_only set to 1.
Posting patches in the message itself makes review much easier.
And you may also want to fix your email client to don’t use html
escaping in text mesages.

> I stand corrected. Your patch broke only $request variable, not
> the $request_method (which always comes from the main request).

You're wrong, my patch did not break anything. Go ahead and put
any variables in those /private/ and /auth location blocks and
see for yourself. I know it must be difficult to decipher, but
could you try to guess what the following part of my patch does?

It looks like you don’t understand how variables work. Try the
following without your patch and with it:

log_format test "request: $request";
access_log /path/to/log test;

location / {
    auth_request /auth;
}

location = /auth {
    proxy_pass http://some_auth_backend;
    proxy_set_header X-Original-Request $request;
}

[…]

location /private/ {
auth_request /auth;
set $request_private $request;

This will calculate $request and cache it forever. As this will
happen in main request context - everything will be good, i.e.
original request line will be used everywhere.

On the other hand, if $request calculation will happen in auth
subrequest with your patch - the modified request line will be
cached for the rest of request processing.

[…]

> On the other hand, setting method/request line increase comlexity
> and overhead in normal case, as well as subject to bugs (at least
> two were identified above).

Are you serious? You're comparing the complexity and overhead of
completely in-memory operations (memory allocation, assignment,
request line scanning and two memcpy() calls) on a few dozen
bytes to reading from disk, processing, buffering and sending
hundreds of kilobytes on a closed connection?

It’s sad you don’t understand what I wrote. In the normal case
there are no overhead for extra bytes sent, but there is
overhead for extra processing. But it doesn’t really matter
though, it’s complexity (and bugs) which really matters.

And, just to make things more clear, complexity is about code
complexity which makes maintanance, debugging and further
modifications harder.

[…]

Maxim D.