Nginx+lua reverse proxy empty body

Hi all

I’m trying to make on-the-fly changes to the pages of a site using Lua.
I’ve set up an nginx reverse proxy and some Lua code to do the
replacements, and I see irreproducible (timing-related?) situations where
the proxied body that is passed to Lua is empty. I know my code works in
some cases, but I can’t figure out why it’s not reliable.

nginx.conf:

worker_processes 1;
error_log logs/error.log debug;
events {
    worker_connections 1024;
}
http {
    server {
        client_body_in_single_buffer on;
        listen 9001;

        location / {
            proxy_pass http://www.spelletjes.nl:80;
            proxy_set_header X-Real-IP $remote_addr;
            body_filter_by_lua '

                if ngx.arg[1] ~= "" then
                    ngx.arg[1] = string.gsub(ngx.arg[1], "Speel", "NGINX")
                else
                    print(ngx.var.uri .. " has empty body" .. ngx.arg[1])
                end
            ';
        }
    }
}

The problem I have is basically that ngx.arg[1] is an empty string
(sometimes, timing-dependent?) on URLs that are definitely not empty.

So what am I doing wrong? I am using openresty 1.2.4.9 (nginx 1.2.4 +
ngx_lua-0.7.5)

Typical message in logs/error.log:
67 2012/11/26 14:53:59 [notice] 19291#0: *55 [lua] [string
"body_filter_by_lua"]:7: / has empty body while sending to
client, client: 127.0.0.1, server: , request: "GET / HTTP/1.1",
upstream: "http://212.72.60.220:80/", host: "localhost:9001"

Thanks for answers

Bart

Hello!

On Tue, Nov 27, 2012 at 2:28 AM, Bart van Deenen wrote:

> The problem I have is basically that ngx.arg[1] is an empty string
> (sometimes, timing-dependent?) on URLs that are definitely not empty.

It is normal for ngx.arg[1] to be an empty string in body filters
when the upstream module generates “pure special bufs”, such as those with
only the “last_buf” flag set (i.e., the eof flag, ngx.arg[2], is set on
the Lua side).

It’s normal that for a given response, the output body filter gets
called multiple times because that’s exactly how streaming processing
works in Nginx (you surely do not want to buffer all the data at a
time for huge responses).

And the response body may be fed into your body filter in multiple
data chunks. You should always be prepared for that in your Lua code.
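
A minimal sketch of a filter that tolerates empty chunks, assuming the Lua code is loaded from a file (for example through body_filter_by_lua_file) rather than inlined. The helper name rewrite_chunk is made up for illustration; ngx.arg[1] is the data chunk, as in the original config.

```lua
-- Hypothetical helper: rewrite one chunk, leaving empty chunks untouched.
local function rewrite_chunk(chunk)
    if chunk == "" then
        -- An empty chunk is normal: it usually carries only a flag such as
        -- eof (last_buf) or flush, so there is nothing to rewrite or report.
        return chunk
    end
    -- string.gsub returns two values; the parentheses keep only the string.
    return (string.gsub(chunk, "Speel", "NGINX"))
end

-- Inside nginx the filter runs once per chunk of the response body.
-- The guard only exists so the sketch also loads in a plain Lua interpreter.
if ngx then
    ngx.arg[1] = rewrite_chunk(ngx.arg[1])
end
```

With this shape an empty ngx.arg[1] is simply passed through instead of being treated as an error.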

Please refer to the documentation for body_filter_by_lua for more
information:

http://wiki.nginx.org/HttpLuaModule#body_filter_by_lua

BTW, doing a simple regex match in body filters may not always work as
expected, because the nginx upstream module may split the response body
into chunks in an arbitrary way (e.g., in the middle of the word
“Speel”).
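
One way to cope with a match string that is split across two chunks, sketched here for a fixed literal string only (this is not sregex, and not something from the thread): hold back the last few bytes of each chunk until the next one arrives. The function name stream_replace and the state.pending field are assumptions for illustration; in nginx the state table could be ngx.ctx.

```lua
-- Hypothetical helper: replace a literal needle across chunk boundaries.
-- state is a per-request table (for example ngx.ctx) that survives between
-- filter invocations.
local function stream_replace(state, chunk, eof, needle, replacement)
    -- Prepend the bytes held back from the previous invocation.
    local data = (state.pending or "") .. chunk
    local out = {}
    local pos = 1
    while true do
        -- Plain (non-pattern) search for the next occurrence.
        local s, e = string.find(data, needle, pos, true)
        if not s then break end
        out[#out + 1] = string.sub(data, pos, s - 1)
        out[#out + 1] = replacement
        pos = e + 1
    end
    local rest = string.sub(data, pos)
    if eof then
        -- Last invocation: nothing more will arrive, emit everything.
        state.pending = ""
        out[#out + 1] = rest
    else
        -- Hold back up to #needle - 1 bytes: they might be the beginning
        -- of a match that is completed by the next chunk.
        local keep = math.min(#rest, #needle - 1)
        out[#out + 1] = string.sub(rest, 1, #rest - keep)
        state.pending = string.sub(rest, #rest - keep + 1)
    end
    return table.concat(out)
end

-- Possible use inside a body filter (guarded so the sketch loads anywhere).
if ngx then
    ngx.arg[1] = stream_replace(ngx.ctx, ngx.arg[1], ngx.arg[2], "Speel", "NGINX")
end
```

Only unmatched original bytes are held back, never replacement text, so the result should be the same as a single replacement pass over the whole body.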

I’ve been working on the sregex C library, which will support streaming
matching just like Ragel:

https://github.com/agentzh/sregex

It’s still in progress, but it’ll soon be usable from Lua :)

Best regards,
-agentzh

Hi Agentz

But wouldn’t the statement
client_body_in_single_buffer on;
cause the whole body of the proxied server to go into ngx.arg[1] ?

And I also don’t understand why my example code wouldn’t work
reliably, even if the proxied data is passed through it in chunks
(unless the chunk boundary happened to fall right in the middle of
my short match string). I’ve done a very similar setup, proxying and
modifying a simple website (vandeenensupport.com), and that works
perfectly.

I have also noticed that when I add a ‘print(ngx.arg[1])’ as the first
line of the Lua section of my example, the HTML replacement works
reliably, with no more empty ngx.arg[1]!
But that print only goes into the nginx log, so maybe it’s only its
timing that has some effect?

So I still don’t understand it.

Thanks for all your good work on nginx.

Bart


Hello!

On Wed, Nov 28, 2012 at 7:28 AM, Bart van Deenen wrote:

> Hi Agentz

Agentz is not my name, don’t call me that. You can either call me
agentzh or Yichun.

> But wouldn’t the statement
> client_body_in_single_buffer on;
> cause the whole body of the proxied server to go into ngx.arg[1]?

client_body_in_single_buffer is for request bodies while
body_filter_by_lua is for response bodies. Please do not confuse
these two bodies. They’re completely different things.
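
If the whole response body really is wanted in one piece, a common approach is to collect the chunks in Lua and emit them on the final call. A minimal sketch, assuming ngx.ctx is used as the per-request store and that assigning nil to ngx.arg[1] drops the current chunk, as the ngx_lua documentation describes; the helper name collect_body and the body_parts field are made up for illustration.

```lua
-- Hypothetical helper: buffer chunks in ctx and return the rewritten body
-- only on the final (eof) invocation; return nil for every earlier call.
local function collect_body(ctx, chunk, eof)
    ctx.body_parts = ctx.body_parts or {}
    if chunk ~= "" then
        ctx.body_parts[#ctx.body_parts + 1] = chunk
    end
    if not eof then
        return nil  -- nothing is passed downstream yet
    end
    local whole = table.concat(ctx.body_parts)
    ctx.body_parts = nil
    -- The match can no longer be split, because the body is in one string.
    return (string.gsub(whole, "Speel", "NGINX"))
end

-- Possible use inside a body filter (guarded so the sketch loads anywhere).
if ngx then
    ngx.arg[1] = collect_body(ngx.ctx, ngx.arg[1], ngx.arg[2])
end
```

This trades streaming for simplicity: the whole response sits in memory, which is exactly what one would want to avoid for huge responses. If the replacement changed the body length, the Content-Length response header would also need to be cleared, for example with ngx.header.content_length = nil in a header filter.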

> And I also don’t understand why my example code wouldn’t work reliably, even
> if the proxied data is passed through it in chunks (unless the chunk boundary
> happened to fall right in the middle of my short match string).

Yes, I mean exactly the case that the chunk boundary is in the middle
of your string. It could happen.

> I’ve done a very similar setup, proxying and modifying a simple website
> (vandeenensupport.com), and that works perfectly.

Working 99.9% of the time does not imply 100% correctness :) This is
just a caveat :)

> I have also noticed that when I add a ‘print(ngx.arg[1])’ as the first line of
> the Lua section of my example, the HTML replacement works reliably, with no more
> empty ngx.arg[1]!

ngx.arg[1] can be an empty string by design, as explained in my
previous email. Always be prepared for that if you want your code
to work reliably.

You can always reproduce a “special buf” (with empty data chunk) with
ngx_lua’s ngx.flush() and ngx.eof() primitives.
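
A rough sketch of such a reproduction, assuming a test location whose content handler and body filter are loaded from two separate Lua files (content_by_lua_file and body_filter_by_lua_file). The function names are made up for illustration; each function body would go into its own file.

```lua
-- Hypothetical helper: summarize one body-filter invocation for the log.
local function describe_chunk(chunk, eof)
    return string.format("chunk length=%d eof=%s", #chunk, tostring(eof))
end

-- Body filter part: log every invocation, including the empty ones.
local function body_filter()
    ngx.log(ngx.NOTICE, describe_chunk(ngx.arg[1], ngx.arg[2]))
end

-- Content handler part: send some data, then force a flush and an
-- explicit end of response.
local function content_handler()
    ngx.print("Speel")
    ngx.flush(true)  -- expected to reach the filter as an empty chunk
    ngx.eof()        -- expected to reach the filter as an empty chunk with eof set
end
```

Requesting that location and reading the error log should then show the filter being called more often than there are data chunks.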

> But that print only goes into the nginx log, so maybe it’s only its timing
> that has some effect?

Maybe.

Best regards,
-agentzh