Forum: NGINX nginx+lua reverse proxy empty body

Posted by Bart van Deenen (Guest)
on 2012-11-27 11:29
(Received via mailing list)
Hi all

I'm trying to make on-the-fly changes to the pages of a site using Lua.
I've set up an nginx reverse proxy and some Lua code to do the
replacements, and I notice irreproducible (timing?) situations where
the proxied body that is passed to Lua is empty. I know my code works in
some cases, but I can't figure out what makes it unreliable.

nginx.conf:

worker_processes  1;
error_log logs/error.log debug;
events {
     worker_connections 1024;
}
http {
     server {
         client_body_in_single_buffer on;
         listen 9001;

         location / {
             proxy_pass http://www.spelletjes.nl:80;
             proxy_set_header X-Real-IP $remote_addr;
             body_filter_by_lua '

if ngx.arg[1] ~= "" then
     ngx.arg[1] = string.gsub(ngx.arg[1], "Speel", "NGINX")
else
     print(ngx.var.uri .. " has empty body" .. ngx.arg[1])
end
             ';
         }
     }
}
The problem, basically, is that ngx.arg[1] is an empty string
(sometimes, timing-dependent?) for URLs that are definitely not empty.


So what am I doing wrong? I am using OpenResty 1.2.4.9 (nginx 1.2.4 +
ngx_lua 0.7.5).

Typical message in logs/error.log:

    2012/11/26 14:53:59 [notice] 19291#0: *55 [lua] [string "body_filter_by_lua"]:7: / has empty body while sending to client, client: 127.0.0.1, server: , request: "GET / HTTP/1.1", upstream: "http://212.72.60.220:80/", host: "localhost:9001"


Thanks for any answers.

Bart
Posted by agentzh (Guest)
on 2012-11-28 00:44
(Received via mailing list)
Hello!

On Tue, Nov 27, 2012 at 2:28 AM, Bart van Deenen wrote:
> The problem, basically, is that ngx.arg[1] is an empty string
> (sometimes, timing-dependent?) for URLs that are definitely not empty.
>

It is normal for ngx.arg[1] to be an empty string in body filters
when the upstream module generates "pure special bufs", such as those with
only the "last_buf" flag set (which shows up in Lua land as the eof flag,
ngx.arg[2]).

It's also normal for the output body filter to be called multiple
times for a given response, because that's exactly how streaming
processing works in Nginx (you surely do not want to buffer all the
data at once for huge responses).

And the response body may be fed into your body filter in multiple
data chunks. You should always be prepared for that in your Lua code.
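To make this concrete, here is a plain-Lua sketch (runnable outside nginx) of the original filter's logic fed with a hypothetical sequence of body-filter calls for one response. The chunk contents and the number of calls are made up for illustration; the point is only that the final call can carry no data at all, just the eof flag (what ngx.arg[2] reports), so an empty ngx.arg[1] on that call does not mean the response was empty.

```lua
-- Hypothetical sequence of body-filter invocations for ONE response.
-- Each entry is { data, eof }, mirroring ngx.arg[1] and ngx.arg[2].
-- The last entry models a "special buf" that carries only the eof flag.
local calls = {
  { "<html><body>Speel ", false },
  { "nu!</body></html>", false },
  { "", true },
}

-- The original filter's logic without the ngx API: replace in non-empty
-- chunks and pass an empty chunk through unchanged, since that is normal.
local function filter(data)
  if data == "" then
    return ""
  end
  -- Parentheses keep only the first return value of string.gsub.
  return (string.gsub(data, "Speel", "NGINX"))
end

local out, empty_calls, empty_at_eof = {}, 0, 0
for _, call in ipairs(calls) do
  local data, eof = call[1], call[2]
  if data == "" then
    empty_calls = empty_calls + 1
    if eof then
      empty_at_eof = empty_at_eof + 1
    end
  end
  out[#out + 1] = filter(data)
end

local body = table.concat(out)
print(body)         --> <html><body>NGINX nu!</body></html>
print(empty_calls)  --> 1
```

With this sequence the "has empty body" branch of the original filter would fire exactly once, on the eof-only call, even though the body itself is not empty.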

Please refer to the documentation for body_filter_by_lua for more 
information:

    http://wiki.nginx.org/HttpLuaModule#body_filter_by_lua

BTW, doing a simple regex match in body filters may not always work as
expected, because the nginx upstream module may split the response body
into chunks in an arbitrary way (e.g., right in the middle of the
word "Speel").
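One way around split matches, sketched here in plain Lua under the assumption that the search string is a literal (not a Lua pattern or a regex), is to hold back the last #needle - 1 bytes of each chunk and prepend them to the next chunk. make_stream_replacer is a hypothetical helper, not part of ngx_lua, and this is a different technique from the general streaming-regex approach of the sregex library mentioned in this reply.

```lua
-- Hypothetical helper (not part of ngx_lua): returns a per-response function
-- that replaces every occurrence of the literal string `needle` with
-- `replacement`, even when an occurrence is split across chunks.
local function make_stream_replacer(needle, replacement)
  assert(#needle > 0, "needle must not be empty")
  local carry = ""  -- unprocessed tail held back from the previous chunk

  return function(chunk, eof)
    local data = carry .. chunk
    local out = {}
    local pos = 1

    -- Replace every complete occurrence (plain find, no pattern magic).
    while true do
      local s, e = string.find(data, needle, pos, true)
      if not s then break end
      out[#out + 1] = data:sub(pos, s - 1)
      out[#out + 1] = replacement
      pos = e + 1
    end

    if eof then
      -- Last call: nothing more will arrive, so flush everything.
      out[#out + 1] = data:sub(pos)
      carry = ""
    else
      -- Hold back the last #needle - 1 bytes: they may be the start of a
      -- match that continues in the next chunk.
      local keep_from = math.max(pos, #data - #needle + 2)
      out[#out + 1] = data:sub(pos, keep_from - 1)
      carry = data:sub(keep_from)
    end
    return table.concat(out)
  end
end

-- Simulated chunks that split "Speel" across a boundary, plus an
-- empty eof-only call at the end.
local chunks = { "<h1>Sp", "", "eel nu", "! Speel", "" }

local naive, streamed = {}, {}
local replace = make_stream_replacer("Speel", "NGINX")
for i, chunk in ipairs(chunks) do
  naive[#naive + 1] = (string.gsub(chunk, "Speel", "NGINX"))
  streamed[#streamed + 1] = replace(chunk, i == #chunks)
end

print(table.concat(naive))     --> <h1>Speel nu! NGINX
print(table.concat(streamed))  --> <h1>NGINX nu! NGINX
```

Inside nginx one would presumably create one replacer per request (for example, stored in ngx.ctx on the first call, with the helper loaded from a Lua module via require) and assign its result back with ngx.arg[1] = replace(ngx.arg[1], ngx.arg[2]). Holding bytes back does not change the total body length, but if the replacement differs in length from the needle, the ngx_lua documentation recommends clearing the Content-Length header in header_filter_by_lua with ngx.header.content_length = nil.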

I've been working on the sregex C library, which will support streaming
matching just like Ragel:

    https://github.com/agentzh/sregex

It's still in progress, but it'll soon be usable in Lua land :)

Best regards,
-agentzh
Posted by Bart van Deenen (Guest)
on 2012-11-28 16:29
(Received via mailing list)
Hi Agentz

But wouldn't the statement
          client_body_in_single_buffer on;
cause the whole body from the proxied server to go into ngx.arg[1]?

And I also don't understand why my example code wouldn't work
reliably, even if the proxied data is passed through it in chunks
(unless a chunk boundary happened to fall right in the middle of
my short match string). I've done a very similar setup, proxying and
modifying a simple website (vandeenensupport.com), and that works
perfectly.

I have also noticed that when I add a 'print(ngx.arg[1])' as the first
line of the Lua section of my example, the HTML replacement works
reliably: no more empty ngx.arg[1]!
But that print only goes into the nginx log, so maybe it's only its
timing that has some effect?

So I still don't understand it.

Thanks for all your good work on nginx.

Bart

Posted by agentzh (Guest)
on 2012-11-30 02:52
(Received via mailing list)
Hello!

On Wed, Nov 28, 2012 at 7:28 AM, Bart van Deenen wrote:
> Hi Agentz
>

Agentz is not my name, don't call me that. You can call me either
agentzh or Yichun.

> But wouldn't the statement
>           client_body_in_single_buffer on;
> cause the whole body from the proxied server to go into ngx.arg[1]?
>

client_body_in_single_buffer is for *request* bodies, while
body_filter_by_lua is for *response* bodies. Please do not confuse
these two bodies. They're completely different things.
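If the goal is simply to see the whole response body at once, one option is to accumulate the chunks inside the body filter itself. Below is a minimal plain-Lua sketch, assuming small responses: it keeps the entire body in memory until the eof call, which is exactly the trade-off agentzh warns about for huge responses. buffer_then_replace is a hypothetical helper; the ctx, data and eof parameters stand in for ngx.ctx, ngx.arg[1] and ngx.arg[2].

```lua
-- Accumulates every chunk in a per-request table and emits nothing until
-- the eof call. In nginx, ctx would be ngx.ctx, data/eof would be
-- ngx.arg[1]/ngx.arg[2], and the result would be assigned to ngx.arg[1].
local function buffer_then_replace(ctx, data, eof)
  local parts = ctx.parts
  if not parts then
    parts = {}
    ctx.parts = parts
  end

  if data ~= "" then
    parts[#parts + 1] = data  -- empty chunks are normal; just skip them
  end

  if not eof then
    return ""  -- hold the data back until the whole body has arrived
  end

  local whole = table.concat(parts)
  ctx.parts = nil
  return (string.gsub(whole, "Speel", "NGINX"))
end

-- Simulated calls for one response: the match is split across chunks and
-- the final call carries only the eof flag.
local ctx = {}
local emitted = {
  buffer_then_replace(ctx, "<p>Sp", false),
  buffer_then_replace(ctx, "eel</p>", false),
  buffer_then_replace(ctx, "", true),
}
print(emitted[3])  --> <p>NGINX</p>
```

Because the replacement only runs once the whole body is available, a chunk boundary inside "Speel" no longer matters, at the cost of giving up streaming for that location.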

> And I also don't understand why my example code wouldn't work reliably,
> even if the proxied data is passed through it in chunks (unless a chunk
> boundary happened to fall right in the middle of my short match string).

Yes, I mean exactly the case where a chunk boundary falls in the middle
of your string. It can happen.

> I've done a very similar setup, proxying and modifying a simple website
> (vandeenensupport.com), and that works perfectly.
>

Working 99.9% of the time can never imply 100% perfection :) This is
just a caveat :)

> I have also noticed that when I add a 'print(ngx.arg[1])' as the first line of
> the Lua section of my example, the HTML replacement works reliably: no more
> empty ngx.arg[1]!

ngx.arg[1] can be an empty string by design, as explained in my
previous email. Always be prepared for that if you want your code
to work reliably.

You can always reproduce a "special buf" (with an empty data chunk) with
ngx_lua's ngx.flush() and ngx.eof() primitives.

> But that print only goes into the nginx log, so maybe it's only its timing
> that has some effect?
>

Maybe.

Best regards,
-agentzh