400 Bad Request and UTF-8

Hi,

I'm using nginx and Rails for my site, whose URLs contain Georgian
letters, e.g. განცხადებები.

It mostly works perfectly, but sometimes I receive requests with a
truncated URL, e.g.:
1 - http://gancxadebebi.ge/ka/განცხადებებ��%9
(as you can see, something should follow the %9)
or
2 - http://gancxadebebi.ge/ka/განცხადებები?mc=mini+aipadi&search=ძიებ��%9

I managed to handle the case with no GET parameters (the first URL above)
by redirecting it to /.
When this happens, this line is added to the nginx error.log:
2013/09/24 00:46:53 [alert] 63547#0: *19359227 pcre_exec() failed: -10 on "/ka/განცხადებებ�" using "", client: aa.bb.cc.dd, server: gancxadebebi.ge, request: "GET /ka/%E1%83%92%E1%83%90%E1%83%9C%E1%83%AA%E1%83%AE%E1%83%90%E1%83%93%E1%83%94%E1%83%91%E1%83%94%E1%83%91%E1%8 HTTP/1.1", host: "gancxadebebi.ge"
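A minimal Python sketch of why this path is a problem, assuming UTF-8 is the intended encoding: percent-decoding the logged request path (which ends in the cut-off escape %E1%8) leaves the lead byte 0xE1 of a 3-byte UTF-8 sequence with no continuation bytes, so strict UTF-8 decoding fails. The complete path is the one from the access.log line below. The helper name decode_utf8_path is hypothetical; unquote_to_bytes is from Python's standard urllib.parse.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes

# Request path copied from the error.log line above; it ends in the cut-off escape "%E1%8".
TRUNCATED_PATH = (
    "/ka/%E1%83%92%E1%83%90%E1%83%9C%E1%83%AA%E1%83%AE%E1%83%90"
    "%E1%83%93%E1%83%94%E1%83%91%E1%83%94%E1%83%91%E1%8"
)
# The same path as it appears complete in the access.log line (last letter is E1 83 98).
COMPLETE_PATH = (
    "/ka/%E1%83%92%E1%83%90%E1%83%9C%E1%83%AA%E1%83%AE%E1%83%90"
    "%E1%83%93%E1%83%94%E1%83%91%E1%83%94%E1%83%91%E1%83%98"
)


def decode_utf8_path(path: str) -> Optional[str]:
    """Hypothetical helper: percent-decode a path, return None if it is not valid UTF-8."""
    # Malformed escapes such as the trailing "%8" are kept as literal bytes.
    raw = unquote_to_bytes(path)
    try:
        # Strict decoding rejects the lone lead byte 0xE1 left by the truncation.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(decode_utf8_path(COMPLETE_PATH))   # /ka/განცხადებები
    print(decode_utf8_path(TRUNCATED_PATH))  # None
```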

But the second URL, whose GET parameter is truncated, I cannot handle:
it generates a 400 Bad Request page.
Such a request adds this line to the nginx access.log:
aa.bb.cc.dd - - [24/Sep/2013:00:48:47 +0200] "GET /ka/%E1%83%92%E1%83%90%E1%83%9C%E1%83%AA%E1%83%AE%E1%83%90%E1%83%93%E1%83%94%E1%83%91%E1%83%94%E1%83%91%E1%83%98?mc=mini+aipadi&search=%E1%83%AB%E1%83%98%E1%83%94%E1%83%91%E1%83% HTTP/1.1" 400 5 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36"

Does this mean that nginx accepted the request and then Rails couldn't
resolve it?

I don't know whether the problem comes from Rails or from nginx. For the
first URL, I solved it in the nginx config. Here is part of my config:

access_log /var/log/nginx/gancx.access.log;
error_log /var/log/nginx/gancx.error.log;

client_body_in_file_only clean;
client_body_buffer_size 32K;
charset UTF-8;
source_charset UTF-8;
client_max_body_size 300M;



error_page  400 404         = @notfound;
error_page  500 502 504 = @server_error;
error_page  503         = @maintenance;

location @notfound {
  rewrite ^(.*)$ $scheme://$host permanent;
}

location @server_error {
    rewrite ^(.*)$ $scheme://$host permanent;
}

location @maintenance {
    rewrite ^(.*)$ $scheme://$host permanent;
}
sendfile on;
send_timeout 300s;

location / {
    proxy_pass http://gancx;
    proxy_redirect off;

    proxy_set_header   Host             $host;
    proxy_set_header   X-Real-IP        $remote_addr;
    proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
    charset UTF-8;
    client_max_body_size 7m;
    proxy_buffer_size          4k;
    proxy_buffers              4 32k;
    proxy_busy_buffers_size    64k;
    proxy_temp_file_write_size 64k;
}

Thanks for your help.

Posted at Nginx Forum:

Hello!

On Wed, Sep 25, 2013 at 08:13:11AM -0400, optimum.dulopin wrote:

> or
> 2 - http://gancxadebebi.ge/ka/განცხადებები?mc=mini+aipadi&search=ძიებ��%9
>
> I managed to handle the case with no GET parameters (the first URL above)
> by redirecting it to /.

Hmm, I tend to think it’s a bug that (1) doesn’t generate 400 Bad
Request. It should.

> When this happens, this line is added to the nginx error.log:
> 2013/09/24 00:46:53 [alert] 63547#0: *19359227 pcre_exec() failed: -10 on "/ka/განცხადებებ�" using "", client: aa.bb.cc.dd, server: gancxadebebi.ge, request: "GET /ka/%E1%83%92%E1%83%90%E1%83%9C%E1%83%AA%E1%83%AE%E1%83%90%E1%83%93%E1%83%94%E1%83%91%E1%83%94%E1%83%91%E1%8 HTTP/1.1", host: "gancxadebebi.ge"

The -10 from pcre_exec() is PCRE_ERROR_BADUTF8; it shouldn't
happen unless you've explicitly used “(*UTF8)” in your PCRE
patterns. It's very strange that you see it with the config provided.
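To make the distinction concrete, here is a toy Python model (not nginx or PCRE code): in UTF-8 mode PCRE validates the subject before matching and reports -10 (PCRE_ERROR_BADUTF8) if it is not valid UTF-8, whereas in byte mode the same subject is simply matched as bytes. The function pcre_like_exec and its utf8_mode flag are hypothetical; only the two error codes are PCRE1's real values, and a real successful pcre_exec() returns a capture count rather than the fixed 1 used here.

```python
import re
from urllib.parse import unquote_to_bytes

PCRE_ERROR_NOMATCH = -1   # PCRE1 return code: subject did not match
PCRE_ERROR_BADUTF8 = -10  # PCRE1 return code: subject is not valid UTF-8

# The URI path is urldecoded for location matching, so decode the logged path first.
SUBJECT = unquote_to_bytes(
    "/ka/%E1%83%92%E1%83%90%E1%83%9C%E1%83%AA%E1%83%AE%E1%83%90"
    "%E1%83%93%E1%83%94%E1%83%91%E1%83%94%E1%83%91%E1%8"
)


def pcre_like_exec(pattern: bytes, subject: bytes, utf8_mode: bool) -> int:
    """Toy model of pcre_exec(): 1 on match, PCRE-style negative code otherwise."""
    if utf8_mode:
        # UTF-8 mode (e.g. a pattern starting with "(*UTF8)") checks the subject up front.
        try:
            subject.decode("utf-8")
        except UnicodeDecodeError:
            return PCRE_ERROR_BADUTF8
    # Byte mode: the regex just runs over the raw bytes, valid UTF-8 or not.
    return 1 if re.search(pattern, subject) else PCRE_ERROR_NOMATCH


if __name__ == "__main__":
    print(pcre_like_exec(rb"^/ka/", SUBJECT, utf8_mode=False))  # 1
    print(pcre_like_exec(rb"^/ka/", SUBJECT, utf8_mode=True))   # -10
```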

> But the second URL, whose GET parameter is truncated, I cannot handle:
> it generates a 400 Bad Request page.
> Such a request adds this line to the nginx access.log:
> aa.bb.cc.dd - - [24/Sep/2013:00:48:47 +0200] "GET /ka/%E1%83%92%E1%83%90%E1%83%9C%E1%83%AA%E1%83%AE%E1%83%90%E1%83%93%E1%83%94%E1%83%91%E1%83%94%E1%83%91%E1%83%98?mc=mini+aipadi&search=%E1%83%AB%E1%83%98%E1%83%94%E1%83%91%E1%83% HTTP/1.1" 400 5 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36"
>
> Does this mean that nginx accepted the request and then Rails couldn't
> resolve it?

By itself nginx doesn't try to urldecode request arguments (in
contrast to the URI path, which is urldecoded for location matching),
and because of this it doesn't try to detect encoding violations
in request arguments. That is, most likely you are right and the
error comes from your backend.
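A rough Python sketch of that split, under stated assumptions: the front end (a simplified stand-in for nginx) urldecodes only the path and passes the query string on untouched, while a hypothetical strict backend decodes each query argument as UTF-8 and answers 400 when that fails. Rails' actual behaviour is not modelled; frontend_view and strict_backend_status are illustrative names. The target is the request from the access.log line quoted above.

```python
from typing import Tuple
from urllib.parse import parse_qsl, unquote_to_bytes, urlsplit

# Request target from the access.log line above: the "search" value ends in "%E1%83%".
TARGET = (
    "/ka/%E1%83%92%E1%83%90%E1%83%9C%E1%83%AA%E1%83%AE%E1%83%90%E1%83%93"
    "%E1%83%94%E1%83%91%E1%83%94%E1%83%91%E1%83%98"
    "?mc=mini+aipadi&search=%E1%83%AB%E1%83%98%E1%83%94%E1%83%91%E1%83%"
)


def frontend_view(target: str) -> Tuple[bytes, str]:
    """Simplified nginx-like view: urldecoded path (for location matching) and raw args."""
    parts = urlsplit(target)
    # Only the path is decoded; the query string is never decoded or checked here.
    return unquote_to_bytes(parts.path), parts.query


def strict_backend_status(target: str) -> int:
    """Hypothetical backend: 400 if any query argument is not valid UTF-8, else 200."""
    query = urlsplit(target).query
    try:
        # errors="strict" makes the truncated "%E1%83%" raise instead of being replaced.
        parse_qsl(query, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return 400
    return 200


if __name__ == "__main__":
    print(frontend_view(TARGET)[0].decode("utf-8"))  # /ka/განცხადებები
    print(strict_backend_status(TARGET))             # 400
```

Appending "90" to the target completes the last letter (ძიება) and the same backend returns 200, which is consistent with the truncated argument, not the path, being what the backend rejects.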

You may try intercepting errors using proxy_intercept_errors, but
actually I wouldn't recommend doing it. Configuring an error_page
for 400 Bad Request isn't a good idea; it might hurt.


Maxim D.
http://nginx.org/en/donation.html

p.s. Please don't duplicate the same question to the same mailing
list via multiple forum-like interfaces. It's still the same
mailing list. Thank you for your cooperation.