Escape characters in log files

nginx does not escape logged variables. This makes it impossible to
reliably parse log entries. This is what Apache does:

For security reasons, starting with version 2.0.46, non-printable
and other special characters in %r, %i and %o are escaped using
\xhh sequences, where hh stands for the hexadecimal representation
of the raw byte. Exceptions from this rule are " and \, which are
escaped by prepending a backslash, and all whitespace characters,
which are written in their C-style notation (\n, \t, etc). In
versions prior to 2.0.46, no escaping was performed on these strings
so you had to be quite careful when dealing with raw log files.

mod_log_config - Apache HTTP Server Version 2.2
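
For anyone consuming logs written this way, the escaping is reversible. Below is a minimal Python sketch of a decoder for the scheme described in the quote. The function name and the exact set of C-style escapes it recognises are assumptions for illustration, not taken from Apache's source.

```python
import re

# C-style and backslash-prefixed escapes; this table is an assumption.
_SIMPLE = {
    b"n": b"\n", b"t": b"\t", b"r": b"\r", b"v": b"\v", b"f": b"\f",
    b'"': b'"', b"\\": b"\\",
}
# A backslash followed by either xhh or any single byte.
_ESCAPE = re.compile(rb"\\(x[0-9A-Fa-f]{2}|.)", re.DOTALL)


def _decode(match):
    token = match.group(1)
    if len(token) == 3:  # the xhh form
        return bytes([int(token[1:], 16)])
    # Unknown escapes are left untouched rather than guessed at.
    return _SIMPLE.get(token, match.group(0))


def unescape_log_value(escaped: bytes) -> bytes:
    """Turn an escaped log field back into the raw bytes the client sent."""
    return _ESCAPE.sub(_decode, escaped)


print(unescape_log_value(rb'xxx\"yyy'))  # b'xxx"yyy'
```

Working on bytes rather than str avoids having to guess the encoding of whatever the client sent.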

I would like to fix this in nginx. What is the best way to handle this
issue?

David P. wrote:

nginx does not escape logged variables. This makes it impossible to
reliably parse log entries. This is what Apache does:

Can you post an example, thanks?

I would like to fix this in nginx. What is the best way to handle this issue?

Modify the http log module.

I think you only need to modify the ngx_http_log_variable_getlen and
ngx_http_log_variable functions.

Manlio P.

On Jan 15, 2008 2:48 AM, Manlio P. wrote:

Can you post an example, thanks?

Request:

GET / HTTP/1.0
Referer: xxx"yyy

nginx log:

127.0.0.1 - - [16/Jan/2008:18:58:03 -0600] "GET / HTTP/1.0" 403 202 "xxx"yyy" "-"

Apache log:

127.0.0.1 - - [16/Jan/2008:18:58:03 -0600] "GET / HTTP/1.0" 403 202 "xxx\"yyy" "-"
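
To make the difference concrete, here is a small Python sketch that pulls the quoted fields out of both lines. It assumes Apache writes the Referer as xxx\"yyy, per the escaping rule quoted earlier in the thread. The helper name quoted_fields and the regular expression are illustrative only.

```python
import re

# A quoted field: anything between double quotes, where a backslash
# escapes the character that follows it.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def quoted_fields(line):
    """Return the contents of every double-quoted field in a log line."""
    return QUOTED.findall(line)


nginx_line = ('127.0.0.1 - - [16/Jan/2008:18:58:03 -0600] '
              '"GET / HTTP/1.0" 403 202 "xxx"yyy" "-"')
apache_line = ('127.0.0.1 - - [16/Jan/2008:18:58:03 -0600] '
               r'"GET / HTTP/1.0" 403 202 "xxx\"yyy" "-"')

print(quoted_fields(nginx_line))   # ['GET / HTTP/1.0', 'xxx', ' ']
print(quoted_fields(apache_line))  # ['GET / HTTP/1.0', 'xxx\\"yyy', '-']
```

With the unescaped line the Referer is cut short and the user agent field is lost. With the escaped line all three fields come back intact.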

Modify the http log module.

I think you only need to modify the ngx_http_log_variable_getlen and
ngx_http_log_variable functions.

You’re right, it’s simpler than I thought. I was concerned about the
performance overhead of escaping variables that are known to be good,
but they are already special-cased by ngx_http_log_vars.

The attached patch escapes characters in log file entries. It works
similarly to Apache:

  • Any non-printable characters (<= 31 or >= 127) or pipe characters
    are escaped using \xhh
  • Whitespace (tab, newline, vertical tab, form feed, carriage return)
    characters use C-style escapes
  • Double quotes or backslashes are prefixed with a backslash

Pipes are escaped to allow for easy handling of raw logs when using
pipes as the delimiter: you can split on a pipe without any parsing.
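
The rules above can be sketched in a few lines of Python. This is an illustration of the behaviour described in this message, not the C code from the patch. The function name and the lowercase hex digits are assumptions.

```python
# Whitespace characters that get a C-style escape.
C_STYLE = {0x09: "\\t", 0x0A: "\\n", 0x0B: "\\v", 0x0C: "\\f", 0x0D: "\\r"}


def escape_log_value(raw: bytes) -> str:
    """Escape one logged value according to the three rules listed above."""
    out = []
    for byte in raw:
        if byte in C_STYLE:
            out.append(C_STYLE[byte])
        elif byte in (0x22, 0x5C):  # double quote, backslash
            out.append("\\" + chr(byte))
        elif byte <= 31 or byte >= 127 or byte == 0x7C:  # 0x7C is the pipe
            out.append("\\x%02x" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


# Pipe-delimited line built from escaped values (sample values are made up).
values = [b'xxx"yyy', b"a|b", b"tab\there"]
line = "|".join(escape_log_value(v) for v in values)
print(line.split("|"))  # always three fields, whatever the values contain
```

Because a literal pipe inside a value is written as \x7c, the only pipes left in the line are delimiters, which is why a plain split is enough.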

(re-sending with the patch actually attached)

David P. wrote:

Pipes are escaped to allow for easy handling of raw logs when using
pipes as the delimiter: you can split on a pipe without any parsing.

Thanks for the patch.
I suggest adding a new directive to enable or disable escaping.

One last thing: can you please add this patch to
http://wiki.codemongers.com/NginxModules, in the “Third-Party Nginx
patches” section?

Manlio P.