Logging into script

Dennis_J · January 14, 2010, 6:13am

Hi,
Is there an nginx equivalent to Apache’s CustomLog directive with the “|”
prefix, so that it logs to the stdin of another program/script? I need to
do real-time processing of the access log data and I’m wondering how I can
accomplish this once I switch to nginx.

Regards,
Dennis

Dennis_J · January 14, 2010, 7:03am

This thread might be of help:
http://nginx.org/pipermail/nginx/2009-June/013042.html

By the way, “tail -F” is the only recommended way to do near
real-time log parsing with nginx.

Dennis_J · January 14, 2010, 6:14am

Maybe you can use a named pipe instead?
I don’t know whether the “writer” would be blocked if there is no
reader.

–
Ren Xiaolei
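
On the open question of whether the writer blocks: on POSIX systems a plain blocking open of a FIFO for writing waits until some process opens it for reading, and with O_NONBLOCK the open fails with ENXIO instead. Once both ends are open, writes block whenever the pipe buffer is full, which is where a slow reader hurts. A minimal Python sketch that probes this follows. The FIFO name is hypothetical, and how nginx itself opens its log file is not something this thread states.

```python
import errno
import os
import tempfile


def try_open_fifo_for_writing(path):
    """Open a FIFO for writing without blocking.

    Returns a file descriptor, or None when no reader has the FIFO open.
    """
    try:
        return os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:  # nobody has the FIFO open for reading
            return None
        raise


def demo():
    """Create a throwaway FIFO and probe it without and with a reader."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "access.fifo")  # hypothetical name
    os.mkfifo(fifo_path)

    without_reader = try_open_fifo_for_writing(fifo_path)

    # Opening the read end with O_NONBLOCK returns immediately.
    reader_fd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
    writer_fd = try_open_fifo_for_writing(fifo_path)
    os.write(writer_fd, b"GET / 200\n")
    received = os.read(reader_fd, 4096)

    os.close(writer_fd)
    os.close(reader_fd)
    os.remove(fifo_path)
    os.rmdir(workdir)
    return without_reader, received
```

Calling demo() returns the result of the first probe (no reader yet) and the bytes that went through once a reader was attached.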

Dennis_J · January 14, 2010, 9:37am

Not for highload.

Dennis_J · January 14, 2010, 9:12am

Or create a FIFO file and let nginx write its logs to the FIFO. Works
fine for me.

Greetings

Juergen

Dennis_J · January 14, 2010, 1:52pm

Why is logging into a pipe considered a waste of CPU?
The log parser throws away some data, aggregates the rest and then writes
it to a remote database. The “tail -f” approach would waste local disk I/O
by writing data unnecessarily to disk, which I would then have to read
again with the script.
Why is this considered more efficient than handing the data directly over
to a script?

Regards,
Dennis

Dennis_J · January 14, 2010, 2:52pm

On Thu, Jan 14, 2010 at 01:52:02PM +0100, Dennis J. wrote:

Why is logging into a pipe considered a waste of CPU?
The log parser throws away some data, aggregates the rest and then writes
it to a remote database. The “tail -f” approach would waste local disk I/O
by writing data unnecessarily to disk, which I would then have to read
again with the script.
Why is this considered more efficient than handing the data directly over
to a script?

It is not considered more efficient. It may be more efficient because of
bulk data processing. Note also that logged data are written to disk but
are not read back from disk, because they are already in the OS cache:
they are just copied.

Logging to a pipe is a waste of CPU because it causes a lot of context
switches and memory copies for every log operation:

nginx writes to a pipe,
context switch to script,
script reads from the pipe,
script processes line,
script writes to a database,
context switch to nginx.

instead of a single memory copy operation to a log file.
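
To illustrate what “bulk data processing” can look like on the reader side, here is a rough Python sketch: each wake-up reads everything appended since the last offset in one call and aggregates it, instead of handling one line per context switch. The function names, and the assumption that the status code is the ninth whitespace-separated field (as in the usual “combined” layout), are mine and not from this thread.

```python
from collections import Counter


def drain_new_lines(path, offset):
    """Read everything appended since `offset`; return (lines, new_offset).

    A trailing partial line is left in place for the next call.
    """
    with open(path, "rb") as log:
        log.seek(offset)
        chunk = log.read()
    end = chunk.rfind(b"\n") + 1  # 0 when the chunk holds no complete line
    return chunk[:end].splitlines(), offset + end


def count_statuses(lines, status_field=8):
    """Count status codes; field index 8 assumes the 'combined' layout."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > status_field:
            counts[fields[status_field].decode("ascii", "replace")] += 1
    return counts
```

A caller would keep the returned offset between polls and push the aggregated counts to the database once per batch.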

–
Igor S.
http://sysoev.ru/en/

Dennis_J · January 14, 2010, 3:49pm

On 01/14/2010 02:51 PM, Igor S. wrote:

It is not considered as more efficient. It may be more efficient because of
bulk data processing. Note also, that logged data are written to disk, but
are not read because they are already in OS cache: they are just copied.

Logging to pipe is a CPU waste because it causes a lot of context switches
and memory copies for every log operation:

Hm, interesting. I didn’t know that writing to a pipe actually forces a
context switch. I was under the impression that the writing process could
use up its time slice to write an arbitrary amount of data into the pipe,
and when the OS scheduler switches to the script it would read all the
data from that pipe.

The “tail -f” approach looks racy to me though. The log would grow fairly
fast, which means it would probably have to be rotated at least once per
hour or the disk will fill up. I’m not sure how to process this rotation
with “tail -f” without potentially missing some data.

Regards,
Dennis

Dennis_J · January 14, 2010, 4:04pm

Hello!

On Thu, Jan 14, 2010 at 03:48:35PM +0100, Dennis J. wrote:

data into the pipe and when the OS scheduler switches to the script
it would read all the data from that pipe.

The “tail -f” approach looks racy to me though. The log would grow
fairly fast which means it would probably have to be rotated at
least once per hour or the disk will fill up. I’m not sure how to
process this rotation with “tail -f” without potentially missing
some data.

tail -F will do the trick

It’s still racy if the app reading the logs can’t cope with the load
and finish reading one file before the next rotation happens. But in
that case the only expected result of piping logs directly from
nginx is a brick instead of a server.

Maxim D.
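
For readers who want the “tail -F” behaviour inside the processing script itself, a rough Python sketch is below. It follows the file by name, and when the name starts pointing at a different inode it drains what is left of the old file before switching. The function names and the `max_idle_polls` knob are hypothetical. The sketch assumes the writer has stopped appending to the old file by the time the new one appears, and it still loses data if a second rotation happens before the first file has been read, as noted above.

```python
import os
import time


def _rotated(path, log):
    """True when `path` exists and is no longer the file we have open."""
    try:
        return os.stat(path).st_ino != os.fstat(log.fileno()).st_ino
    except FileNotFoundError:
        return False  # renamed away, replacement not created yet


def follow(path, poll_interval=0.5, max_idle_polls=None):
    """Yield complete lines appended to `path`, surviving rotation.

    `max_idle_polls` stops after that many consecutive empty polls;
    None means follow forever.
    """
    log = open(path, "rb")
    pending = b""  # partial line seen at end of file
    idle = 0
    try:
        while True:
            chunk = log.readline()
            if chunk:
                idle = 0
                pending += chunk
                if pending.endswith(b"\n"):
                    yield pending
                    pending = b""
                continue
            if _rotated(path, log):
                # Flush whatever was written to the old file last.
                tail = pending + log.read()
                pending = b""
                yield from tail.splitlines(keepends=True)
                log.close()
                log = open(path, "rb")
                continue
            idle += 1
            if max_idle_polls is not None and idle >= max_idle_polls:
                return
            time.sleep(poll_interval)
    finally:
        log.close()
```

Typical use would be a plain loop such as `for line in follow(path): ...`, with path being whatever file the access_log directive writes to.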

Dennis_J · January 14, 2010, 6:03pm

----- Dennis J. [email protected] wrote:

and when the OS scheduler switches to the script it would read all the data
from that pipe.

The “tail -f” approach looks racy to me though. The log would grow fairly
fast which means it would probably have to be rotated at least once per
hour or the disk will fill up. I’m not sure how to process this rotation
with “tail -f” without potentially missing some data.

Yes. This controversy motivated me to create a UDP logger.

Although there are clean ways to do this with “tail -f”, there might be
demand for an alternative solution.

–
Regards,
Valery K.
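
The wire format of the UDP logger mentioned above is not described in this thread, so the following Python sketch only shows the general shape of the receiving side under a loud assumption: one log line per datagram. The function names and the choice of port are hypothetical. The design trade-off is that UDP never blocks the sender, so a reader that falls behind drops log lines instead of stalling the server.

```python
import socket


def open_receiver(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_log_lines(sock, max_datagrams):
    """Collect up to `max_datagrams` lines, assuming one line per datagram."""
    lines = []
    for _ in range(max_datagrams):
        data, _sender = sock.recvfrom(65535)
        lines.append(data.rstrip(b"\r\n"))
    return lines
```

A real consumer would loop forever and aggregate as it goes; the bounded count here just keeps the sketch easy to try out.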