Centralized logging for multiple servers


#1

Hi guys,

Just wondering what you guys are using for centralized logging of access
logs for multiple servers. I’m thinking about using the syslog patch but I
was wondering if anyone was using anything they think is better? And has
piping of log files been implemented in the newer versions?

Kingsley F.
Technical Leader Content Services / Content Services Group

=============================================
Internode Systems Pty Ltd

PO Box 284, Rundle Mall 5000
Level 5 150 Grenfell Street, Adelaide 5000
Phone: +61 8 8228 2978
Fax: +61 8 8235 6978
Web: http://www.internode.on.net
http://games.on.net


#2

Kingsley F. wrote:

Just wondering what you guys are using for centralized logging of access
logs for multiple servers. I’m thinking about using the syslog patch but
I was wondering if anyone was using anything they think is better? And
has piping of log files been implemented in the newer versions?

  1. syslog and logging to a pipe are not usable with a highly loaded
    web server.

  2. If you want to send logs to another host via syslog-ng, don’t
    forget that some records will be lost in case of network failure
    (longer than the kernel TCP retransmit timeout), and in case of a
    log server reboot.

  3. IMHO the most usable way to store all logs on one server, for
    loaded servers, is:
    3.1 rotate logs as often as needed.
    3.2 share the logs over http using the same nginx (protected by
    password and an IP ACL).
    3.3 on the central server, run a script to fetch these logs
    (e.g. using wget).

If you want near-realtime logs on the central server, you can run:
tail -F /path/to/access.log | soft-to-send-logs-to-remote-server
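Step 3.3 could look something like this minimal sketch, run from cron on the central server (the hostnames, paths, and credentials are all invented for illustration):

```shell
#!/bin/sh
# Pull the rotated, nginx-served logs from each web host (step 3.3).
# All names here are made-up examples; adjust to your layout.
HOSTS="web1.example.com web2.example.com"
DEST=${DEST:-/tmp/central-logs}

for h in $HOSTS; do
    mkdir -p "$DEST/$h"
    # basic auth + IP ACL on the nginx side, as suggested in step 3.2
    wget -q --user=loguser --password=secret \
         -P "$DEST/$h" "http://$h/logs/access.log.1.gz" \
        || echo "fetch from $h failed" >&2
done
```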


#3

On Monday, April 13, 2009 at 21:47:18, Anton Y. wrote:

AY> If you want to send logs to other host via syslog-ng
AY> don’t forget, that some records will be lost in case of
AY> network failure (longer than kernel tcp retransmit timeout),
AY> and in case of log server reboot.

Maybe http://developers.facebook.com/scribe/ can help.

Scribe is a server for aggregating log data streamed
in real time from a large number of servers. It is designed
to be scalable, extensible without client-side modification,
and robust to failure of the network or any specific machine.


#4

Awstats comes with a small Perl script to merge multiple log files,
which I have used in the past:

/usr/bin/logresolvemerge.pl
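Invocation is just a matter of pointing it at the files and redirecting the merged output (the log file names here are made up, and the script's install path varies with your AWStats packaging):

```shell
# Merge several servers' access logs into one chronologically ordered
# file. The script path and the log names are examples only.
perl /usr/bin/logresolvemerge.pl server1-access.log server2-access.log \
    > merged-access.log || echo "is AWStats installed here?" >&2
```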

Cheers

Dave


#5

mergelog can do that.

100 solutions to a problem, none of them great :(

Kingsley




#6

What about

cat *.log | sort -k 4


#7

On Mon, Apr 13, 2009 at 12:56:29PM +0930, Kingsley F. wrote:

Hi guys,

Just wondering what you guys are using for centralized logging of access
logs for multiple servers. I’m thinking about using the syslog patch but I
was wondering if anyone was using anything they think is better? And has
piping of log files been implemented in the newer versions?

I personally do not like syslog and log piping.

I prefer writing to the local file system and scp’ing the logs at
midnight to a central host.

Some time ago I needed to deliver logs for hourly statistics. I made a
script that rotates the log every hour, then gzips it (this takes
several seconds, as the whole ~1G hourly log file is still in the VFS
cache), and sends the log to the host. A daily log is done as

zcat hourly-logs.gz | 7z a -si daily-log.7z

and then it is copied to the central host.
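A sketch of that hourly rotation (not Igor’s actual script; the function name, pid-file path, and log-host layout are invented, and it relies on nginx reopening its log files on USR1):

```shell
#!/bin/sh
# Rotate an access log, ask nginx to reopen it, compress the result.
# rotate_hourly and all paths are invented names for illustration.
rotate_hourly() {
    log=$1 pidfile=$2
    stamp=$(date +%Y%m%d-%H)
    mv "$log" "$log.$stamp" || return 1
    # USR1 makes nginx reopen its log files after the rename
    [ -f "$pidfile" ] && kill -USR1 "$(cat "$pidfile")" || true
    sleep 1                       # let workers finish writing the old file
    gzip -f "$log.$stamp" || return 1
    echo "$log.$stamp.gz"         # archive name, ready to scp to the log host
}

# e.g. hourly from cron:
#   f=$(rotate_hourly /var/log/nginx/access.log /var/run/nginx.pid) &&
#   scp "$f" loghost:/srv/logs/
```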


#8

I prefer writing to the local file system and scp’ing the logs at
midnight to a central host.

I’m doing the same thing, but I’m in the process of switching from
syslog to syslog-ng, since with it one can use TCP with guaranteed
delivery. Also, you can log to multiple destinations at once (local,
plus multiple remotes for redundancy). Some customers have requested
‘near-realtime’ stats on their usage, and this is the only way to do it
(the remote hosts are expendable and don’t actually do anything with
the data besides show running stats).
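As a rough sketch, the syslog-ng side of that might look something like the following (the source, host names, port, and file path are all invented, and directive syntax varies between syslog-ng versions, so check the documentation for yours):

```
source s_local { unix-stream("/dev/log"); };
destination d_file   { file("/var/log/hosted/access.log"); };
destination d_stats1 { tcp("stats1.example.com" port(514)); };
destination d_stats2 { tcp("stats2.example.com" port(514)); };
# one local copy plus two remote 'near-realtime' stats hosts
log { source(s_local);
      destination(d_file);
      destination(d_stats1);
      destination(d_stats2); };
```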

Cheers
Kon


#9

On Wed, Apr 15, 2009 at 7:06 AM, Dave C. removed_email_address@domain.invalid wrote:

What about

cat *.log | sort -k 4

or just

cat *whatever.log >today.log

I assume the processing script can handle out-of-order requests. But I
guess that might be an arrogant assumption. :)

I do basically the same thing Igor does, but would love to simplify it
by just having Host: header counts for bytes (sent/received/total
amount of bytes used, basically) and how many HTTP requests. Logging
just enough of that to a file and parsing it each night seems kinda
amateur…


#10

Does this scale well? I’m running a web based proxy that generates an
absolute ton of log files. Easily 40gb / week / server, with around 20
servers. I’m looking to be able to store and search up to 7 days of
logs. Currently, I only move logs from the individual servers onto a
central server when I get a complaint, import it into mysql, and
search it. The entire process, even for just one server, takes
forever.


#11

I’m by no means a splunk expert, you should ask them, but I think it
scales pretty well. You can use multiple masters to receive and
load-balance logs, and you can distribute the searching map/reduce
style to leverage more cores. Search speed seems to be much more CPU
bound than I/O bound; the logs are pretty efficiently packed. It works
for me with ~15-20 EC2 instances and one central logging server. It
also keeps logs in tiered buckets, so things from 30 days ago are
available but slower to search, whereas yesterday’s logs are ‘hotter’.


#12

It’s commercial, but Splunk is amazing at this. I think you can process
a few hundred MB/day on the free version. http://splunk.com/

You set up a light-weight forwarder on every node you are interested
in, and then it slurps the files up and relays them to a central
splunk installation. It will queue internally if the master goes away.
Tons of support for sending different files different directions etc.
We have it set up in the default Puppet payload so every log on every
server is always centralized and searchable.


#13

I’ve used spread for centralized logging before, it’s a horrible
clusterF. If you want details as to why, let me know.


#14

On Fri, Apr 17, 2009 at 3:32 AM, Gabriel R. removed_email_address@domain.invalid
wrote:

I’ve used spread for centralized logging before, it’s a horrible
clusterF. If you want details as to why, let me know.

no thanks. i have no experience with it and i don’t really need
something like it right now (hopefully i’d use gearman, as i talk with
the main guy behind it a lot now) - it’s not a 1:1 concept-wise, but
you can probably get gearman to do what you’d want spread to do.

i’ll take your word for it since i’ve never bothered. it just looked
cool :)


#15

If you’re just looking for some sort of distribution, you could put
stuff into memcached or mysql cluster, or look at the spread toolkit
(spread.org, I believe) and gearman…


#16

It sounds cool from a theoretical perspective. I read a book, Scalable
Internet Architectures, which champions the cause of spread for all
kinds of uses. Getting the thing working, however, is not worth the
effort. And trying to pump a lot of raw log files through it in real
time, yeah, don’t bother giving yourself that headache.


#17

On Fri, Apr 17, 2009 at 5:07 PM, Gabriel R. removed_email_address@domain.invalid
wrote:

I was able to use the wayback machine to find the most recent pricing
for splunk. It seems that 1gb / day license costs $10k and 10gb / day
of log volume is going to set you back $30k. Above that and you have
to ask them for pricing. That’s really not going to work seeing as how
I’m doing more like 100gb / day. Their current website doesn’t have
any prices at all and just asks that you contact their sales
department.

The trick is to reduce your log volume. I use a number of parsers that
filter and summarize logs before pushing them to the central NMS
server and placing them into the splunk queue. Works perfectly and if
there is a need to analyze a specific problem we can always go back to
the machine with the source logs for further investigation.

Cheers
Kon


#18

If I have to do a lot of processing to reduce my log volume, and then
go back to the raw logs in case I actually needed the data, is there
really a lot of benefit to using splunk in the first place?


#19

Hi.

The first thing you have to deal with is multicast. Spread is designed
so that you can multicast your data to multiple servers without having
to specify several destinations and send the traffic to each
separately. However, if your switches / routers don’t support
multicast, it will automatically fall back to broadcasting, even if you
only have one server as the destination for the data. My web hosting
provider quickly cut off the server that was broadcasting so much data
to everyone on the subnet. Broadcasts, since they go to everyone, can
slow down all the servers on your network, as they all have to decide
what to do with the packets they’re receiving. You can override this
behavior to unicast if you only have one destination server, but not if
you have more than one.

Second is the problem of getting it up and running in the first place:
setting up your various variables, compiling and installing, which is
not particularly simple.

Then you have to decide: how is my data going to get into spread?
You’re going to need a program that sends the data into spread. Maybe
you’ll use a perl script; that seems to be popular. One way to do that
is to pipe your log file output to a perl script. But what happens when
the perl script unexpectedly dies? The application that is piping out
to the script fails. You’ll have to notice that this has occurred, kill
your hosting software (nginx, apache, squid, whatever) as well as the
perl script, and then restart the perl script and your hosting
software, in that order. Your perl script can die for all sorts of
reasons, not least of which is that it lost contact with the spread
server for too long, or its queue of messages to send across the pipe
got too long. Yeah, definitely what you want when you’ve got a backlog
of 10,000 requests to send across is to lose them all when your logging
program crashes.

Ok, so another method is to have your hosting platform log to a file as
normal, but have your perl script attach to the file with something
like tail -f | whatever.pl. That can work, but it suffers the same
problem of dying unexpectedly; the only difference is that when the
perl script dies, logging to the file continues and the hosting
platform doesn’t unexpectedly die.
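One workaround for that failure mode is to run the tail-based reader under a small restart loop, so a dead perl script just gets relaunched while the server keeps appending to the file. A minimal sketch (supervise and send-to-collector.pl are invented names):

```shell
#!/bin/sh
# Re-run a log reader whenever it dies, up to a retry budget.
# supervise() and the reader command below are illustrative only.
supervise() {
    tries=$1; shift
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0            # reader exited cleanly, we're done
        tries=$((tries - 1))
        echo "log reader died, ${tries} tries left" >&2
        sleep 1
    done
    return 1                        # retry budget exhausted
}

# e.g.: supervise 9999 sh -c \
#         'tail -F /var/log/nginx/access.log | perl send-to-collector.pl'
```

You still lose whatever was buffered inside the reader when it crashed, but the file itself keeps the authoritative copy, so nothing is lost for good.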

The other issue is performance. The receiving spread server has to be
able to process all these incoming “messages” (as they’re called) from
all your servers, do something with them, and be available to receive
more messages, without crashing. Again, this is probably a perl script.
And again, I had no real trouble creating a huge load on the receiving
spread server when it was receiving real-time log data from just one
server. I have 20. If I was lucky, I could have gotten it to handle 2
or 3 servers’ worth of real-time log data; 20 was not going to happen.

The whole thing with spread is that, in theory, it is designed to be
robust, but in my experience it is far from it; the whole operation
seemed quite fragile. It looked like the amount of effort I was going
to have to put in to write programs to make sure that spread was
working properly, and to work around potential failure conditions in an
elegant way, was obscene. I’m sure spread has a number of good uses,
but I could not recommend it for centralized logging. It does look
interesting for a program called Wackamole, which is designed to help
you set up your servers for high availability, but that requires a lot
fewer messages flying around than log files would.

SCP’ing or rsyncing a file from your source server to your centralized
logging server is a lot more robust. Those transfer programs have a
number of protections to make sure the file arrives intact, can do
compression, and so on. And you don’t have to transfer your log files
line-by-line with scp or rsync: you can dump huge amounts of data
across, and then have your central log server process them at its
leisure. If it can’t keep up with peak demand, it can catch up when the
site isn’t as busy, so you don’t have an end-of-the-world scenario if
the centralized log server can’t keep up with the generation of
real-time logs.


#20

On Sat, Apr 18, 2009 at 2:22 AM, Gabriel R. removed_email_address@domain.invalid
wrote:

If I have to do a lot of processing to reduce my log volume, and then
go back to the raw logs in case I actually needed the data, is there
really a lot of benefit to using splunk in the first place?

Depends on who your splunk users are and how important the extraneous
data is. If it’s tech support staff, then it is still invaluable for
giving them a mid/high-level overview of any outages or problems with
customer accounts (since they may not be able to fix the underlying
problem anyway). If you have hundreds of accounts and servers offering
multiple services, it is a big help. And many times there is no need to
log all the data on the system; e.g. with a lot of rsync jobs you
really don’t need the rsync logging output – only whether the job was
successful. Similarly, with sync jobs that run every 5 minutes, I don’t
log success, only failure. The list goes on…

Cheers
Kon