Hi guys,

Just wondering what you guys are using for centralized logging of access logs across multiple servers. I'm thinking about using the syslog patch, but I was wondering whether anyone is using something they think is better. Also, has piping of log files been implemented in the newer versions?

Kingsley F.
Technical Leader
Content Services / Content Services Group
=============================================
Internode Systems Pty Ltd
PO Box 284, Rundle Mall 5000
Level 5 150 Grenfell Street, Adelaide 5000
Phone: +61 8 8228 2978
Fax: +61 8 8235 6978
Web: http://www.internode.on.net
http://games.on.net
=============================================
on 2009-04-13 07:34
on 2009-04-13 22:52
Kingsley F. wrote:
> Just wondering what you guys are using for centralized logging of access
> logs for multiple servers. I'm thinking about using the syslog patch but
> I was wondering if anyone was using anything they think is better? And
> has piping of log files been implemented in the newer versions?

1. Syslog and logging to a pipe are not usable on a heavily loaded web server.
2. If you want to send logs to another host via syslog-ng, don't forget that some records will be lost in case of a network failure (longer than the kernel TCP retransmit timeout) and in case of a log server reboot.
3. IMHO, the most usable way to store all logs from loaded servers on one server is:
3.1 rotate logs as often as needed;
3.2 share the logs over HTTP using the same nginx (protected by a password and an IP ACL);
3.3 on the central server, run a script to fetch these logs (e.g. using wget).

If you want near-realtime logs on the central server, you can run:

tail -F /path/to/access.log | soft-to-send-logs-to-remote-server
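Anton's `tail -F` suggestion depends on `tail` noticing that the log was rotated and reopening it. Below is a minimal Python sketch of that follow-and-reopen loop, assuming rotation by rename (the old file is moved away and a new one is created). Unlike `tail -F`, it starts from the beginning of the file; `follow` and its parameters are illustrative names, not an existing tool.

```python
import os
import time
from typing import Iterator, Optional


def follow(path: str, poll_interval: float = 1.0,
           max_idle_polls: Optional[int] = None) -> Iterator[str]:
    """Yield complete lines appended to `path`, reopening it after a rename-style rotation.

    Hypothetical helper: reads from the start of the file (tail -F starts near the end)
    and stops after `max_idle_polls` consecutive empty polls if that is set.
    """
    f = None
    inode = None
    buf = ""
    idle = 0
    try:
        while True:
            if f is None:
                try:
                    f = open(path, "r")
                    inode = os.fstat(f.fileno()).st_ino
                except FileNotFoundError:
                    f = None  # log not (re)created yet; poll again later
            if f is not None:
                chunk = f.readline()
                if chunk:
                    buf += chunk
                    if buf.endswith("\n"):  # only emit whole lines
                        idle = 0
                        yield buf.rstrip("\n")
                        buf = ""
                    continue
                # At EOF: if `path` now names a different inode, the log was rotated.
                try:
                    rotated = os.stat(path).st_ino != inode
                except FileNotFoundError:
                    rotated = False  # renamed but not yet recreated; keep the old handle
                if rotated:
                    f.close()
                    f = None
                    continue
            idle += 1
            if max_idle_polls is not None and idle >= max_idle_polls:
                return
            time.sleep(poll_interval)
    finally:
        if f is not None:
            f.close()
```

Typical use would be `for line in follow("/var/log/nginx/access.log"): forward(line)`, where `forward` stands for whatever ships the line to the remote server.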
on 2009-04-13 23:45
On Monday, April 13, 2009 at 21:47:18, Anton Y. wrote:

AY> If you want to send logs to other host via syslog-ng
AY> don't forget, that some records will be lost in case of
AY> network failure (longer than kernel tcp retransmit timeout),
AY> and in case of log server reboot.

Maybe http://developers.facebook.com/scribe/ can help:

Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.
on 2009-04-15 17:26
On Mon, Apr 13, 2009 at 12:56:29PM +0930, Kingsley F. wrote:
> Hi guys,
>
> Just wondering what you guys are using for centralized logging of access
> logs for multiple servers. I'm thinking about using the syslog patch but I
> was wondering if anyone was using anything they think is better? And has
> piping of log files been implemented in the newer versions?

I personally do not like syslog and log piping. I prefer writing to the local file system and scp'ing the logs to the central host at midnight.

Some time ago I needed to deliver logs for hourly statistics. I made a script that rotates the log every hour, then gzips it (this takes several seconds, as the whole ~1G hourly log file is in the VFS cache), and sends the log to the host. A daily log is built with

zcat hourly-logs.gz | 7z a -si daily-log.7z

and then it is copied to the central host.
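A minimal Python sketch of the hourly rotation step, assuming nginx writes to one access log and records its master PID in a pid file (both paths are placeholders). It relies on nginx reopening its log files when the master process receives `USR1`; compressing follows, and the copy to the central host (e.g. with `scp`) is left out. `rotate_hourly` and its parameters are hypothetical names, not Igor's actual script.

```python
import gzip
import os
import shutil
import signal
import time
from datetime import datetime


def rotate_hourly(log_path: str, pid_file: str, now: datetime,
                  settle_seconds: float = 1.0) -> str:
    """Rename the live log, ask nginx to reopen it, gzip the old file; return the .gz path."""
    rotated = f"{log_path}.{now:%Y%m%d%H}"
    os.rename(log_path, rotated)  # nginx keeps writing to the renamed file...
    with open(pid_file) as fh:
        master_pid = int(fh.read().strip())
    os.kill(master_pid, signal.SIGUSR1)  # ...until USR1 makes it reopen log_path
    time.sleep(settle_seconds)  # give workers time to switch to the new file
    gz_path = rotated + ".gz"
    with open(rotated, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(rotated)  # keep only the compressed copy
    return gz_path
```

Run from cron at the top of each hour, this leaves one `access.log.YYYYMMDDHH.gz` per hour ready to be shipped.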
on 2009-04-16 10:42
AWStats comes with a small Perl script for merging multiple log files, which I have used in the past: /usr/bin/logresolvemerge.pl

Cheers
Dave
on 2009-04-16 10:45
mergelog can do that. 100 solutions to a problem, none of them great :(

Kingsley

--------------------------------------------------
From: "Glen L." <email@example.com>
Sent: Wednesday, April 15, 2009 10:53 PM
To: <firstname.lastname@example.org>
Subject: Re: Centralized logging for multiple servers
on 2009-04-16 11:16
What about cat *.log | sort -k 4
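`sort -k 4` compares the `[13/Apr/2009:12:56:29` field as text; because that field is day-first with a month name, the result is only chronological within a single month and time zone. A minimal Python sketch of a streaming merge instead, assuming nginx's default `combined` format and that each per-server file is already in time order. This is the same idea as the `mergelog` and `logresolvemerge.pl` tools mentioned in this thread, not their code; `log_time` and `merge_logs` are illustrative names.

```python
import heapq
import re
from datetime import datetime
from typing import Iterable, Iterator

# $time_local as written by nginx's "combined" format, e.g. [13/Apr/2009:12:56:29 +0930]
_TIME_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def log_time(line: str) -> datetime:
    """Parse the request timestamp of one access-log line into an aware datetime."""
    m = _TIME_RE.search(line)
    if m is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    # %b expects English month abbreviations, i.e. the default C/POSIX locale
    return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(sources: Iterable[Iterable[str]]) -> Iterator[str]:
    """Lazily merge per-server logs, each already time-ordered, into one time-ordered stream.

    Aware datetimes compare by absolute time, so servers in different time zones interleave correctly.
    """
    return heapq.merge(*sources, key=log_time)
```

Because `heapq.merge` holds only one pending line per input, open file objects can be passed straight in without loading whole logs into memory.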
on 2009-04-16 12:51
>>> I prefer to writing to local file system and to scp logs at midnight
>>> to central host.

I'm doing the same thing, but I'm in the process of switching from syslog to syslog-ng, because it can use TCP with guaranteed delivery. It can also log to multiple destinations at once (local and several remote hosts, for redundancy). Some customers have requested near-realtime stats on their usage, and this is the only way to do it (the remote hosts are expendable and don't do anything with the data besides showing running stats).

Cheers
Kon
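Kon's setup lives in syslog-ng itself. For an application you control, a rough Python analogue of "log to a local file and to several remote collectors over TCP" can be built with the standard library's `logging.handlers.SysLogHandler`; the collector addresses are placeholders and `build_fanout_logger` is a hypothetical helper. Note that this sketch does not add delivery guarantees of its own: if a send fails, the handler reports the error via `handleError()` and the record is not retried.

```python
import logging
import logging.handlers
import socket
from typing import Iterable, Tuple


def build_fanout_logger(name: str, local_path: str,
                        collectors: Iterable[Tuple[str, int]]) -> logging.Logger:
    """Return a logger that writes each record locally and to every remote syslog collector."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also pass records up to the root logger
    # Local copy; each handler emits independently, so a network error doesn't stop it.
    logger.addHandler(logging.FileHandler(local_path))
    for host, port in collectors:
        # SOCK_STREAM selects TCP; the connection is attempted here and may raise OSError.
        logger.addHandler(logging.handlers.SysLogHandler(
            address=(host, port), socktype=socket.SOCK_STREAM))
    return logger
```

Calling it twice with the same `name` would attach a second set of handlers, so build each logger once at startup.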
on 2009-04-16 12:52
On Wed, Apr 15, 2009 at 7:06 AM, Dave C. <email@example.com> wrote:
> What about
>
> cat *.log | sort -k 4

Or just

cat *whatever.log > today.log

I assume the processing script can handle out-of-order requests, but I guess that might be an arrogant assumption. :)

I do basically the same thing Igor does, but would love to simplify it by just keeping per-Host-header counts of bytes (sent, received, and total bytes used, basically) and of HTTP requests. Logging just enough of that to a file and parsing it each night seems kind of amateur...
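One way to keep the nightly parsing small is to log only the fields that are needed. A minimal Python sketch of the per-Host tally, assuming a hypothetical nginx `log_format` that writes just `$host $request_length $bytes_sent` on each line (roughly bytes received and bytes sent); `host_totals` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, Iterable


def host_totals(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Sum requests and bytes per Host from lines of the form '$host $request_length $bytes_sent'."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"requests": 0, "received": 0, "sent": 0})
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue  # skip malformed lines rather than abort the nightly run
        t = totals[parts[0].lower()]  # Host names are case-insensitive
        t["requests"] += 1
        t["received"] += int(parts[1])
        t["sent"] += int(parts[2])
    return {host: dict(t, total=t["received"] + t["sent"]) for host, t in totals.items()}
```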
on 2009-04-17 06:49
It's commercial, but Splunk is amazing at this. I think you can process a few hundred MB/day on the free version: http://splunk.com/

You set up a lightweight forwarder on every node you are interested in, and it slurps the files up and relays them to a central Splunk installation. It will queue internally if the master goes away. There is tons of support for sending different files in different directions, etc. We have it set up in the default Puppet payload, so every log on every server is always centralized and searchable.
on 2009-04-17 07:54
Does this scale well? I'm running a web-based proxy that generates an absolute ton of log files: easily 40 GB per week per server, with around 20 servers. I'm looking to be able to store and search up to 7 days of logs. Currently, I only move logs from the individual servers onto a central server when I get a complaint, import them into MySQL, and search them. The entire process, even for just one server, takes forever.
on 2009-04-17 08:01
I'm by no means a Splunk expert (you should ask them), but I think it scales pretty well. You can use multiple masters to receive and load-balance logs, and you can distribute the searching map/reduce style to leverage more cores. Search speed seems to be much more CPU-bound than I/O-bound; the logs are pretty efficiently packed. *Works for me* with ~15-20 EC2 instances and one central logging server. It also keeps logs in tiered buckets, so things from 30 days ago are available but slower to search, whereas yesterday's logs are 'hotter'.
on 2009-04-17 08:16
If you're just looking for some sort of distribution, you could put stuff into memcached or MySQL Cluster, or look at the Spread Toolkit (spread.org, I believe) and Gearman...
on 2009-04-17 14:41
I've used spread for centralized logging before, it's a horrible clusterF. If you want details as to why, let me know.
on 2009-04-17 19:50
On Fri, Apr 17, 2009 at 3:32 AM, Gabriel R. <firstname.lastname@example.org> wrote:
> I've used spread for centralized logging before, it's a horrible
> clusterF. If you want details as to why, let me know.

No thanks. I have no experience with it and I don't really need something like it right now (hopefully I'd use Gearman, as I talk with the main guy behind it a lot now). It's not 1:1 concept-wise, but you can probably get Gearman to do what you'd want Spread to do. I'll take your word for it, since I've never bothered; it just looked cool :)
on 2009-04-17 21:42
It sounds cool from a theoretical perspective. I read a book, Scalable Internet Architectures, which champions Spread for all kinds of uses. Getting the thing working, however, is not worth the effort. And trying to pump a lot of raw log files through it in real time? Yeah, don't bother giving yourself that headache.
on 2009-04-18 04:16
I was able to use the Wayback Machine to find the most recent pricing for Splunk. It seems that a 1 GB/day license costs $10k, and 10 GB/day of log volume is going to set you back $30k. Above that, you have to ask them for pricing. That's really not going to work, seeing as I'm doing more like 100 GB/day. Their current website doesn't have any prices at all and just asks that you contact their sales department. I do like what their website says about how their product works; it seems to be pretty much what I need. I might be willing to pay $10k for a license that does 200 GB/day, but $30k for one that does 10 GB? That's not going to work for me.
on 2009-04-18 05:11
On Fri, Apr 17, 2009 at 5:07 PM, Gabriel R. <email@example.com> wrote:
> I was able to use the wayback machine to find the most recent pricing
> for splunk. It seems that 1gb / day license costs $10k and 10gb / day
> of log volume is going to set you back $30k. Above that and you have
> to ask them for pricing. That's really not going to work seeing as how
> I'm doing more like 100gb / day. Their current website doesn't have
> any prices at all and just asks that you contact their sales
> department.

The trick is to reduce your log volume. I use a number of parsers that filter and summarize logs before pushing them to the central NMS server and placing them into the Splunk queue. It works perfectly, and if there is a need to analyze a specific problem, we can always go back to the machine with the source logs for further investigation.

Cheers
Kon
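Kon doesn't describe his parsers, so the following is only a guess at the general shape: a Python sketch that counts every response by status code but keeps the full line only for 5xx responses, assuming nginx's `combined` format. `reduce_access_log` and the 500 threshold are hypothetical choices.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def reduce_access_log(lines: Iterable[str], keep_from: int = 500) -> Tuple[List[str], Counter]:
    """Return (lines worth forwarding in full, per-status counts for every line)."""
    kept: List[str] = []
    counts: Counter = Counter()
    for line in lines:
        # combined format: ... "REQUEST" STATUS BYTES "REFERER" "USER_AGENT"
        parts = line.split('"')
        fields = parts[2].split() if len(parts) > 2 else []
        if not fields or not fields[0].isdigit():
            counts["unparsed"] += 1
            continue
        status = int(fields[0])
        counts[status] += 1
        if status >= keep_from:
            kept.append(line)
    return kept, counts
```

Only `kept` plus the small `counts` summary would be shipped to the central queue; the raw file stays on the source machine for later investigation.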
on 2009-04-18 13:34
If I have to do a lot of processing to reduce my log volume, and then go back to the raw logs in case I actually needed the data, is there really a lot of benefit to using splunk in the first place?
on 2009-04-18 15:04
Hi.

The first thing you have to deal with is multicast. Spread is designed so that you can multicast your data to multiple servers without having to specify several destinations and send the traffic to each one separately. However, if your switches/routers don't support multicast, it will automatically fall back to broadcasting, even if you only have one server as the destination for the data. My web hosting provider quickly cut off the server that was broadcasting so much data to everyone on the subnet. Broadcasts, since they go to everyone, can slow down all the servers on your network, as they all have to decide what to do with the packets they're receiving. You can override this behavior and use unicast if you only have one destination server, but not if you have more than one.

Second is the problem of getting it up and running in the first place: setting up your various variables, compiling, and installing, which is not particularly simple.

Then you have to decide how your data is going to get into Spread. You're going to need a program that sends the data into Spread. Maybe you'll use a Perl script; that seems to be popular. One way to do that is to pipe your log file output to a Perl script. But what happens when the Perl script unexpectedly dies? The application that is piping out to the script fails. You'll have to notice that this has occurred, kill your hosting software (nginx, Apache, Squid, whatever) as well as the Perl script, and restart the Perl script and then your hosting software, in that order. Your Perl script can die for all sorts of reasons, not least that it lost contact with the Spread server for too long, or that its queue of messages to send across the pipe got too long. Yeah, when you've got a backlog of 10,000 requests to send across, losing them all because your logging program crashed is definitely what you want.
OK, so another method is to have your hosting platform log to a file as normal, but have your Perl script attach to the file with something like tail -f | whatever.pl. That can work, but it suffers from the same problem of dying unexpectedly; the only difference is that when the Perl script dies, logging to the file continues and the hosting platform doesn't unexpectedly die.

The other issue is performance. The receiving Spread server has to be able to process all these incoming "messages", as they're called, from all your servers, do something with them, and be available to receive more messages without crashing. Again, this is probably a Perl script. And, again, I had no real trouble creating a huge load on the receiving Spread server when it was receiving real-time log data from just one server. I have 20. If I was lucky, I could have gotten it handling 2 or 3 servers' worth of real-time log data. 20 was not going to happen.

The whole thing with Spread is that, in theory, it is designed to be robust, but in my experience it is far from it; the whole operation seemed quite fragile. The amount of effort I was going to have to put into writing programs to make sure that Spread was working properly, and to work around potential failure conditions in an elegant way, looked obscene. I'm sure Spread has a number of good uses, but I could not recommend it for centralized logging. It does look interesting for a program called Wackamole, which is designed to help you set up your servers for high availability, but that requires a lot fewer messages flying around than log files would.

SCP'ing or rsyncing a file from your source server to your centralized logging server is a lot more robust. Those transfer programs have a number of protections to make sure the file got there intact, can do compression, and so on.
And you don't have to transfer your log files line by line with scp or rsync; you can dump huge amounts of data across and then have your central log server process them at its leisure. If it can't keep up with peak demand, it can catch up when the site isn't as busy, so you don't have an end-of-the-world scenario if the centralized log server can't keep up with the generation of real-time logs.
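A minimal Python sketch of the "process at its leisure" side on the central server, assuming finished log files arrive in one incoming directory and are moved to a done directory once processed, so a crash or restart simply resumes with whatever is left. It skips dot-files because rsync, by default, writes each file under a temporary dot-prefixed name and renames it when the transfer completes; with scp you would need a similar upload-then-rename convention. The directory layout and `ingest_pending` are hypothetical.

```python
import os
from typing import Callable


def ingest_pending(incoming: str, done: str, process: Callable[[str], None]) -> int:
    """Process every finished file in `incoming` (in name order) and move it to `done`.

    A file is moved only after `process` returns, so a crash leaves it to be retried.
    Returns the number of files processed in this pass.
    """
    handled = 0
    for name in sorted(os.listdir(incoming)):
        src = os.path.join(incoming, name)
        if name.startswith(".") or not os.path.isfile(src):
            continue  # rsync temp file still in flight, or a subdirectory
        process(src)
        os.replace(src, os.path.join(done, name))  # atomic when on the same filesystem
        handled += 1
    return handled
```

Run from cron, each pass drains whatever backlog has built up, which is exactly the catch-up behaviour described above.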
on 2009-04-18 20:23
On Sat, Apr 18, 2009 at 2:22 AM, Gabriel R. <firstname.lastname@example.org> wrote:
> If I have to do a lot of processing to reduce my log volume, and then
> go back to the raw logs in case I actually needed the data, is there
> really a lot of benefit to using splunk in the first place?

It depends on who your Splunk users are and how important the extraneous data is. If they are tech support staff, then it is still invaluable for giving them a mid/high-level overview of any outages or problems with customer accounts (since they may not be able to fix the underlying problem anyway). If you have hundreds of accounts and servers offering multiple services, it is a big help. And many times there is no need to log all the data on the system: e.g. with a lot of rsync jobs you really don't need the rsync logging output, only whether the job was successful. Similarly, with sync jobs that run every 5 minutes, I don't log success, only failure. The list goes on...

Cheers
Kon
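Kon's "log failures only" rule for periodic sync jobs can be sketched in Python as a wrapper that runs the job, discards its output on success, and logs a single error record on failure. `run_quietly` is a hypothetical name, and the command is whatever your cron job would otherwise run.

```python
import logging
import subprocess
from typing import Sequence


def run_quietly(cmd: Sequence[str], logger: logging.Logger) -> bool:
    """Run `cmd`; log nothing on success, one error record with its stderr on failure."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode == 0:
        return True  # success: the job's output is deliberately dropped
    logger.error("job %s failed (exit %d): %s",
                 " ".join(cmd), result.returncode, result.stderr.strip())
    return False
```

Pointing `logger` at the handler that feeds the central system means a job running every 5 minutes contributes nothing to the log volume until it actually breaks.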
on 2009-04-18 23:15
That makes sense. I could definitely use splunk for something like that. Thanks for the ideas.