I’ve been exploring using Ferret for indexing large volumes of
log files. Right now we have a homemade system for searching through
logs that involves specifying a date/time range and then grepping
the relevant files. This can take a long time.
My initial tests (on 2 GB of log files) have been promising. I’ve taken
two approaches so far. The first is loading each line of each log file as
a “document”. The upside to this is that doing a search gets you
individual log lines as results, which is exactly what I want. The
downside is that indexing takes a long time and the index is very large,
even when not storing the contents of the lines. This approach is not
viable for indexing all of our logs.
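To illustrate why the per-line approach blows up, here is a toy in-memory inverted index in plain Ruby (a sketch of the concept, not Ferret’s actual API): every log line becomes its own document, so the document table and postings lists grow with the line count rather than the file count.

```ruby
# Toy inverted index with one "document" per log line.
# Not Ferret's API -- just a sketch of why per-line indexing
# multiplies the number of documents (and postings) so quickly.
class LineIndex
  def initialize
    @postings = Hash.new { |h, k| h[k] = [] }  # term => [doc ids]
    @docs = []                                 # doc id => [file, line_no, text]
  end

  def add_file(name, lines)
    lines.each_with_index do |line, i|
      doc_id = @docs.size
      @docs << [name, i + 1, line]
      line.downcase.scan(/\w+/).uniq.each { |term| @postings[term] << doc_id }
    end
  end

  # Returns [file, line_no, text] triples whose line contains the term.
  def search(term)
    @postings[term.downcase].map { |id| @docs[id] }
  end
end
```

For example, indexing a two-line file yields two documents, and a search hands back the individual line directly:

```ruby
idx = LineIndex.new
idx.add_file("app.log", ["ERROR timeout on db", "INFO ok"])
idx.search("timeout")  # => [["app.log", 1, "ERROR timeout on db"]]
```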
The second approach is indexing whole log files as documents. This is
relatively fast (211 sec for 2 GB of logs), and the index size is a nice
fraction of the sample size. The downside is that after figuring out
which files match your search terms, you have to crawl through each “hit”
document to find the matching lines.
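That crawl step can be a straightforward second pass: once the index has narrowed things down to a handful of files, re-scan just those files for the lines that match. A minimal sketch (the file list and pattern are whatever your search produced):

```ruby
# After the index identifies which files match, grep only those
# files for the actual lines. This second pass runs over a few
# ~15 MB files instead of the whole log set.
def lines_matching(hit_files, pattern)
  hits = []
  hit_files.each do |path|
    File.foreach(path).with_index(1) do |line, lineno|
      hits << [path, lineno, line.chomp] if line =~ pattern
    end
  end
  hits
end
```

`File.foreach` streams the file line by line, so even large rotated logs never have to fit in memory.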
For the sake of full disclosure: at any given time we keep roughly 30
days of logs, which comes to about 800 GB of log files. Each file is
about 15 MB in size before it gets rotated.
Has anyone else tackled a problem like this who can offer any ideas on
how to go about searching those logs? The best idea I can come up with
(though I haven’t implemented it yet to get real numbers) is to index the
most recent log files by line, say the last 2 days, and then index another
set by file (say the last week). This would give fast results for the
most recent logs, and you would just have to be patient with the slightly
older ones.
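That two-tier idea could be wired up as a simple router that picks an index based on a file’s age. Everything here (the tier cutoffs, the tier names) is hypothetical, just to show the shape of it:

```ruby
require "date"

# Route a search by log-file age: files from the last couple of days
# go to the fine-grained line index, slightly older ones to the coarse
# per-file index, and everything else falls back to the existing grep.
LINE_TIER_DAYS = 2   # recent logs, indexed line-by-line (fast, exact hits)
FILE_TIER_DAYS = 7   # older logs, indexed per file (slower second pass)

def tier_for(file_date, today = Date.today)
  age = (today - file_date).to_i
  if age <= LINE_TIER_DAYS
    :line_index   # search returns individual lines directly
  elsif age <= FILE_TIER_DAYS
    :file_index   # search returns files; grep them afterwards
  else
    :unindexed    # fall back to the date-range grep for old logs
  end
end
```

A query would then fan out to each tier it overlaps, with the line tier answering first and the file tier (plus the grep pass) trailing behind.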