Hi List,

I took some time and ran some tests on the performance of java-lucene, hyperestraier and ferret, as Dave encourages the ferret community to do. Quite interesting numbers. Ferret indeed deserves to be called a high-performance port!!

It's MyFirstBenchmark ( http://ferret.davebalmain.com/trac/wiki/MyFirstBenchmark ), so please don't be too cruel in criticizing the method. It's just a hack and it's flawed - as every other benchmark is. But it provides some numbers and, regardless of how flawed it is, one thing remains true: All of these search engines are fast enough for most of us...

Regards
Jan
on 2006-05-13 01:38
on 2006-05-16 03:41
On May 12, 2006, at 2:38 PM, Jan P. wrote:

> Hi List,
>
> I took some time and ran some tests on the performance of
> java-lucene, hyperestraier and ferret, as Dave encourages the
> ferret community to do.

Hello, Jan...

On the benchmarking page you make this request:

"If you are an expert in one of these search-engines than provide some information about the best optimizations."

As the author of another Lucene port (KinoSearch, Perl/C), I know a fair amount about Lucene. Better, I put together some benchmarks comparing Lucene, KinoSearch and Plucene a little while ago <http://www.rectangular.com/kinosearch/benchmarks.html>, and I solicited the help of the Lucene developers list to tune the Lucene benchmarking app. By the end it performed around twice as well as my initial version.

In order to max out Lucene's indexing speed...

* Don't use the compound file format:

    indexWriter.setUseCompoundFile(false);

* Set maxBufferedDocs to at least 100, and if you have the RAM, 1000:

    indexWriter.setMaxBufferedDocs(1000);

* Give the JVM a generous heap and run it under -server:

    java -Xmx500M -server MyIndexer

* Make sure that JVM startup time is not factored into the results unless you intend it to be.

All this is in addition to good stuff like warming up OS caches with dry runs prior to test runs, ensuring that the machine is otherwise idle, making sure that the analyzers are exactly equivalent (the fact that the search results differ is a red flag -- I'd use WhitespaceAnalyzer instead of whatever you're using), and other such steps to isolate the variables you intend to measure. Then, perform multiple iterations.

> It's MyFirstBenchmark (
> http://ferret.davebalmain.com/trac/wiki/MyFirstBenchmark ) so please
> don't be too cruel in criticizing the method.

It's very difficult to run a good scientific experiment of any kind.
In fact my current results are flawed -- I left out a call to optimize() in the Lucene benchmark, so Lucene performs not quite so well as the numbers on my page would indicate. But I'd rather err on that side than on giving the engine I'm attached to a leg up.

> one thing remains true: All of these search
> engines are fast enough for most of us...

Yes. Things are different than they were just a couple years ago.

Marvin H.
Rectangular Research
http://www.rectangular.com/
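The measurement advice in the reply above (untimed dry runs first, timing inside the process so JVM startup is excluded, then several iterations) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the harness used for any of the benchmarks mentioned in this thread: the class name BenchmarkHarness, its methods, and the stand-in workload are all hypothetical, and a real test would replace the workload with the actual indexing or search run.

```java
import java.util.Arrays;

// Hypothetical timing harness; none of these names come from Lucene, Ferret or KinoSearch.
public class BenchmarkHarness {

    // Runs the task `warmups` times without timing it (dry runs that warm up
    // caches and the JIT), then returns one wall-clock time in nanoseconds per
    // timed iteration. Timing happens inside the running JVM, so JVM startup
    // is not part of any measurement.
    static long[] timeIterations(Runnable task, int warmups, int iterations) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] elapsed = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    // Median of the timings; less sensitive to a single slow outlier than the mean.
    static long median(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no timings to summarize");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) {
        // Assumption: a stand-in workload. Replace with the real indexing run.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i % 10);
            }
            if (sb.length() != 100_000) {
                throw new IllegalStateException("unexpected workload result");
            }
        };
        long[] times = timeIterations(workload, 3, 10);
        System.out.printf("median %.3f ms over %d timed runs%n",
                median(times) / 1e6, times.length);
    }
}
```

Reporting the median (or the full list of timings) rather than a single run makes it easier to see whether the machine was really idle during the test.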
on 2006-05-16 11:04
Hi, Marvin,

thank you very much. I will take this advice into account when I run further tests. As a first step I'll add a link to your post to the ferret wiki to let people know...

Regards
Jan P.