On Thu, May 14, 2009 at 9:07 AM, Marvin Humphrey <mar...@rectangular.com> wrote:
> Richard Feynman:
>
> "...if you're doing an experiment, you should report everything that you
> think might make it invalid - not only what you think is right about it:
> other causes that could possibly explain your results; and things you
> thought of that you've eliminated by some other experiment, and how they
> worked - to make sure the other fellow can tell they have been eliminated."

Excellent quote!

> So, should Lucene use the non-compound file format by default because some
> idiot's sloppy benchmarks might run a smidge faster, even though that will
> cause many users to run out of file descriptors?

No, I don't think we should change that default. Nor (for example) can we
switch to SweetSpotSimilarity by default, even though it seems to improve
relevance, because it requires app-dependent configuration. Nor should we
set IndexWriter's RAM buffer to 1 GB. Etc.

But when there is a choice that has near zero downside and improves
performance (like my example), we should make the switch. Making
IndexReader.open return a readOnly reader is another example (... which we
plan to do in 3.0).

Every time Lucene or Solr has a default built-in setting, we should think
carefully about how to set it.

> Anyone doing comparative benchmarking who doesn't submit their code to the
> support list for the software under review is either a dolt or a propagandist.
>
> Good benchmarking is extremely difficult, like all experimental science. If
> there isn't ample evidence that the benchmarker appreciates that, their tests
> aren't worth a second thought. If you don't avail yourself of the help of
> experts when assembling your experiment, you are unserious.

Agreed.

Mike