Re: Solr vs Sphinx

Michael McCandless Thu, 14 May 2009 11:01:52 -0700

On Thu, May 14, 2009 at 9:07 AM, Marvin Humphrey <mar...@rectangular.com> wrote:
> Richard Feynman:
>
>    "...if you're doing an experiment, you should report everything that you
>    think might make it invalid - not only what you think is right about it:
>    other causes that could possibly explain your results; and things you
>    thought of that you've eliminated by some other experiment, and how they
>    worked - to make sure the other fellow can tell they have been eliminated."


Excellent quote!

> So, should Lucene use the non-compound file format by default because some
> idiot's sloppy benchmarks might run a smidge faster, even though that will
> cause many users to run out of file descriptors?

No, I don't think we should change that default.

Nor (for example) can we switch to SweetSpotSimilarity by default,
even though it seems to improve relevance, because it requires
app-dependent configuration.

Nor should we set IndexWriter's RAM buffer to 1 GB.  Etc.

But when there is a choice that has near-zero downside and improves
performance (like my example), we should make the switch.

Making IndexReader.open return a readOnly reader is another example
(... which we plan to do in 3.0).

Every time Lucene or Solr has a built-in default setting, we should
think carefully about how to set it.

> Anyone doing comparative benchmarking who doesn't submit their code to the
> support list for the software under review is either a dolt or a propagandist.
>
> Good benchmarking is extremely difficult, like all experimental science.  If
> there isn't ample evidence that the benchmarker appreciates that, their tests
> aren't worth a second thought.  If you don't avail yourself of the help of
> experts when assembling your experiment, you are unserious.

Agreed.

Mike
