Hi Yonik,
I see - I didn't realize that there was a second phase to retrieve stored
values.  Sphinx also queries the top n documents from each shard and combines
the results - unless the algorithm is very different, I wouldn't expect that
to add a lot of overhead, since Sphinx shows a very definite performance boost
when distributing.

It seems to me that all of the points you're making below apply just as well
to distributing across multiple boxes - where does doing the distribution on a
single box specifically come into play?  Anecdotally, everything you're saying
completely meshes with my load testing of Solr (the single full index is
performing better than the distributed index).  I may have to stick with
Sphinx, though, if I can't boost the performance of Solr on a single box.

-Harish



yonik wrote:
> 
> On Thu, Jan 8, 2009 at 10:03 PM, smock <harish.agar...@gmail.com> wrote:
>> I don't mean to be argumentative - just trying to understand: what is the
>> difference between distributed search across processors and distributed
>> search across boxes (again, assuming that my searches are truly CPU
>> bound)?
> 
> Even if your searches are CPU bound, there is CPU and IO overhead in
> distributed search.
> 
> time_for_whole_index
>   vs
> time_for_half_index + distributed_search_overhead
> 
> Distributed search is optimized for the case when the index is so big
> that one *must* distribute it across multiple shards.  It works in
> multiple phases, first only collecting and merging the document ids,
> and then requesting stored fields for the top documents in another
> phase.  It's also optimized for total throughput of the whole system.
> 
> If one was optimizing for response time with smaller documents and
> single requests, then merging results in a single shot would yield
> better results.
> 
> If you load test a distributed vs non-distributed system on a single
> box, the distributed will normally lose.  This is because to find the
> top 10 documents in general, one must retrieve the top 10 documents
> from each shard - more work is done.  Single request latency *can* be
> shorter under the right circumstances, but under load it will always
> lose since more work is done in aggregate.
> 
> -Yonik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-on-a-multiprocessor-machine-tp21360747p21365956.html
Sent from the Solr - User mailing list archive at Nabble.com.
