On Thu, Jan 8, 2009 at 10:03 PM, smock <harish.agar...@gmail.com> wrote: > I don't mean to be argumentative - just trying to understand, what is the > difference between distributed search across processors, and distributed > search across boxes (again, assuming that my searches are truly CPU bound)?
Even if your searches are CPU bound, there is CPU and IO overhead in distributed search. time_for_whole_index vs time_for_half_index + distributed_search_overhead Distributed search is optimized for the case when the index is so big that one *must* distribute it across multiple shards. It works in multiple phases, first only collecting and merging the document ids, and then requesting stored fields for the top documents in another phase. It's also optimized for total throughput of the whole system. If one was optimizing for response time with smaller documents and single requests, then merging results in a single shot would yield better results. If you load test a distributed vs non-distributed system on a single box, the distributed will normally lose. This is because to find the top 10 documents in general, one must retrieve the top 10 documents from each shard - more work is done. Single request latency *can* be shorter under the right circumstances, but under load it will always lose since more work is done in aggregate. -Yonik