I'm trying to understand how splitting a monolithic index into shards improves query response time. Please tell me if I'm on the right track here. Where does the increase in performance come from? Is it that in-memory arrays are smaller when the index is partitioned into shards? Or is it due to the likelihood that the Solr process behind each shard is running on its own CPU on a multi-CPU box?
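To make my first guess concrete, here's a toy sketch (not Solr's internal logic; the shard count and hashing scheme are made up) of hash-based partitioning, which is what would make each shard's in-memory structures proportionally smaller:

```python
# Toy sketch of hash-based document partitioning: each shard indexes
# only a fraction of the corpus, so its term dictionary and posting
# lists are proportionally smaller. Illustrative only.
NUM_SHARDS = 4

def shard_for(doc_id: str) -> int:
    """Route a document to a shard by hashing its unique id."""
    return hash(doc_id) % NUM_SHARDS

shards = {i: [] for i in range(NUM_SHARDS)}
for n in range(10_000):
    shards[shard_for(f"doc-{n}")].append(f"doc-{n}")

# Each shard ends up holding roughly 1/NUM_SHARDS of the documents.
for i in range(NUM_SHARDS):
    print(i, len(shards[i]))
```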

And it must be the case that the overhead of merging results from several shards is still less than the expense of searching a monolithic index. True?
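My mental model of that merge step, as a toy sketch (the scores, ids, and shard count are invented; this is not Solr's merge code): each shard returns its local top-k already sorted, so combining s sorted lists into a global top-k costs roughly O(k log s), which is tiny next to scanning posting lists over the whole corpus.

```python
import heapq

# Each shard returns its local top hits, already sorted by
# descending score: (score, doc_id) pairs. Values are made up.
shard_results = [
    [(0.92, "a1"), (0.75, "a2"), (0.40, "a3")],
    [(0.88, "b1"), (0.61, "b2"), (0.33, "b3")],
    [(0.95, "c1"), (0.50, "c2"), (0.10, "c3")],
]

k = 5
# heapq.merge lazily merges the sorted per-shard lists; negating the
# score makes the ascending merge produce descending-score order.
merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
top_k = [doc_id for _, doc_id in list(merged)[:k]]
print(top_k)  # → ['c1', 'a1', 'b1', 'a2', 'b2']
```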

Given roughly 10 million documents in several languages, inducing perhaps 200K unique terms and averaging about 1 MB/doc, how many shards would you recommend, and how much RAM?
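For scale, a back-of-envelope calculation from the numbers above (the index-to-raw-text ratio and the shard count here are pure guesses on my part, just to show the arithmetic):

```python
# Back-of-envelope sizing from the figures in the question.
docs = 10_000_000
avg_doc_mb = 1.0
raw_tb = docs * avg_doc_mb / 1_000_000   # MB -> TB (decimal)

index_ratio = 0.3    # ASSUMED index size relative to raw text
index_tb = raw_tb * index_ratio

shard_count = 20     # purely illustrative
per_shard_gb = index_tb * 1000 / shard_count
print(f"raw ~{raw_tb:.0f} TB, index ~{index_tb:.1f} TB, "
      f"~{per_shard_gb:.0f} GB of index per shard")
```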

Is it correct that Distributed Search (shards) is new in 1.3, or does 1.2 support it as well?

If 1.3, is the nightly build the best one to grab, bearing in mind that we would want any protocols around distributed search to be as stable as possible? Or should we just wait for the 1.3 release?



Thanks very much,

Phil

------------------------------------------
Phillip Farber - http://www.umdl.umich.edu




