On 19-Aug-08, at 10:18 AM, Phillip Farber wrote:

> I'm trying to understand how splitting a monolithic index into shards improves query response time. Please tell me if I'm on the right track here. Where does the increase in performance come from? Is it that in-memory arrays are smaller when the index is partitioned into shards? Or is it due to the likelihood that the Solr process behind each shard is running on its own CPU on a multi-CPU box?

Usually, the performance gain comes from putting shards on separate machines. However, I have had success partitioning an index on a single machine so that a single query can be executed by multiple CPUs. It also helps to have each index on a different hard disk.
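
For illustration, here is a minimal sketch of how a distributed query is issued in 1.3: the client sends an ordinary query to one Solr instance and lists the shard URLs in the "shards" parameter; that instance coordinates the fan-out and merge. This uses the SolrJ client (CommonsHttpSolrServer is the 1.3-era class); the hostnames, ports, and query are made up:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ShardQueryExample {
        public static void main(String[] args) throws Exception {
            // Any instance can coordinate; it fans the query out to the
            // shards listed below and merges their responses.
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrQuery query = new SolrQuery("title:shakespeare");
            // Hypothetical ports: two shards running on the same box.
            query.set("shards", "localhost:8983/solr,localhost:7574/solr");
            query.setRows(10);

            QueryResponse rsp = server.query(query);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }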

> And it must be the case that the overhead of merging results from several shards is still less than the expense of searching a monolithic index. True?

Merging overhead is relatively insignificant. Fetching stored fields from more docs than necessary is an expense of sharding, however.
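
To see why the merge is cheap: with rows=10 and four shards, the coordinator merges at most 40 (id, score) pairs, no matter how many millions of documents each shard had to score. A rough sketch of that step (not Solr's actual code, just the shape of the work):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class ShardMerge {
        static class Hit {
            final String id;
            final float score;
            Hit(String id, float score) { this.id = id; this.score = score; }
        }

        // Merge per-shard top-n lists into a global top-n. With s shards
        // this sorts only s*n entries -- trivial next to the per-shard
        // work of scoring an entire index partition.
        static List<Hit> merge(List<List<Hit>> perShardTopN, int n) {
            List<Hit> all = new ArrayList<>();
            for (List<Hit> shard : perShardTopN) {
                all.addAll(shard);
            }
            all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
        }
    }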

> Given roughly 10 million documents in several languages, yielding perhaps 200K unique terms and averaging about 1 MB/doc, how many shards would you recommend, and how much RAM?

I'd never recommend more shards on a single machine than there are CPUs. For an index of that size, you will need at least 8GB of RAM; 16GB would be better.
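
Note that 1.3 leaves document distribution entirely to the indexer. A common convention (an assumption here, not anything Solr enforces) is to route each document to a shard by hashing its unique key, which keeps the partitions roughly equal in size:

    public class ShardRouter {
        // Hypothetical helper, not part of Solr 1.3: pick a shard for a
        // document at index time by hashing its unique key. Math.floorMod
        // keeps the result in [0, numShards) even for negative hash codes.
        static int shardFor(String uniqueKey, int numShards) {
            return Math.floorMod(uniqueKey.hashCode(), numShards);
        }
    }

With 8 CPUs and 10 million documents, numShards = 8 would put roughly 1.25 million documents in each shard.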

> Is it correct that Distributed Search (shards) is in 1.3, or does 1.2 support it?

Distributed search is in 1.3 only; 1.2 does not support it.

> If 1.3, is the nightly build the best one to grab, bearing in mind that we would want any protocols around distributed search to be as stable as possible? Or should we just wait for the 1.3 release?

Go for the nightly build.  The release will look very similar to it.

-Mike
