On 12/3/2011 2:25 PM, Ted Dunning wrote:
Things have changed since I last did this sort of thing seriously. My guess is that this is a relatively small amount of memory to devote to search. It used to be that the only way to do this effectively with Lucene-based systems was to keep the heap relatively small, like you have here, and put the index into a tmpfs mount. I think better ways are now available which keep the index in memory in the search engine itself for better speed. One customer we have now has search engines with 128GB of memory. He fills much of that with live index, sharded about 10-fold. In-memory indexes can run enough faster to be more cost effective than disk-based indexes, because you need so many fewer machines to run the searches within the required response time.

My servers (two for each chain, a total of four) are at their maximum memory size of 64GB. They have two quad-core Xeon processors (E54xx series) that are not hyperthreaded. With 8GB given to Solr, approximately 55GB remains for the disk cache, which is smaller than the combined size of the three large indexes (20GB each) on each server, and the indexes are constantly getting bigger. I don't think in-memory indexes are an option for me. I do not expect any budget for additional servers for quite some time, either.
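To make the sizing concrete, here is a quick back-of-the-envelope check using the numbers above. The ~1GB set aside for the OS and other processes is an assumption; everything else comes from the message.

```python
# Rough per-server memory budget (numbers from the message above).
total_ram_gb = 64
solr_heap_gb = 8
os_overhead_gb = 1  # assumed allowance for the OS and other processes

disk_cache_gb = total_ram_gb - solr_heap_gb - os_overhead_gb
print(disk_cache_gb)  # 55

# Index data the cache would ideally hold, per server.
index_count = 3
index_size_gb = 20
total_index_gb = index_count * index_size_gb
print(total_index_gb)  # 60

# The cache cannot hold the full index set, so some reads hit disk.
print(disk_cache_gb >= total_index_gb)  # False
```

The gap only grows as the indexes do, which is why fully in-memory operation is out of reach on this hardware.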

I have 16 processor cores available for each index chain (two servers). If I set aside one for the distributed search itself and one for the incremental index (that small 3.5- to 7-day shard), it sounds like my ideal numShards from Solr's perspective is 14. I have some fear that my database server will fall over under the load of 14 concurrent DB connections during a full index rebuild, though. Do you have any other thoughts for me?
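The shard arithmetic above can be sketched as follows; the two-core reservation comes straight from the message, and the assumption that a full rebuild opens one DB connection per shard is what drives the database-load concern.

```python
# Back-of-the-envelope shard count per index chain (two servers).
cores_per_chain = 16
reserved_cores = 2  # one for the distributed-search request, one for the incremental shard

num_shards = cores_per_chain - reserved_cores
print(num_shards)  # 14

# Assumption: a full rebuild imports every shard in parallel,
# so the database sees one connection per shard.
parallel_db_connections = num_shards
print(parallel_db_connections)  # 14
```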

Thanks,
Shawn