On 12/3/2011 2:25 PM, Ted Dunning wrote:
Things have changed since I last did this sort of thing seriously. My guess is that this is a relatively small amount of memory to devote to search. It used to be that the only way to do this effectively with Lucene-based systems was to keep the heap relatively small, like you have here, and put the index into a tmpfs mount. I think better ways are now available which keep the index in memory in the search engine itself for better speed. One customer we have now has search engines with 128GB of memory. He fills much of that with live index, sharded about 10-fold. In-memory indexes can run enough faster to be more cost effective than disk-based indexes, because you need so many fewer machines to run the searches within the required response time.

My servers (two for each chain, a total of four) are at their maximum memory size of 64GB. They have two quad-core Xeon processors (E54xx series) that are not hyperthreaded. With 8GB given to Solr, approximately 55GB remains for the disk cache, which is smaller than the combined size of the three large indexes (20GB each) on each server, and the indexes are constantly getting bigger. I don't think in-memory indexes are an option for me. I do not expect any budget for additional servers for quite some time, either.
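To make the sizing concrete, here is a quick back-of-the-envelope check using the numbers above. The ~1GB set aside for the OS and other processes is an assumption; everything else comes from the message.

```python
# Rough per-server memory budget (numbers from the message above).
total_ram_gb = 64
solr_heap_gb = 8
os_overhead_gb = 1  # assumed allowance for the OS and other processes

disk_cache_gb = total_ram_gb - solr_heap_gb - os_overhead_gb
print(disk_cache_gb)  # 55

# Index data the cache would ideally hold, per server.
index_count = 3
index_size_gb = 20
total_index_gb = index_count * index_size_gb
print(total_index_gb)  # 60

# The cache cannot hold the full index set, so some reads hit disk.
print(disk_cache_gb >= total_index_gb)  # False
```

The gap only grows as the indexes do, which is why fully in-memory operation is out of reach on this hardware.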

I have 16 processor cores available for each index chain (two servers). If I set aside one for the distributed search itself and one for the incremental index (that small 3.5- to 7-day shard), it sounds like my ideal numShards from Solr's perspective is 14. I have some fear that my database server will fall over under the load of 14 concurrent DB connections during a full index rebuild, though. Do you have any other thoughts for me?
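The shard arithmetic above can be sketched as follows; the two-core reservation comes straight from the message, and the assumption that a full rebuild opens one DB connection per shard is what drives the database-load concern.

```python
# Back-of-the-envelope shard count per index chain (two servers).
cores_per_chain = 16
reserved_cores = 2  # one for the distributed-search request, one for the incremental shard

num_shards = cores_per_chain - reserved_cores
print(num_shards)  # 14

# Assumption: a full rebuild imports every shard in parallel,
# so the database sees one connection per shard.
parallel_db_connections = num_shards
print(parallel_db_connections)  # 14
```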

Thanks,
Shawn