On 1/7/2011 2:57 AM, supersoft wrote:
I have deployed a 5-sharded infrastructure where:
shard1 has 3124422 docs
shard2 has 920414 docs
shard3 has 602772 docs
shard4 has 2083492 docs
shard5 has 11915639 docs
Indexes total size: 100GB

The OS is Linux x86_64 (Fedora release 8) with vMem equal to 7872420 and I
run the server using Jetty (from Solr example download) with:
java -Xmx3024M -Dsolr.solr.home=multicore -jar start.jar

The response time for a query is around 2-3 seconds. Nevertheless, if I
execute several queries at the same time the performance goes down
immediately:
1 simultaneous query: 2516 ms
2 simultaneous queries: 4250, 4469 ms
3 simultaneous queries: 5781, 6219, 6219 ms
4 simultaneous queries: 6484, 7203, 7719, 7781 ms...

I see from your other messages that these indexes all live on the same machine. You're almost certainly I/O bound, because the machine doesn't have enough memory for the OS to cache your index files. With 100GB of total index size, you'll get the best results with between 64GB and 128GB of total RAM. Alternatively, you could store the indexes on SSDs instead of spinning hard drives, or put each shard on its own physical machine with RAM sized appropriately for its index. For shard5 on its own machine, at 64GB index size, you might be able to get away with 32GB, but ideally you'd want 48-64GB.
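
To make the arithmetic concrete, here is a rough sketch in Python. It rests on two assumptions that are not confirmed anywhere in the thread: that the 100GB is spread across the shards in proportion to document count (which happens to land close to the 64GB figure for shard5), and that the vMem value is in kB. The function names are made up for illustration, and the "leftover" memory is only an upper bound, since the kernel and other processes need room too.

```python
# Document counts per shard, taken from the original post.
SHARD_DOCS = {
    "shard1": 3124422,
    "shard2": 920414,
    "shard3": 602772,
    "shard4": 2083492,
    "shard5": 11915639,
}
TOTAL_INDEX_GB = 100   # total on-disk index size from the original post
VMEM_KB = 7872420      # ASSUMPTION: the reported vMem value is in kB
JVM_HEAP_MB = 3024     # from the -Xmx3024M flag


def estimate_shard_sizes(shard_docs, total_gb):
    """Split the total index size across shards in proportion to doc count.

    ASSUMPTION: documents are about the same size in every shard.
    """
    total_docs = sum(shard_docs.values())
    return {name: total_gb * docs / total_docs
            for name, docs in shard_docs.items()}


def page_cache_upper_bound_gb(vmem_kb, heap_mb):
    """Memory left over for the OS disk cache once the JVM heap is taken out."""
    return vmem_kb / 1024 ** 2 - heap_mb / 1024


sizes = estimate_shard_sizes(SHARD_DOCS, TOTAL_INDEX_GB)
cache_gb = page_cache_upper_bound_gb(VMEM_KB, JVM_HEAP_MB)

for name, size_gb in sizes.items():
    print(f"{name}: ~{size_gb:.0f} GB of index")
print(f"At most ~{cache_gb:.1f} GB left for caching {TOTAL_INDEX_GB} GB of index")
```

Under those assumptions, only a few percent of the index can sit in the OS cache at any time, so most queries end up going to disk.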

Can you do anything to reduce the index size? Perhaps you are storing fields that you don't need to be returned in the search results. Ideally, you should only include enough information to fully populate a search results grid, and retrieve detail information for an individual document from the original data source instead of Solr.
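
As a sketch of that pattern on the query side: ask Solr for only the fields the results grid needs by using the `fl` parameter, and look up the full record in the original data source by id. The host, core name, and field names below are hypothetical placeholders, not taken from your setup. Note that `fl` only limits what comes back in the response; to actually shrink the index you would also need to stop storing the unneeded fields in the schema and reindex.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, fields, rows=10):
    """Build a Solr select URL that returns only the listed fields."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # restrict the response to these fields
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, core, and field names, for illustration only.
url = build_search_url(
    "http://localhost:8983/solr/shard1",
    "title:solr",
    ["id", "title", "date"],
)
print(url)
```

The detail view would then fetch the full record from the original data source using the returned `id`, rather than from Solr.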

Thanks,
Shawn
