> Extrapolating from what Jack was saying in his reply ... with 100 shards and
> 4 replicas, you have 400 cores that are each about 2.8GB.  That results in a 
> total index size of just over a terabyte, with 140GB of index data on each of 
> the eight servers.
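
Just to spell that math out for anyone following the thread:

    100 shards * 4 replicas = 400 cores
    400 cores * 2.8GB each  = 1120GB total index (just over a terabyte)
    1120GB / 8 servers      = 140GB of index data per server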

> Assuming you have only one Solr instance per server, an ideal setup would 
> have enough RAM for that 140GB of index plus the 16GB max heap, so 156GB of 
> RAM.  The ideal setup is rarely a strict requirement unless the query load 
> is high, so if you have 128GB of RAM per server, I would not be worried 
> about performance.  If you have less than that, I would be worried.

We have less than this :/ :( - with not much likelihood of upgrading anytime 
soon.  Just out of curiosity: if performance is proportional to RAM, why am I 
seeing such good query times for the initial shard queries? (They are all 
under 100ms.)
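
(In case it is relevant: on Linux, "free -g" shows how much RAM the OS is 
using for the page cache, which is what actually holds the index data between 
queries.  The numbers below are purely illustrative, not from our servers:

    $ free -g
                 total       used       free     shared    buffers     cached
    Mem:            64         63          0          0          0         48

If the 'cached' figure is much smaller than the 140GB of index on a box, the 
slower shards are presumably reading from disk.)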

> The behavior with the same shard listed multiple times is a little strange.  
> That behavior could indicate problems with garbage collection pauses.  As 
> Solr builds the memory structures necessary to compose the final response, 
> it may fill one of the heap generations to its current size limit; each 
> subsequent allocation can then trigger a significant garbage collection 
> that stops the world while it happens, yet frees no significant amount of 
> memory in that particular heap generation.
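
We can probably confirm that with GC logging before changing anything.  On 
Oracle/HotSpot Java 7 or 8, options along these lines (the log path is just 
an example) record every stop-the-world pause:

    -Xloggc:/var/solr/logs/gc.log
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintGCApplicationStoppedTime

If long "Total time for which application threads were stopped" entries line 
up with the slow shard responses, that would point squarely at GC.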

> Have you tuned your garbage collection?  If not, that is a likely suspect.  
> If you run with the latest Oracle Java, you can use my settings and probably 
> see good GC performance:

> https://wiki.apache.org/solr/ShawnHeisey

> Further down on the page is a good set of CMS parameters for earlier Java 
> versions, if you can't run the latest.
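
For the archives: the kind of CMS parameters meant here look roughly like the 
following.  These are illustrative flags only, not a copy of the wiki page -- 
the page itself is the reference we will actually use:

    -XX:+UseConcMarkSweepGC
    -XX:+UseParNewGC
    -XX:+CMSParallelRemarkEnabled
    -XX:CMSInitiatingOccupancyFraction=70
    -XX:+UseCMSInitiatingOccupancyOnly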

We will look into this, thank you.  If this can decrease the qtime of the 
last few shards, then we should still see reasonable speeds (not the fastest 
if it has to load from disk, but hopefully faster than the 50 seconds we have 
been seeing).

The weird thing is, if I query each shard individually with distrib=false, the 
query time never goes over 100ms (I concurrently hammer one shard just like I 
did in my test from the previous email, but without using shards=, and I never 
get a query over 100ms) ... which leads me to believe there is some bottleneck 
with the distrib=/shards= parameters.
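
For concreteness, the two kinds of request look like this (host, port and 
core names are placeholders for our actual setup):

    # Direct, non-distributed query against one core -- always under 100ms:
    curl 'http://solr1:8983/solr/core_shard1/select?q=test&distrib=false'

    # Distributed query fanned out via the shards parameter -- the slow path:
    curl 'http://solr1:8983/solr/core_shard1/select?q=test&shards=solr1:8983/solr/core_shard1,solr2:8983/solr/core_shard2'

Only the second form ever shows the 50-second outliers.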
