Re: SolrCloud Replication Issue

2015-04-27 Thread Erick Erickson
Amit: The fact that "all instances are using no more than 30%" isn't really indicative of whether or not GC pauses are a problem. If you have a large heap allocated to Java, then the to-be-collected objects will build up and _eventually_ you'll have a stop-the-world GC pause even though each t

Re: SolrCloud Replication Issue

2015-04-27 Thread Amit L
Appreciate the response. To answer your questions:

* Do you see this happen often? How often?
  It has happened twice in five days, in the first two days after deployment.
* Are there any known network issues?
  There are no obvious network issues, but as these instances reside in AWS I cannot rule it ou

Re: SolrCloud Replication Issue

2015-04-27 Thread Anshum Gupta
Looks like LeaderInitiatedRecovery or LIR. When a leader receives a document (update) but fails to forward it to a replica, it marks that replica as down and asks the replica to recover (hence the name, Leader Initiated Recovery). It could be due to multiple reasons, e.g. network issue/
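
To make the sequence described above concrete, here is a toy model of the idea in Python. It is only a sketch and not Solr's actual implementation: the `Leader` and `Replica` classes, the `reachable` flag, and the `recover` method are all hypothetical names invented for illustration, and real recovery in SolrCloud involves far more than copying a list.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for a replica core; not a Solr class."""
    name: str
    reachable: bool = True  # assumption: models a network problem or long pause
    state: str = "active"
    docs: list = field(default_factory=list)

    def receive(self, doc):
        # An unreachable replica never acknowledges the forwarded update.
        if not self.reachable:
            raise ConnectionError(f"{self.name} did not acknowledge the update")
        self.docs.append(doc)

    def recover(self, leader_docs):
        # Toy recovery: copy the leader's documents, then become active again.
        self.state = "recovering"
        self.docs = list(leader_docs)
        self.state = "active"


@dataclass
class Leader:
    """Hypothetical stand-in for the shard leader; not a Solr class."""
    replicas: list
    docs: list = field(default_factory=list)

    def update(self, doc):
        # Index locally, then forward to every replica that is still active.
        self.docs.append(doc)
        failed = []
        for replica in self.replicas:
            if replica.state != "active":
                continue
            try:
                replica.receive(doc)
            except ConnectionError:
                # The leader, not the replica, initiates the state change.
                replica.state = "down"
                failed.append(replica.name)
        return failed
```

In this model, calling `leader.update(doc)` while one replica is unreachable returns that replica's name and leaves it in the `down` state until `recover` is called on it. The point of the sketch is that a single failed forward is enough to trigger recovery, which is why transient network trouble or a long pause on the replica can produce the behaviour discussed in this thread.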