Re: SolrCloud Replication Issue

2015-04-27 Thread Erick Erickson
Amit: The fact that "all instances are using no more than 30%" isn't really indicative of whether or not GC pauses are a problem. If you have a large heap allocated to Java, then the to-be-collected objects will build up and _eventually_ you'll have a stop-the-world GC pause even though each t
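One way to check whether long stop-the-world pauses are actually occurring is to scan a GC log for the JVM's "application threads were stopped" lines. The sketch below assumes the JVM was started with HotSpot's `-XX:+PrintGCApplicationStoppedTime` flag and that the log uses that flag's usual line format; the function name, the threshold, and the sample lines are made up for illustration and are not taken from this thread.

```python
import re

# Line format written by HotSpot when -XX:+PrintGCApplicationStoppedTime is set
# (assumption: the Solr JVM was started with that flag).
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(log_lines, threshold_s=1.0):
    """Return pause durations (in seconds) that are >= threshold_s."""
    pauses = []
    for line in log_lines:
        match = STOPPED.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_s:
                pauses.append(seconds)
    return pauses


# Synthetic sample lines, for illustration only.
sample = [
    "Total time for which application threads were stopped: 0.0004120 seconds",
    "Total time for which application threads were stopped: 17.2500000 seconds",
]

# Hypothetical threshold; in practice compare against the ZooKeeper session timeout.
print(long_pauses(sample, threshold_s=15.0))
```

A pause longer than the ZooKeeper session timeout is the kind of event worth correlating with the time a replica went into recovery.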

Re: SolrCloud Replication Issue

2015-04-27 Thread Amit L
I appreciate the response. To answer your questions: * Do you see this happen often? How often? It has happened twice in five days. The first two days after deployment. * Are there any known network issues? There are no obvious network issues but as these instances reside in AWS I cannot rule it ou

Re: SolrCloud Replication Issue

2015-04-27 Thread Anshum Gupta
This looks like Leader Initiated Recovery (LIR). When a leader receives a document (update) but fails to forward it successfully to a replica, it marks that replica as down and asks the replica to recover (hence the name). It could be due to multiple reasons, e.g. network issue/
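To see which replicas a leader has pushed out of the active state, it can help to walk the cluster state and list every replica whose state is not "active". The sketch below assumes the cluster state has already been loaded into a Python dict shaped like Solr 4.x's `clusterstate.json` (collection, then `shards`, then `replicas`, each with a `state` field); the function name and all sample collection, shard, and node names are hypothetical.

```python
def non_active_replicas(cluster_state):
    """List (collection, shard, replica, state) for replicas not in 'active' state.

    cluster_state is assumed to be a dict parsed from a clusterstate.json-style
    document: {collection: {"shards": {shard: {"replicas": {name: {...}}}}}}.
    """
    found = []
    for collection, cdata in cluster_state.items():
        for shard, sdata in cdata.get("shards", {}).items():
            for name, replica in sdata.get("replicas", {}).items():
                state = replica.get("state")
                if state != "active":
                    found.append((collection, shard, name, state))
    return found


# Hypothetical sample: one shard with three replicas, one of them recovering.
sample_state = {
    "collection_a": {
        "shards": {
            "shard1": {
                "replicas": {
                    "core_node1": {"state": "active", "leader": "true"},
                    "core_node2": {"state": "recovering"},
                    "core_node3": {"state": "active"},
                }
            }
        }
    }
}

for entry in non_active_replicas(sample_state):
    print(entry)
```

Running a check like this periodically, and noting the timestamps, gives a record that can be lined up against the leader's log and any network or GC events.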

SolrCloud Replication Issue

2015-04-27 Thread Amit L
Hi, A few days ago I deployed a Solr 4.9.0 cluster, which consists of 2 collections. Each collection has 1 shard with 3 replicas on 3 different machines. On the first day I noticed this error appear on the leader. Full Log - http://pastebin.com/wcPMZb0s 4/23/2015, 2:34:37 PM SEVERE SolrCmdDist