Appreciate the response. To answer your questions:

* Do you see this happen often? How often?
It has happened twice in five days, once on each of the first two days after deployment.

* Are there any known network issues?
There are no obvious network issues, but as these instances reside in AWS I cannot rule out network blips.

* Do you have any idea about the GC on those replicas?
I have been monitoring memory usage, and no instance is using more than 30% of its JVM memory allocation.

On 27 April 2015 at 21:36, Anshum Gupta <ans...@anshumgupta.net> wrote:

> Looks like LeaderInitiatedRecovery or LIR. When a leader receives a
> document (update) but fails to successfully forward it to a replica, it
> marks that replica as down and asks the replica to recover (hence the name,
> Leader Initiated Recovery). It could be due to multiple reasons e.g.
> network issue/GC. The replica generally comes back up and syncs with the
> leader transparently. As an end-user, you don't have to really worry much
> about this but if you want to dig deeper, here are a few questions that
> might help us in suggesting what to do/look at.
> * Do you see this happen often? How often?
> * Are there any known network issues?
> * Do you have any idea about the GC on those replicas?
>
>
> On Mon, Apr 27, 2015 at 1:25 PM, Amit L <amitlal...@gmail.com> wrote:
>
> > Hi,
> >
> > A few days ago I deployed a Solr 4.9.0 cluster, which consists of 2
> > collections. Each collection has 1 shard with 3 replicas on 3 different
> > machines.
> >
> > On the first day I noticed this error appear on the leader.
> > Full Log - http://pastebin.com/wcPMZb0s
> >
> > 4/23/2015, 2:34:37 PM SEVERE SolrCmdDistributor
> > org.apache.solr.client.solrj.SolrServerException: IOException occured when
> > talking to server at:
> > http://production-solrcloud-004:8080/solr/bookings_shard1_replica2
> >
> > 4/23/2015, 2:34:37 PM WARNING DistributedUpdateProcessor
> > Error sending update
> >
> > 4/23/2015, 2:34:37 PM WARNING ZkController
> > Leader is publishing core=bookings_shard1_replica2 state=down on behalf of
> > un-reachable replica
> > http://production-solrcloud-004:8080/solr/bookings_shard1_replica2/;
> > forcePublishState? false
> >
> >
> > The other 2 replicas had 0 errors.
> >
> > I thought it may be a one-off, but the same error occurred on day 2, which has
> > got me slightly concerned. During these periods I didn't notice any issues
> > with the cluster and everything looks healthy in the cloud summary. All of
> > the instances are hosted on AWS.
> >
> > Any idea what may be causing this issue and what I can do to mitigate?
> >
> > Thanks
> > Amit
> >
> >
>
> --
> Anshum Gupta
>
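
To put a number on the "how often?" question, one option is to count the leader's "state=down" publications per core in the Solr log. The following is only a minimal sketch: the regular expression is modelled on the ZkController warning quoted in this thread, the function name `count_lir_events` is hypothetical, and it assumes each log entry has been joined onto a single line. Other Solr versions may word the message differently.

```python
import re
from collections import Counter

# Pattern modelled on the ZkController warning quoted in this thread.
# Assumption: the message wording is the same in the log being scanned.
LIR_PATTERN = re.compile(r"Leader is publishing core=(\S+) state=down")


def count_lir_events(log_lines):
    """Return a Counter mapping core name -> number of 'state=down' publications."""
    counts = Counter()
    for line in log_lines:
        match = LIR_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Sample lines adapted from the log excerpt in this thread.
    sample = [
        "4/23/2015, 2:34:37 PM WARNING DistributedUpdateProcessor Error sending update",
        "4/23/2015, 2:34:37 PM WARNING ZkController Leader is publishing "
        "core=bookings_shard1_replica2 state=down on behalf of un-reachable replica",
    ]
    for core, n in count_lir_events(sample).items():
        print(core, n)
```

Running this against the leader's log for each day would show whether the events cluster around the same replica or the same time of day, which may help separate a network blip from a problem on one specific machine.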