This looks like LeaderInitiatedRecovery, or LIR. When a leader receives a document (update) but fails to forward it to a replica, it marks that replica as down and asks it to recover (hence the name, Leader Initiated Recovery). This can happen for several reasons, e.g. a network issue or a long GC pause on the replica. The replica generally comes back up and syncs with the leader transparently.

As an end user, you don't really have to worry much about this. If you want to dig deeper, though, here are a few questions that might help us suggest what to look at:

* Do you see this happen often? How often?
* Are there any known network issues?
* Do you have any information about GC activity (e.g. long pauses) on those replicas?
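To answer the first question, a rough script can tally the "Leader is publishing core=... state=down" warnings per day and per core. This is only a sketch, and several parts are assumptions. It expects timestamps shaped like the Admin UI excerpt in your message (`4/23/2015, 2:34:37 PM`), with the message on the same line or the next one. The raw solr.log uses a different timestamp layout depending on your log4j configuration, so you may need to adjust `TS_PATTERN`. The name `count_lir_events` is a hypothetical helper and is not part of Solr.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Assumption: timestamps look like the Admin UI log view, e.g. "4/23/2015, 2:34:37 PM".
TS_PATTERN = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4}, \d{1,2}:\d{2}:\d{2} [AP]M)\b")
# The ZkController warning logged when the leader marks a replica down (LIR).
LIR_PATTERN = re.compile(r"Leader is publishing core=(\S+) state=down")


def count_lir_events(lines):
    """Count LIR 'state=down' events per (day, core).

    The most recently seen timestamp is carried forward, so a message that
    sits on the line after its timestamp is still attributed to the right day.
    Events seen before any timestamp are counted under day None.
    """
    counts = Counter()
    last_day = None
    for line in lines:
        ts = TS_PATTERN.search(line)
        if ts:
            parsed = datetime.strptime(ts.group(1), "%m/%d/%Y, %I:%M:%S %p")
            last_day = parsed.date().isoformat()
        match = LIR_PATTERN.search(line)
        if match:
            counts[(last_day, match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_lir.py exported-log.txt [more.txt ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            counts = count_lir_events(f)
        # Sort by day, then core; a missing day sorts first.
        for (day, core), n in sorted(counts.items(), key=lambda kv: (kv[0][0] or "", kv[0][1])):
            print(f"{path}\t{day}\t{core}\t{n}")
```

If the counts cluster around specific times of day, compare those times with GC logs or network events on the replica that was marked down.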
On Mon, Apr 27, 2015 at 1:25 PM, Amit L <amitlal...@gmail.com> wrote:
> Hi,
>
> A few days ago I deployed a Solr 4.9.0 cluster, which consists of 2
> collections. Each collection has 1 shard with 3 replicas on 3 different
> machines.
>
> On the first day I noticed this error appear on the leader. Full log:
> http://pastebin.com/wcPMZb0s
>
> 4/23/2015, 2:34:37 PM SEVERE SolrCmdDistributor
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at:
> http://production-solrcloud-004:8080/solr/bookings_shard1_replica2
>
> 4/23/2015, 2:34:37 PM WARNING DistributedUpdateProcessor
> Error sending update
>
> 4/23/2015, 2:34:37 PM WARNING ZkController
> Leader is publishing core=bookings_shard1_replica2 state=down on behalf of
> un-reachable replica
> http://production-solrcloud-004:8080/solr/bookings_shard1_replica2/;
> forcePublishState? false
>
> The other 2 replicas had 0 errors.
>
> I thought it might be a one-off, but the same error occurred on day 2, which
> has got me slightly concerned. During these periods I didn't notice any
> issues with the cluster, and everything looks healthy in the cloud summary.
> All of the instances are hosted on AWS.
>
> Any idea what may be causing this issue and what I can do to mitigate it?
>
> Thanks
> Amit
>

--
Anshum Gupta