This looks like LeaderInitiatedRecovery (LIR). When a leader receives a
document (update) but fails to forward it to a replica, it marks that
replica as down and asks the replica to recover (hence the name, Leader
Initiated Recovery). This can happen for several reasons, e.g. a network
issue or a long GC pause. The replica generally comes back up and syncs with
the leader transparently. As an end user, you don't really have to worry
much about this, but if you want to dig deeper, here are a few questions
whose answers might help us suggest what to do or look at:
* Do you see this happen often? How often?
* Are there any known network issues?
* Do you know anything about GC activity (e.g. long pauses) on those replicas?
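For the first question, one rough way to get numbers is to count how many
times the leader logged the "Leader is publishing core=... state=down"
warning for each core. The script below is only a sketch: it assumes each
event appears on a single line with exactly the wording quoted in your mail.
The actual log layout depends on your Solr version and logging config, and
the script name and path in the usage comment are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the ZkController warning quoted in this thread, e.g.
# "Leader is publishing core=bookings_shard1_replica2 state=down on behalf of ..."
# Assumption: the message text is identical in your log files.
LIR_PATTERN = re.compile(r"Leader is publishing core=(\S+) state=down")


def count_lir_events(log_text: str) -> Counter:
    """Count leader-initiated 'state=down' publications per core name."""
    return Counter(LIR_PATTERN.findall(log_text))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_lir.py /path/to/leader/solr.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for core, n in count_lir_events(f.read()).most_common():
            print(f"{core}: {n}")
```

Running it over each day's log from the leader would show whether the
same replica keeps getting marked down, which helps separate a single
flaky node from a cluster-wide problem.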


On Mon, Apr 27, 2015 at 1:25 PM, Amit L <amitlal...@gmail.com> wrote:

> Hi,
>
> A few days ago I deployed a Solr 4.9.0 cluster, which consists of 2
> collections. Each collection has 1 shard with 3 replicas on 3 different
> machines.
>
> On the first day I noticed this error appear on the leader. Full log:
> http://pastebin.com/wcPMZb0s
>
> 4/23/2015, 2:34:37 PM SEVERE SolrCmdDistributor
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at:
> http://production-solrcloud-004:8080/solr/bookings_shard1_replica2
>
> 4/23/2015, 2:34:37 PM WARNING DistributedUpdateProcessor
> Error sending update
>
> 4/23/2015, 2:34:37 PM WARNING ZkController
> Leader is publishing core=bookings_shard1_replica2 state=down on behalf of
> un-reachable replica
> http://production-solrcloud-004:8080/solr/bookings_shard1_replica2/;
> forcePublishState? false
>
>
> The other 2 replicas had 0 errors.
>
> I thought it might be a one-off, but the same error occurred on day 2, which
> has got me slightly concerned. During these periods I didn't notice any
> issues with the cluster, and everything looks healthy in the cloud summary.
> All of the instances are hosted on AWS.
>
> Any idea what may be causing this issue and what I can do to mitigate?
>
> Thanks
> Amit
>



-- 
Anshum Gupta
