How long were updates blocked, and how did the tlog replica on the bad 
hardware go down?

Solr has to wait for an ack from the tlog follower to be certain that the 
follower has all the documents, in case that replica has to take over as 
leader. If the update to the follower times out, the leader will put the 
follower into a recovering state.

So I’d expect the collection to queue up indexing until the request to the 
follower on the bad hardware timed out. Did you wait at least that long?
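
To make that reasoning concrete, here is a toy Python model of it. This is 
not Solr code: the Follower class, the forward_update function, and the 
60-second timeout are all made up for illustration, and it only models a 
single update, not what happens to later ones.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    name: str
    response_time_s: float  # simulated time to ack; inf means unreachable
    state: str = "active"


def forward_update(followers, timeout_s):
    """Return how long the leader waits before it can answer the client.

    Toy assumption: the leader forwards to all followers in parallel and
    waits for every ack, giving up on a follower after timeout_s.
    """
    waited = 0.0
    for f in followers:
        if f.response_time_s > timeout_s:
            # No ack within the timeout: the leader stops waiting and
            # marks this follower as recovering.
            f.state = "recovering"
            waited = max(waited, timeout_s)
        else:
            waited = max(waited, f.response_time_s)
    return waited


# Three healthy followers plus one on bad hardware that never answers.
followers = [Follower(f"replica{i}", 0.01) for i in range(1, 4)]
followers.append(Follower("bad-hardware", float("inf")))

# 60.0 is an arbitrary illustrative timeout, not a Solr default.
waited = forward_update(followers, timeout_s=60.0)
print(waited, followers[-1].state)  # 60.0 recovering
```

In this model the client sees the update hang for the full timeout, and only 
then does the unreachable follower get flagged, which is the window I'm 
asking about.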

Best,
Erick

> On Nov 18, 2019, at 7:11 PM, Wei <weiwan...@gmail.com> wrote:
> 
> Hi,
> 
> I am puzzled by a problem in Solr Cloud with tlog replicas and would
> appreciate your insights.  Our Solr Cloud has two shards, and each shard
> has 5 tlog replicas. When one of the non-leader replicas had a hardware
> issue and became unreachable, updates to the whole cloud stopped.  We are
> on Solr 7.6 and use the SolrJ client to send updates only to leaders.  To
> my understanding, with the tlog replica type, the leader only forwards
> update requests to replicas for transaction log updates, and each replica
> periodically pulls the segments from the leader.  When one replica fails
> to respond, why are update requests to the cloud blocked?  Does the leader
> need to wait for a response from each replica before informing the client
> that the update is successful?
> 
> Best,
> Wei
