How long are updates blocked and how did the tlog replica on the bad hardware go down?

Solr has to wait for an ack back from the tlog follower to be certain that the follower has all the documents, in case it has to switch to that replica to become the leader. If the update to the follower times out, the leader will put it into a recovering state.

So I’d expect the collection to queue up indexing until the request to the follower on the bad hardware timed out. Did you wait at least that long?

Best,
Erick

> On Nov 18, 2019, at 7:11 PM, Wei <weiwan...@gmail.com> wrote:
>
> Hi,
>
> I am puzzled by a problem in Solr cloud with tlog replicas and would
> appreciate your insights. Our Solr cloud has two shards, and each shard
> has 5 tlog replicas. When one of the non-leader replicas had a hardware
> issue and became unreachable, updates to the whole cloud stopped. We are
> on Solr 7.6 and use the SolrJ client to send updates only to leaders. To
> my understanding, with the tlog replica type, the leader only forwards
> update requests to replicas for the transaction log update, and each
> replica periodically pulls the segments from the leader. When one replica
> fails to respond, why are update requests to the cloud blocked? Does the
> leader need to wait for a response from each replica before informing the
> client that the update is successful?
>
> Best,
> Wei
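
To make the behavior described in the reply concrete (the leader waits for each follower's ack, and a follower that does not answer before the timeout is put into a recovering state), here is a toy model in Python. It is only a sketch and not Solr code: the `Follower` class, the `forward_update` function, the parallel-forwarding assumption, and the 60-second timeout are all hypothetical and chosen for illustration. It simulates time instead of sleeping.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Follower:
    """Hypothetical stand-in for a tlog follower replica (not a Solr class)."""
    name: str
    # Simulated time to ack an update, in seconds; None means unreachable.
    response_time_s: Optional[float]
    state: str = "active"


def forward_update(followers: List[Follower], timeout_s: float) -> float:
    """Return the simulated time the leader waits before answering the client.

    Assumption of this toy model: the leader forwards to all active
    followers in parallel, so its wait is the slowest ack, capped at
    the timeout. Followers that miss the timeout are marked recovering
    and are skipped on later updates.
    """
    waited = 0.0
    for f in followers:
        if f.state != "active":
            continue  # recovering followers no longer hold up the update
        if f.response_time_s is None or f.response_time_s > timeout_s:
            f.state = "recovering"
            waited = max(waited, timeout_s)
        else:
            waited = max(waited, f.response_time_s)
    return waited


if __name__ == "__main__":
    # Placeholder timeout, not a claim about any Solr default.
    timeout_s = 60.0
    replicas = [
        Follower("replica1", 0.02),
        Follower("replica2", 0.03),
        Follower("bad-hardware", None),
    ]
    print("first update waits:", forward_update(replicas, timeout_s))
    print("bad replica state:", replicas[2].state)
    print("next update waits:", forward_update(replicas, timeout_s))
```

In this model the first update is held for the full timeout because of the unreachable follower. Once that follower has been marked recovering, later updates only wait for the healthy followers, which matches the expectation that indexing queues up until the request to the bad replica times out.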