Hi Erick,

I observed that the update request rate dropped from 20 per second to 3 per
second for about 8 minutes, followed by a huge burst of updates. This
closely matches the queue-up behavior you mentioned, but I didn't think
the timeout would take that long. Is there a configurable setting for the
timeout?
Also, the bad tlog replica was not reachable at the time, so we issued a
DELETEREPLICA command through the Collections API to remove it from the cloud.
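
For reference, here is a minimal sketch of how a DELETEREPLICA request URL for the Collections API can be put together. The host, collection, shard, and replica names are placeholders chosen for illustration, not our actual values, and the helper function name is hypothetical.

```python
from urllib.parse import urlencode


def delete_replica_url(base_url, collection, shard, replica):
    """Build a Collections API DELETEREPLICA request URL.

    base_url is the Solr root, e.g. http://host:8983/solr (placeholder).
    """
    params = {
        "action": "DELETEREPLICA",
        "collection": collection,
        "shard": shard,
        "replica": replica,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# All names below are made-up examples.
url = delete_replica_url(
    "http://localhost:8983/solr", "mycollection", "shard1", "core_node7"
)
print(url)
```

The resulting URL can be requested with any HTTP client; the replica name is the one shown for that replica in the cluster state.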

Thanks,
Wei


On Tue, Nov 19, 2019 at 5:52 AM Erick Erickson <erickerick...@gmail.com>
wrote:

> How long are updates blocked and how did the tlog replica on the bad
> hardware go down?
>
> Solr has to wait for an ack back from the tlog follower to be certain that
> the follower has all the documents in case it has to switch to that replica
> to become the leader. If the update to the follower times out, the leader
> will put it into a recovering state.
>
> So I’d expect the collection to queue up indexing until the request to the
> follower on the bad hardware timed out. Did you wait at least that long?
>
> Best,
> Erick
>
> > On Nov 18, 2019, at 7:11 PM, Wei <weiwan...@gmail.com> wrote:
> >
> > Hi,
> >
> > I am puzzled by a problem in SolrCloud with tlog replicas and would
> > appreciate your insights. Our SolrCloud cluster has two shards, and each
> > shard has 5 tlog replicas. When one of the non-leader replicas had a
> > hardware issue and became unreachable, updates to the whole cloud stopped.
> > We are on Solr 7.6 and use the SolrJ client to send updates only to
> > leaders. To my understanding, with the tlog replica type, the leader only
> > forwards update requests to replicas for the transaction log update, and
> > each replica periodically pulls segments from the leader. When one replica
> > fails to respond, why are update requests to the cloud blocked? Does the
> > leader need to wait for a response from each replica before informing the
> > client that the update is successful?
> >
> > Best,
> > Wei
>
>
