GW,
Did you mean a separate transaction log on Solr or on Zookeeper?

-suresh

On Tue, Jun 6, 2017 at 5:23 AM, GW <thegeofo...@gmail.com> wrote:

> I've heard of systems tanking like this on Windows during OS updates.
> Because of this, I run all my updates attended even though I'm on Linux.
> My nodes run as VMs: I shut down Solr gracefully, snapshot a backup of
> the VM, then update and run. If things go screwy I can always roll back.
> To me it sounds like a lack of resources or a kink in your networking,
> assuming your setup is correct. Watch for homemade network cables. I've
> seen soft crimp connectors put on solid wire, which can wreck a switch
> port forever. Do you have a separate transaction log device on each
> Zookeeper? I made this mistake in the beginning and had similar problems
> under load.
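> For reference, ZooKeeper puts its transaction log on its own device via
> dataLogDir in zoo.cfg; a minimal sketch (the paths are only examples):
>
>   # zoo.cfg -- snapshots on one disk, transaction log on a dedicated disk
>   dataDir=/var/zookeeper/data
>   dataLogDir=/ssd/zookeeper/txnlog
>
> If dataLogDir is unset, the transaction log shares dataDir, and fsyncs on
> that log compete with everything else on the disk.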
>
>
> GW
>
> On 5 June 2017 at 22:32, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > bq: This means that technically the replica nodes should not fall behind
> > and do
> > not have to go into recovery mode
> >
> > Well, true if nothing weird happens. By "weird" I mean anything that
> > results in the leader getting anything other than a success code
> > back from a follower it sends a document to.
> >
> > bq: Is this the only scenario in which a node can go into recovery
> > status?
> >
> > No, there are others. One for-instance: Leader sends a doc to the
> > follower and the request times out (huge  GC pauses, the doc takes too
> > long to index for whatever reason etc). The leader then sends a
> > message to the follower to go directly into the recovery state since
> > the leader has no way of knowing whether the follower successfully
> > wrote the document to its transaction log. You'll see messages about
> > "leader initiated recovery" in the follower's solr log in this case.
> >
> > two bits of pedantry:
> >
> > bq:  Down by the other replicas
> >
> > Almost. We're talking indexing here, and IIUC only the leader can send
> > another node into recovery as all updates go through the leader.
> >
> > If I'm going to be nit-picky, Zookeeper can _also_ cause a node to be
> > marked as down if its periodic ping of the node fails to return.
> > Actually I think this is done through another Solr node that ZK
> > notifies....
> >
> > bq: It goes into a recovery mode and tries to recover all the
> > documents from the leader of shard1.
> >
> > Also nit-picky. But if the follower isn't "too far" behind it can be
> > brought back into sync via "peer sync", where it gets the missed
> > docs sent to it from the tlog of a healthy replica. "Too far" is 100
> > docs by default, but can be set in solrconfig.xml if necessary. If
> > that limit is exceeded, then indeed the entire index is copied from
> > the leader.
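> >
> > That threshold lives on the updateLog in solrconfig.xml; a sketch
> > (the value here is just an example):
> >
> >   <updateLog>
> >     <str name="dir">${solr.ulog.dir:}</str>
> >     <!-- raise the peer-sync window from the default of 100 docs -->
> >     <int name="numRecordsToKeep">500</int>
> >   </updateLog>
> >
> > A bigger tlog window makes full index replication less likely at the
> > cost of keeping more records around.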
> >
> > Best,
> > Erick
> >
> >
> >
> > On Mon, Jun 5, 2017 at 5:18 PM, suresh pendap <sureshfors...@gmail.com>
> > wrote:
> > > Hi,
> > >
> > > Why and in what scenarios do Solr nodes go into recovery status?
> > >
> > > Given that Solr is a CP system, writes for a document are
> > > acknowledged only after they are propagated to and acknowledged by
> > > all the replicas of the shard.
> > >
> > > This means that technically the replica nodes should not fall behind
> > > and do not have to go into recovery mode.
> > >
> > > Is my above understanding correct?
> > >
> > > Can a scenario like the one below happen?
> > >
> > > 1. Assume that we have 3 replicas for shard shard1, with the names
> > > shard1_replica1, shard1_replica2 and shard1_replica3.
> > >
> > > 2. Due to some reason, a network issue or something else,
> > > shard1_replica2 is not reachable by the other replicas and it is
> > > marked as Down by the other replicas (shard1_replica1 and
> > > shard1_replica3 in this case).
> > >
> > > 3. The network issue is resolved and shard1_replica2 is reachable
> > > again. It goes into a recovery mode and tries to recover all the
> > > documents from the leader of shard1.
> > >
> > > Is this the only scenario in which a node can go into recovery status?
> > >
> > > In other words, does the node have to go into a Down status before
> > > getting back into a recovery status?
> > >
> > >
> > > Regards
> >
>