Is it important where your leader is? If you just want to minimize leadership changes during a rolling restart, then you could restart in the opposite order (S3, S2, S1). That would give only one transition, but the end result would be a leader on S2 instead of S1 (not sure whether that's important to you or not). I know it's not a "fix", but it might be a workaround until the whole leadership-moving work is done?
On 12 January 2015 at 18:17, Erick Erickson <erickerick...@gmail.com> wrote:
> Just skimming, but the problem here that I ran into was with the
> listeners. Each _Solr_ instance out there is listening to one of the
> ephemeral nodes (the "one in front"). So deleting a node does _not_
> change which ephemeral node the associated Solr instance is listening
> to.
>
> So, for instance, when you delete S2...n-000001 and re-add it, S2 is
> still looking at S1...n-000000 and will continue looking at
> S1...n-000000 until S1...n-000000 is deleted.
>
> Deleting S2...n-000001 will wake up S3 though, which should now be
> looking at S1...n-000000. Now you have two Solr listeners looking at
> the same ephemeral node. The key is that deleting S2...n-000001 does
> _not_ wake up S2, just any Solr instance that has a watch on the
> associated ephemeral node.
>
> To understand how it all works, the code you want is in
> LeaderElector.checkIfIamLeader. Be aware that the sortSeqs call sorts
> the nodes by
> 1> sequence number
> 2> string comparison
> which has the unfortunate characteristic of a secondary sort by
> session ID. So two nodes with the same sequence number can sort before
> or after each other depending on which one gets a session ID higher or
> lower than the other's.
>
> This is quite tricky to get right. I once created a patch for 4.10.3
> by applying things in this order (some minor tweaks required), all
> SOLR- issues:
> SOLR-6115
> SOLR-6512
> SOLR-6577
> SOLR-6513
> SOLR-6517
> SOLR-6670
> SOLR-6691
>
> Good luck!
> Erick
>
> On Mon, Jan 12, 2015 at 8:54 AM, Zisis Tachtsidis <zist...@runbox.com> wrote:
> > SolrCloud uses ZooKeeper sequence flags to keep track of the order in
> > which nodes register themselves as leader candidates. The node with
> > the lowest sequence number wins as leader of the shard.
> >
> > What I'm trying to do is to keep the leader re-assignments to a
> > minimum during a rolling restart. To that end I change the zk
> > sequence numbers on the SolrCloud nodes once all nodes of the cluster
> > are up and active. I'm using Solr 4.10.0 and I'm aware of SOLR-6491,
> > which has a similar purpose, but I'm trying to do it from "outside",
> > using the existing APIs without editing the Solr source code.
> >
> > == TYPICAL SCENARIO ==
> > Suppose we have three Solr instances S1, S2, S3. They are started in
> > that order and the zk sequences assigned are as follows:
> > S1: -n_0000000000 (LEADER)
> > S2: -n_0000000001
> > S3: -n_0000000002
> >
> > In a rolling restart we'll get S2 as leader (after S1 shutdown), then
> > S3 (after S2 shutdown) and finally S1 (after S3 shutdown): 3 changes
> > in total.
> >
> > == MY ATTEMPT ==
> > By using SolrZkClient and the ZooKeeper multi API I found a way to
> > get rid of the old znodes that participate in a shard's leader
> > election and write new ones to which we can assign the sequence
> > numbers of our liking.
> >
> > S1: -n_0000000000 (no code running here)
> > S2: -n_0000000004 (code deleting znode -n_0000000001 and creating
> >     -n_0000000004)
> > S3: -n_0000000003 (code deleting znode -n_0000000002 and creating
> >     -n_0000000003)
> >
> > In a rolling restart I'd expect to get S3 as leader (after S1
> > shutdown), no change (after S2 shutdown) and finally S1 (after S3
> > shutdown), that is 2 changes. This stays constant no matter how many
> > servers are added to SolrCloud, while in the first scenario the
> > number of re-assignments equals the number of Solr servers.
> >
> > The problem occurs when S1 (LEADER) is shut down. The elections that
> > take place still set S2 as leader; it's as if the new sequence
> > numbers were ignored. When I go to /solr/#/~cloud?view=tree the new
> > sequence numbers are listed under "/collections", and based on those
> > S3 should have become the leader.
> > Do you have any idea why the new state is not acknowledged during the
> > elections? Is something cached? Or, to put it bluntly, do I have any
> > chance down this path? If not, what are my options? Is it possible to
> > apply all patches under SOLR-6491 in isolation and continue from
> > there?
> >
> > Thank you.
> >
> > Some extra info which might help follows.
> >
> > 1. Some logging related to leader elections after S1 has been shut down:
> >    S2 - org.apache.solr.cloud.SyncStrategy  Leader's attempt to sync
> >         with shard failed, moving to the next candidate
> >    S2 - org.apache.solr.cloud.ShardLeaderElectionContext  We failed
> >         sync, but we have no versions - we can't sync in that case -
> >         we were active before, so become leader anyway
> >    S3 - org.apache.solr.cloud.LeaderElector  Our node is no longer in
> >         line to be leader
> >
> > 2. Some sample code on how I perform the ZK re-sequencing:
> >    // Read the current zk nodes for a specific collection
> >    solrServer.getZkStateReader().getZkClient().getSolrZooKeeper()
> >        .getChildren("/collections/core/leader_elect/shard1/election", true);
> >    // node deletion
> >    Op.delete(path, -1);
> >    // node creation
> >    Op.create(createPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
> >        CreateMode.EPHEMERAL_SEQUENTIAL);
> >    // perform the operations
> >    solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().multi(opsList);
> >    solrServer.getZkStateReader().updateClusterState(true);
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/SolrCloud-shard-leader-elections-Altering-zookeeper-sequence-numbers-tp4178973.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
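For anyone wanting to try the re-sequencing step end to end, the fragments quoted above can be stitched into a small SolrJ/ZooKeeper sketch along the following lines. This is only a sketch against Solr 4.10.x: the class name ElectionResequencer, the helper reorderElectionNodes and its "desired order" argument are illustrative, not an existing Solr API, and the election path /collections/core/leader_elect/shard1/election is taken straight from the example in the thread.

  import java.util.ArrayList;
  import java.util.List;

  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.Op;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  public class ElectionResequencer {

    // Hypothetical helper (not an existing Solr API): deletes the election
    // znodes whose names contain the given core node names and re-creates
    // them as EPHEMERAL_SEQUENTIAL in the order passed in, so ZooKeeper
    // hands out new, increasing sequence numbers in that order.
    public static void reorderElectionNodes(CloudSolrServer solrServer,
        String electionPath, List<String> desiredOrder) throws Exception {

      ZooKeeper zk = solrServer.getZkStateReader().getZkClient().getSolrZooKeeper();

      // Current election znodes, e.g. "<sessionId>-<coreNodeName>-n_0000000001";
      // no watch is needed from this external tool.
      List<String> children = zk.getChildren(electionPath, false);

      List<Op> ops = new ArrayList<Op>();
      for (String coreNodeName : desiredOrder) {
        for (String child : children) {
          if (child.contains(coreNodeName)) {
            // Keep everything up to and including "-n_" and let ZooKeeper
            // append a fresh sequence number to the re-created node.
            String prefix = child.substring(0, child.lastIndexOf("-n_") + 3);
            ops.add(Op.delete(electionPath + "/" + child, -1));
            ops.add(Op.create(electionPath + "/" + prefix, new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL));
          }
        }
      }

      // All deletes and creates succeed or fail together.
      zk.multi(ops);
      solrServer.getZkStateReader().updateClusterState(true);
    }
  }

Calling it with a desired order such as the core node names of S3 then S2 (the exact names depend on your cluster) reproduces the -n_0000000003 / -n_0000000004 layout described above. Two caveats: as Erick explains, deleting and re-creating these znodes does not move the watches the live Solr instances already hold, and the re-created ephemeral nodes belong to this client's ZooKeeper session rather than to the Solr nodes they stand for, so treat this strictly as an experiment.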