Is it important where your leader is? If you just want to minimize leadership changes during a rolling restart, then you could restart in the opposite order (S3, S2, S1). That would give only one transition, but the end result would be a leader on S2 instead of S1 (not sure whether that's important to you or not). I know it's not a "fix", but it might be a workaround until the whole leadership-moving work is done?
On 12 January 2015 at 18:17, Erick Erickson <erickerick...@gmail.com> wrote:
> Just skimming, but the problem here that I ran into was with the
> listeners. Each _Solr_ instance out there is listening to one of the
> ephemeral nodes (the "one in front"). So deleting a node does _not_
> change which ephemeral node the associated Solr instance is listening
> to.
>
> So, for instance, when you delete S2...n-000001 and re-add it, S2 is
> still looking at S1...n-000000 and will continue looking at
> S1...n-000000 until S1...n-000000 is deleted.
>
> Deleting S2...n-000001 will wake up S3 though, which should now be
> looking at S1...n-000000. Now you have two Solr listeners looking at
> the same ephemeral node. The key is that deleting S2...n-000001 does
> _not_ wake up S2, just any Solr instance that has a watch on the
> associated ephemeral node.
>
> To understand how it all works, the code you want is in
> LeaderElector.checkIfIamLeader. Be aware that the sortSeqs call sorts
> the nodes by
> 1> sequence number
> 2> string comparison
> which has the unfortunate characteristic of a secondary sort by
> session ID. So two nodes with the same sequence number can sort before
> or after each other depending on which one gets a session ID higher or
> lower than the other's.
>
> This is quite tricky to get right. I once created a patch for 4.10.3
> by applying things in this order (some minor tweaks required), all
> SOLR- issues:
> SOLR-6115
> SOLR-6512
> SOLR-6577
> SOLR-6513
> SOLR-6517
> SOLR-6670
> SOLR-6691
>
> Good luck!
> Erick
>
> On Mon, Jan 12, 2015 at 8:54 AM, Zisis Tachtsidis <zist...@runbox.com> wrote:
> > SolrCloud uses ZooKeeper sequence flags to keep track of the order in
> > which nodes register themselves as leader candidates. The node with
> > the lowest sequence number wins as leader of the shard.
> >
> > What I'm trying to do is to keep the leader re-assignments to a
> > minimum during a rolling restart. To that end I change the zk
> > sequence numbers on the SolrCloud nodes once all nodes of the cluster
> > are up and active. I'm using Solr 4.10.0 and I'm aware of SOLR-6491,
> > which has a similar purpose, but I'm trying to do it from "outside",
> > using the existing APIs without editing the Solr source code.
> >
> > == TYPICAL SCENARIO ==
> > Suppose we have three Solr instances S1, S2, S3. They are started in
> > that order and the zk sequences assigned are as follows:
> > S1: -n_0000000000 (LEADER)
> > S2: -n_0000000001
> > S3: -n_0000000002
> >
> > In a rolling restart we'll get S2 as leader (after S1 shutdown), then
> > S3 (after S2 shutdown) and finally S1 (after S3 shutdown): 3 changes
> > in total.
> >
> > == MY ATTEMPT ==
> > By using SolrZkClient and the ZooKeeper multi API I found a way to
> > get rid of the old znodes that participate in a shard's leader
> > election and write new ones to which we can assign the sequence
> > numbers of our liking.
> >
> > S1: -n_0000000000 (no code running here)
> > S2: -n_0000000004 (code deleting znode -n_0000000001 and creating
> >     -n_0000000004)
> > S3: -n_0000000003 (code deleting znode -n_0000000002 and creating
> >     -n_0000000003)
> >
> > In a rolling restart I'd expect to get S3 as leader (after S1
> > shutdown), no change (after S2 shutdown) and finally S1 (after S3
> > shutdown), that is 2 changes. This stays constant no matter how many
> > servers are added to SolrCloud, while in the first scenario the
> > number of re-assignments equals the number of Solr servers.
> >
> > The problem occurs when S1 (LEADER) is shut down. The elections that
> > take place still set S2 as leader; it's as if the new sequence
> > numbers were ignored. When I go to /solr/#/~cloud?view=tree the new
> > sequence numbers are listed under "/collections", and based on those
> > S3 should have become the leader.
> > Do you have any idea why the new state is not acknowledged during the
> > elections? Is something cached? Or, to put it bluntly, do I have any
> > chance down this path? If not, what are my options? Is it possible to
> > apply all patches under SOLR-6491 in isolation and continue from
> > there?
> >
> > Thank you.
> >
> > Some extra info which might help follows.
> >
> > 1. Some logging related to leader elections after S1 has been shut down:
> >    S2 - org.apache.solr.cloud.SyncStrategy  Leader's attempt to sync
> >         with shard failed, moving to the next candidate
> >    S2 - org.apache.solr.cloud.ShardLeaderElectionContext  We failed
> >         sync, but we have no versions - we can't sync in that case -
> >         we were active before, so become leader anyway
> >    S3 - org.apache.solr.cloud.LeaderElector  Our node is no longer in
> >         line to be leader
> >
> > 2. Some sample code on how I perform the ZK re-sequencing:
> >    // Read the current zk nodes for a specific collection
> >    solrServer.getZkStateReader().getZkClient().getSolrZooKeeper()
> >        .getChildren("/collections/core/leader_elect/shard1/election", true);
> >    // node deletion
> >    Op.delete(path, -1);
> >    // node creation
> >    Op.create(createPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
> >        CreateMode.EPHEMERAL_SEQUENTIAL);
> >    // perform the operations
> >    solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().multi(opsList);
> >    solrServer.getZkStateReader().updateClusterState(true);
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/SolrCloud-shard-leader-elections-Altering-zookeeper-sequence-numbers-tp4178973.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
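For anyone wanting to try the re-sequencing step end to end, the fragments quoted above can be stitched into a small SolrJ/ZooKeeper sketch along the following lines. This is only a sketch against Solr 4.10.x: the class name ElectionResequencer, the helper reorderElectionNodes and its "desired order" argument are illustrative, not an existing Solr API, and the election path /collections/core/leader_elect/shard1/election is taken straight from the example in the thread.

  import java.util.ArrayList;
  import java.util.List;

  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.Op;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  public class ElectionResequencer {

    // Hypothetical helper (not an existing Solr API): deletes the election
    // znodes whose names contain the given core node names and re-creates
    // them as EPHEMERAL_SEQUENTIAL in the order passed in, so ZooKeeper
    // hands out new, increasing sequence numbers in that order.
    public static void reorderElectionNodes(CloudSolrServer solrServer,
        String electionPath, List<String> desiredOrder) throws Exception {

      ZooKeeper zk = solrServer.getZkStateReader().getZkClient().getSolrZooKeeper();

      // Current election znodes, e.g. "<sessionId>-<coreNodeName>-n_0000000001";
      // no watch is needed from this external tool.
      List<String> children = zk.getChildren(electionPath, false);

      List<Op> ops = new ArrayList<Op>();
      for (String coreNodeName : desiredOrder) {
        for (String child : children) {
          if (child.contains(coreNodeName)) {
            // Keep everything up to and including "-n_" and let ZooKeeper
            // append a fresh sequence number to the re-created node.
            String prefix = child.substring(0, child.lastIndexOf("-n_") + 3);
            ops.add(Op.delete(electionPath + "/" + child, -1));
            ops.add(Op.create(electionPath + "/" + prefix, new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL));
          }
        }
      }

      // All deletes and creates succeed or fail together.
      zk.multi(ops);
      solrServer.getZkStateReader().updateClusterState(true);
    }
  }

Calling it with a desired order such as the core node names of S3 then S2 (the exact names depend on your cluster) reproduces the -n_0000000003 / -n_0000000004 layout described above. Two caveats: as Erick explains, deleting and re-creating these znodes does not move the watches the live Solr instances already hold, and the re-created ephemeral nodes belong to this client's ZooKeeper session rather than to the Solr nodes they stand for, so treat this strictly as an experiment.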