Daniel Collins wrote
> Is it important where your leader is? If you just want to minimize
> leadership changes during a rolling restart, then you could restart in the
> opposite order (S3, S2, S1). That would give only 1 transition, but the
> end result would be a leader on S2 instead of S1 (not sure if that is
> important to you or not). I know it's not a "fix", but it might be a
> workaround until the whole leadership-moving work is done?
I think that rolling-restarting the machines in the opposite order (S3, S2, S1) will result in S3 being the leader. It's a valid approach, but wouldn't I then have to revert to the original order (S1, S2, S3) to achieve the same result on the following rolling restart? That carries operational cost and complexity that I want to avoid.

Erick Erickson wrote
>> Just skimming, but the problem here that I ran into was with the
>> listeners. Each _Solr_ instance out there is listening to one of the
>> ephemeral nodes (the "one in front"). So deleting a node does _not_
>> change which ephemeral node the associated Solr instance is listening
>> to.
>>
>> So, for instance, when you delete S2..n-000001 and re-add it, S2 is
>> still looking at S1....n-000000 and will continue looking at
>> S1...n-000000 until S1....n-000000 is deleted.
>>
>> Deleting S2..n-000001 will wake up S3 though, which should now be
>> looking at S1....n-0000000. Now you have two Solr listeners looking at
>> the same ephemeral node. The key is that deleting S2...n-000001 does
>> _not_ wake up S2, just any Solr instance that has a watch on the
>> associated ephemeral node.

Thanks for the info, Erick. I wasn't aware of this "linked-list" listener structure between the zk nodes. Based on what you've said, though, I've changed my implementation a bit and it seems to be working at first glance. Of course it's not reliable yet, but it looks promising. (I've put rough sketches of the predecessor-watch pattern and of my delete/re-create step at the end of this message.)

My original attempt

> S1:-n_0000000000 (no code running here)
> S2:-n_0000000004 (code deleting zknode -n_0000000001 and creating
> -n_0000000004)
> S3:-n_0000000003 (code deleting zknode -n_0000000002 and creating
> -n_0000000003)

has been changed to

S1:-n_0000000000 (no code running here)
S2:-n_0000000003 (code deleting zknode -n_0000000001 and creating
-n_0000000003 using EPHEMERAL_SEQUENTIAL)
S3:-n_0000000002 (no code running here)

Once S1 is shut down, S3 becomes the leader, since S3 is now watching S1's node, per your explanation.

The original reason I pursued this "minimize leadership changes" quest was that it _could_ lead to "data loss" in some scenarios. I'm not entirely sure about this, so please correct me if I'm wrong, but here is my reasoning: if indexing requests are coming in during a rolling restart, could there be a window, while the current leader is shutting down, where the leader-to-be node has not had time to sync with the leader that is going down? Everyone would then sync to the new leader and some updates would be missed. I've seen an installation where the replicas had different index sizes and the divergence deteriorated over time.
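For anyone following along, here is a rough, untested sketch of the "watch the node in front of you" pattern Erick describes, using the plain ZooKeeper Java client. The path, class name and helper names (ELECTION_PATH, seqOf, etc.) are mine for illustration only, not Solr's actual internals:

// Rough sketch: each participant watches only its immediate predecessor in
// the election queue. Path and names are illustrative, not Solr internals.
import java.util.Comparator;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class PredecessorWatch {
    static final String ELECTION_PATH =
        "/collections/c1/leader_elect/shard1/election";

    // Election nodes end in "-n_<10 digits>"; sort by that sequence number.
    static long seqOf(String node) {
        return Long.parseLong(node.substring(node.lastIndexOf("-n_") + 3));
    }

    // Put a watch on the election node that sorts immediately before mine.
    // I am only woken up when *that* node goes away; deleting some other
    // node further down the list does not notify me.
    static void watchPredecessor(ZooKeeper zk, String myNodeName)
            throws KeeperException, InterruptedException {
        List<String> children = zk.getChildren(ELECTION_PATH, false);
        children.sort(Comparator.comparingLong(PredecessorWatch::seqOf));
        int myIndex = children.indexOf(myNodeName);
        if (myIndex <= 0) {
            return; // first in the queue (or not registered): nothing to watch
        }
        String predecessor = children.get(myIndex - 1);
        Watcher onPredecessorGone = (WatchedEvent event) -> {
            // Predecessor is gone: re-read the children and either become
            // leader or start watching the new predecessor.
        };
        zk.exists(ELECTION_PATH + "/" + predecessor, onPredecessorGone);
    }
}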
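And a similarly rough sketch of the delete-and-re-create step I described above, i.e. pushing one replica's election node to the back of the queue. Again the path, prefix and method names are hypothetical, and note that an ephemeral node belongs to the ZooKeeper session of whichever client creates it, so this has to run with the right session:

// Rough sketch: delete a replica's existing election node and register a new
// EPHEMERAL_SEQUENTIAL one, so it receives the next (higher) sequence number.
// Path, prefix and names are made up; this is not Solr's own code.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class RequeueElectionNode {
    static final String ELECTION_PATH =
        "/collections/c1/leader_elect/shard1/election";

    // e.g. requeue(zk, "S2-n_0000000001", "S2-n_") could come back as
    // ".../S2-n_0000000003" if 3 is the parent's next sequence number.
    static String requeue(ZooKeeper zk, String oldNode, String newPrefix)
            throws KeeperException, InterruptedException {
        // -1 = ignore the node's version; just delete it
        zk.delete(ELECTION_PATH + "/" + oldNode, -1);

        // ZooKeeper appends the sequence number itself: you cannot pick a
        // specific number, only rely on it being higher than existing ones.
        return zk.create(ELECTION_PATH + "/" + newPrefix,
                         new byte[0],
                         ZooDefs.Ids.OPEN_ACL_UNSAFE,
                         CreateMode.EPHEMERAL_SEQUENTIAL);
    }
}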