Hi Mike,

Yes, please open a new Jira issue and attach your patch there. We can
discuss more on the issue.
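To make sure I'm reading the proposal below correctly: the idea is that the
reconnect callback should first invalidate every election started under the
expired session, and only then re-run register() for each collection. A rough,
self-contained sketch of that behaviour (not your patch, and not actual Solr
code; the class and method names below are only illustrative stand-ins for
ZkController/ElectionContext) would be something like:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class ReconnectSketch {

        // Stand-in for an in-flight election; the real ElectionContext also
        // tracks the ephemeral leader_elect node it created.
        static class ElectionContext {
            final String collection;
            volatile boolean closed;

            ElectionContext(String collection) {
                this.collection = collection;
            }

            // Proposed behaviour: closing breaks the wait-for-peers loop so
            // the coreZkRegister thread never runs the rest of the
            // leadership logic.
            void close() {
                closed = true;
            }
        }

        // Elections started during startup, one per collection.
        static final Map<String, ElectionContext> activeElections =
                new ConcurrentHashMap<String, ElectionContext>();

        // Called from the ZooKeeper reconnect callback, before register()
        // is re-run for each collection.
        static void onReconnect() {
            // 1. Invalidate elections started under the expired session:
            //    the ephemeral nodes they created are already gone, so any
            //    leadership they "won" is meaningless.
            for (ElectionContext ctx : activeElections.values()) {
                ctx.close();
            }
            activeElections.clear();

            // 2. Re-register each collection. The new election reads the
            //    current ZK state, so a core only claims leadership if
            //    ZooKeeper actually agrees.
            // for (String collection : collections) { register(collection); }
        }
    }

We can hash out the exact hook point on the Jira issue once the patch is
attached.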
On Tue, Jul 28, 2015 at 11:40 AM, Michael Roberts <mrobe...@tableau.com> wrote:
> Hey,
>
> I am encountering an issue which looks a lot like
> https://issues.apache.org/jira/browse/SOLR-6763.
>
> However, it seems like the fix for that does not address the entire problem.
> That fix will only work if we hit the zkClient.getChildren() call before the
> reconnect logic has finished reconnecting us to ZooKeeper (I can reproduce
> scenarios where it doesn't in 4.10.4). If the reconnect has already happened,
> we won't get the session timeout exception.
>
> The specific problem I am seeing is slightly different from SOLR-6763, but
> the root cause appears to be the same. The issue I am seeing is: during
> startup the collections are registered and there is one
> coreZkRegister-1-thread-* per collection. The elections are started on this
> thread, the /collections/<name>/leader_elect ZNodes are created, and then the
> thread blocks waiting for the peers to become available. While the thread is
> blocked, the ZooKeeper session times out.
>
> Once we finish blocking, the reconnect logic calls register() for each
> collection, which restarts the election process (although serially this
> time). At that point we can have two threads trying to register the same
> collection.
>
> This is incorrect, because the coreZkRegister-1-thread-* threads are assuming
> they are the leader with no verification from ZooKeeper. The ephemeral
> leader_elect nodes they created were removed when the session timed out. If
> another host started in the interim (or at any point after that, actually),
> it would see no leader and would attempt to become leader of the shard
> itself. This leads to some interesting race conditions, where you can end up
> with two leaders for a shard.
>
> It seems like a more complete fix would be to actually close the
> ElectionContext upon reconnect. This would break us out of the wait-for-peers
> loop and stop the threads from processing the rest of the leadership logic.
> The reconnect logic would then continue to call register() again for each
> collection, and if the ZK state indicates a core should be leader it can
> re-run the leadership logic.
>
> I have a patch in testing that does this, and I think it addresses the
> problem.
>
> What is the general process for this? I didn't want to reopen a closed Jira
> item. Should I create a new one so the issue and the proposed fix can be
> discussed?
>
> Thanks.
>
> Mike.

--
Regards,
Shalin Shekhar Mangar.