Hi Mike,

Yes, please open a new Jira issue and attach your patch there. We can
discuss more on the issue.

On Tue, Jul 28, 2015 at 11:40 AM, Michael Roberts <mrobe...@tableau.com> wrote:
> Hey,
>
> I am encountering an issue which looks a lot like 
> https://issues.apache.org/jira/browse/SOLR-6763.
>
> However, it seems like the fix for that does not address the entire problem. 
> That fix will only work if we hit the zkClient.getChildren() call before the 
> reconnect logic has finished reconnecting us to ZooKeeper (I can reproduce 
> scenarios where it doesn’t in 4.10.4). If the reconnect has already happened, 
> we won’t get the session timeout exception.
>
> The specific problem I am seeing is slightly different from SOLR-6763, but the 
> root cause appears to be the same. The issue I am seeing is this: during 
> startup the collections are registered and there is one 
> coreZkRegister-1-thread-* per collection. The elections are started on this 
> thread, the /collections/<name>/leader_elect ZNodes are created, and then the 
> thread blocks waiting for the peers to become available. During the block the 
> ZooKeeper session times out.
>
> Once we finish blocking, the reconnect logic calls register() for each 
> collection, which restarts the election process (although serially this 
> time). At a later point, we can have two threads that are trying to register 
> the same collection.
>
> This is incorrect, because the coreZkRegister-1-thread-* threads assume they 
> are the leader without any verification from ZooKeeper. The ephemeral leader_elect 
> nodes they created were removed when the session timed out. If another host 
> started in the interim (or any point after that actually), it would see no 
> leader, and would attempt to become leader of the shard itself. This leads to 
> some interesting race conditions, where you can end up with two leaders for a 
> shard.
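>
> To make the race concrete, here is a toy simulation (Python, purely
> illustrative; the class and names are mine, not Solr's) of how acting on
> stale election state after a session expiry yields two leaders:

```python
# Toy model of a ZooKeeper-style ephemeral "leader" node, keyed by the
# owning session. Illustration only; not Solr or ZooKeeper code.

class ToyZk:
    def __init__(self):
        self.leader = None  # session id of the leader node's owner, or None

    def try_become_leader(self, session_id):
        """Create the ephemeral leader node if no leader exists."""
        if self.leader is None:
            self.leader = session_id
            return True
        return False

    def expire_session(self, session_id):
        """Session expiry deletes that session's ephemeral nodes."""
        if self.leader == session_id:
            self.leader = None

zk = ToyZk()

# Host A's election thread wins, then blocks waiting for peers.
a_thinks_leader = zk.try_become_leader("session-A")

# While A is blocked, its ZooKeeper session times out, so its
# ephemeral leader_elect node is removed.
zk.expire_session("session-A")

# Host B starts in the interim, sees no leader, and claims leadership.
b_thinks_leader = zk.try_become_leader("session-B")

# A never re-checks ZooKeeper after unblocking, so both hosts now
# believe they lead the same shard.
print(a_thinks_leader, b_thinks_leader)  # True True
```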
>
> It seems like a more complete fix would be to actually close the 
> ElectionContext upon reconnect. This would break us out of the wait for peers 
> loop, and stop the threads from processing the rest of the leadership logic. 
> The reconnection logic would then continue to call register() again for each 
> Collection, and if the ZK state indicates it should be leader it can re-run 
> the leadership logic.
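>
> Roughly, the mechanism I have in mind looks like the following (again an
> illustrative Python sketch with made-up names, not the actual SolrCloud
> classes): closing the context unblocks the wait-for-peers loop, and
> register() can then re-run the election against current ZooKeeper state.

```python
# Illustrative only: a cancellable "wait for peers" that the reconnect
# logic can break out of by closing the election context.
import threading

class ToyElectionContext:
    def __init__(self):
        self._closed = threading.Event()

    def wait_for_peers(self, timeout):
        # Returns True if peers arrived in time, False if the context
        # was closed (e.g. by the reconnect logic) first.
        return not self._closed.wait(timeout)

    def close(self):
        self._closed.set()

ctx = ToyElectionContext()
results = []
t = threading.Thread(
    target=lambda: results.append(ctx.wait_for_peers(timeout=30)))
t.start()

# Reconnect logic: close the stale context so the blocked election
# thread stops, instead of letting it continue with leadership logic
# based on ephemeral nodes that no longer exist.
ctx.close()
t.join(5)

print(t.is_alive(), results)  # False [False]
```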
>
> I have a patch in testing that does this, and I think addresses the problem.
>
> What is the general process for this? I didn’t want to reopen a closed Jira 
> item. Should I create a new one so the issue and the proposed fix can be 
> discussed?
>
> Thanks.
>
> Mike.
>



-- 
Regards,
Shalin Shekhar Mangar.
