On 4/29/2019 10:55 AM, Marko Babic wrote:
> Thanks Shawn.
>
> Yes, all Solr nodes know about all three ZK servers (i.e., the ZK host string
> is of the form zk_a_ip:2181,zk_b_ip:2181,zk_c_ip:2181).
>
> Sorry for the dense description of things. I erred on the side of oversharing
> because I didn't want to leave out anything useful, but I know it takes an
> investment to read, so I really appreciate that you took the time. I'm obviously
> happy to clarify whatever I can.

Trying to trace everything is making my head hurt. :)

Part of the problem is that I do not really know all that much about ZK's internal operation.

As I understand it, a ZK client holds its session connection to only one server at a time, chosen from the servers in the zkhost string. If that connection is lost, the client tries the next server in the list. The client does not need to be connected to the leader: a follower forwards write requests to the leader on the client's behalf.
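To make the "one server at a time" behavior concrete, here is a toy Python model, not the real ZooKeeper client. The class and function names are made up for illustration. The real client also shuffles the server list before connecting; this sketch keeps the given order so the output is predictable.

```python
def parse_connect_string(zkhost: str) -> list[tuple[str, int]]:
    """Split 'host:port,host:port[/chroot]' into (host, port) pairs, dropping any chroot."""
    hosts, _, _chroot = zkhost.partition("/")
    servers = []
    for entry in hosts.split(","):
        host, _, port = entry.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers


class ToyZkClient:
    """Toy model: one server connection at a time, failing over through the list.

    Real ZooKeeper clients shuffle the server list first; this model keeps the
    given order so the example is deterministic.
    """

    def __init__(self, servers: list[tuple[str, int]]):
        self.servers = servers
        self.index = -1      # position of the current server (-1: not connected yet)
        self.current = None  # the single server we are connected to, if any

    def connect(self, reachable: set[tuple[str, int]]):
        """Try each server once, starting after the current one; return the one we land on."""
        n = len(self.servers)
        for step in range(1, n + 1):
            i = (self.index + step) % n
            if self.servers[i] in reachable:
                self.index, self.current = i, self.servers[i]
                return self.current
        self.current = None
        return None


servers = parse_connect_string("zk_a_ip:2181,zk_b_ip:2181,zk_c_ip:2181")
client = ToyZkClient(servers)
print(client.connect(set(servers)))      # ('zk_a_ip', 2181)
# Partition: zk_a is unreachable, so the client moves on to the next server.
print(client.connect(set(servers[1:])))  # ('zk_b_ip', 2181)
```

Note that nothing in this model depends on which server is the leader. The client only needs some reachable member of the ensemble.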

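My reading says that ephemeral nodes are deleted when the session that created them ends, either because the client closes it or because it expires after the ensemble has not heard from the client for longer than the session timeout. Losing a single connection is not enough on its own, because the client can reconnect to another server and keep its session. If I read your writeup correctly, the network partition is somehow interfering with this process: the ephemeral node is probably deleted on A (the leader when the partition begins) but not on the new leader. This does sound like ZOOKEEPER-2348.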
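The distinction between losing a connection and losing a session is easy to blur, so here is a toy Python model of how I understand ZK ties ephemeral znodes to sessions. It is only a sketch: the class, the session IDs, and the example /live_nodes path are hypothetical, and it ignores everything about how the real ensemble replicates state or elects a leader.

```python
class NodeExistsError(Exception):
    """Raised when creating a znode path that already exists."""


class ToyEnsemble:
    """Toy model: ephemeral znodes belong to a session, not to a TCP connection."""

    def __init__(self, session_timeout: float):
        self.session_timeout = session_timeout
        self.last_heard: dict[str, float] = {}  # session id -> last time the ensemble heard from it
        self.ephemerals: dict[str, str] = {}    # znode path -> owning session id

    def create_ephemeral(self, session: str, path: str, now: float) -> None:
        if path in self.ephemerals:
            raise NodeExistsError(path)
        self.ephemerals[path] = session
        self.last_heard[session] = now

    def heartbeat(self, session: str, now: float) -> None:
        # A ping or request through any server renews the session,
        # so reconnecting to a different server keeps the ephemerals alive.
        self.last_heard[session] = now

    def expire_sessions(self, now: float) -> set[str]:
        """Expire sessions silent longer than the timeout and delete their ephemerals."""
        expired = {s for s, t in self.last_heard.items() if now - t > self.session_timeout}
        for s in expired:
            del self.last_heard[s]
        self.ephemerals = {p: s for p, s in self.ephemerals.items() if s not in expired}
        return expired


ens = ToyEnsemble(session_timeout=30.0)
ens.create_ephemeral("session-1", "/live_nodes/solr1:8983_solr", now=0.0)
# The connection to one server drops at t=5; the client reconnects elsewhere at t=20.
ens.heartbeat("session-1", now=20.0)
print(ens.expire_sessions(now=40.0))  # set(): session still alive, node kept
print(ens.expire_sessions(now=60.0))  # {'session-1'}: silent 40 s > 30 s, node deleted
```

In this model the bug described in the writeup would correspond to one member of the ensemble running the expiry step and another member never doing so.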

We probably need to take a look at how Solr handles its /live_nodes entries. I have not looked at this code and have no idea how it works, but here is what I can think of:

Perhaps each Solr node should update its ephemeral node on a timed interval, say every 5 seconds, or less often if each update generates a lot of I/O. If a NodeExists exception is thrown when the node tries to create its znode, the node should check the existing znode's last-updated timestamp. Once that timestamp is 30 or 60 seconds old (definitely configurable), the Solr node should assume that it's safe to delete and recreate the znode. That event ought to be logged at WARN or ERROR (probably WARN) so it is visible in the admin UI. If some of the other devs who work in the SolrCloud code could review this idea, I would appreciate it.
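Roughly, the idea would look like the Python sketch below. This is not Solr code and not a real ZooKeeper client: InMemoryStore is a hypothetical stand-in that keeps a modification time per path (in real ZK that would be the mtime field of the node's Stat), and the function names, the 5 second interval, and the 60 second threshold are all assumptions for illustration.

```python
import logging

log = logging.getLogger("live_nodes_sketch")


class NodeExistsError(Exception):
    """Raised when creating a path that already exists (stands in for ZK's NodeExists)."""


class InMemoryStore:
    """Hypothetical stand-in for ZooKeeper: path -> (data, mtime)."""

    def __init__(self):
        self.nodes: dict[str, tuple[bytes, float]] = {}

    def create(self, path: str, data: bytes, now: float) -> None:
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = (data, now)

    def set(self, path: str, data: bytes, now: float) -> None:
        if path not in self.nodes:
            raise KeyError(path)  # like ZK, updating a missing node fails
        self.nodes[path] = (data, now)

    def mtime(self, path: str) -> float:
        return self.nodes[path][1]

    def delete(self, path: str) -> None:
        del self.nodes[path]


def heartbeat(store: InMemoryStore, path: str, data: bytes, now: float) -> None:
    """Called on a timer (e.g. every 5 s) so the node's mtime tracks liveness."""
    store.set(path, data, now)


def register_live_node(store: InMemoryStore, path: str, data: bytes,
                       now: float, stale_after: float = 60.0) -> bool:
    """Create the live node; if it already exists, replace it only once its last
    update is at least stale_after seconds old. Returns True if we now own it."""
    try:
        store.create(path, data, now)
        return True
    except NodeExistsError:
        age = now - store.mtime(path)
        if age < stale_after:
            return False  # updated recently: assume another owner is alive
        log.warning("Replacing stale live node %s (last updated %.0f s ago)", path, age)
        store.delete(path)
        store.create(path, data, now)
        return True


store = InMemoryStore()
path = "/live_nodes/solr1:8983_solr"
register_live_node(store, path, b"", now=0.0)
heartbeat(store, path, b"", now=5.0)
# The node restarts and finds its old entry still present.
print(register_live_node(store, path, b"", now=30.0))  # False: only 25 s old
print(register_live_node(store, path, b"", now=70.0))  # True: 65 s old, replaced
```

The threshold has to be comfortably larger than the heartbeat interval, or a live node that is merely slow to update could have its entry taken over.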

In theory, two different Solr instances should never be trying to create the same ephemeral znode. In environments where servers are automatically provisioned and started, I suppose it could happen.

I've updated the ZK issue with info from this thread. I hope the ZK developers can comment on it.

Thanks,
Shawn
