On 4/29/2019 10:55 AM, Marko Babic wrote:
> Thanks Shawn.
>
> Yes, all Solr nodes know about all three ZK servers (i.e., the zk host string
> is of the form zk_a_ip:2181,zk_b_ip:2181,zk_c_ip:2181).
>
> Sorry for the dense description of things: I erred on the side of oversharing
> because I didn't want to leave out something useful but I know it makes for an
> investment to read so I really appreciate that you took the time. I'm obviously
> happy to clarify whatever I can.
Trying to trace everything is making my head hurt. :)
Part of the problem is that I do not really know all that much about
ZK's internal operation.
As far as I know, a ZK client keeps a single session open against one
server chosen from the zkhost string, and moves to another server in the
list if that connection drops. I don't think the client gives any
preference to whichever server has been elected leader of the ensemble.
My reading says that ephemeral nodes should be deleted when the session
that created them ends, whether it is closed or it expires after the
connection is lost. If I read your writeup correctly, somehow the network
partition is interfering with this process... the ephemeral node is
probably deleted on A (the leader when the partition begins) but it is
not deleted on the new leader. This does sound like ZOOKEEPER-2348.
We probably need to take a look at how Solr handles its /live_nodes
entries. I have not looked at this code, and have no idea how it works,
but here is what I can think of:
Perhaps each Solr node should update its ephemeral node on a timed
interval, say every 5 seconds, or longer if the update operation creates
a lot of I/O. If a NodeExistsException is encountered when trying to
create the node, the Solr node should check the existing node's
last-updated timestamp, and once that reaches an age of 30 or 60 seconds
(definitely configurable), it should assume that it's safe to delete and
recreate the node. The log message for this ought to be at WARN or ERROR
(probably WARN) so that it is visible in the admin UI. If some of the
other devs who live in the SolrCloud code could offer a review of this
idea, I would appreciate it.
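
To make the proposal above concrete, here is a rough sketch of the
stale-node check. It is only a simulation: FakeZk, Znode, NodeExistsError,
and register_live_node are hypothetical names made up for illustration,
and nothing here is Solr's actual code or the ZooKeeper client API. The
30-second threshold is the example value from the paragraph above.

```python
from dataclasses import dataclass


class NodeExistsError(Exception):
    """Stand-in for the 'node exists' error a real ZK client would raise."""


@dataclass
class Znode:
    owner: str    # which (hypothetical) Solr instance created the node
    mtime: float  # last-updated time, in seconds


class FakeZk:
    """In-memory stand-in for the ensemble; NOT the real ZooKeeper API."""

    def __init__(self, clock):
        self.clock = clock  # callable returning the current time in seconds
        self.nodes = {}

    def create(self, path, owner):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = Znode(owner, self.clock())

    def touch(self, path):
        # The periodic refresh a healthy node would do (say every 5 seconds).
        self.nodes[path].mtime = self.clock()

    def mtime(self, path):
        node = self.nodes.get(path)
        return None if node is None else node.mtime

    def delete(self, path):
        self.nodes.pop(path, None)


def register_live_node(zk, path, owner, stale_after=30.0, log=print):
    """Create the live-node entry, replacing an existing one only if stale.

    Returns True if this caller now owns the entry, and False if a fresh
    entry is already present, in which case the caller should retry later.
    """
    try:
        zk.create(path, owner)
        return True
    except NodeExistsError:
        mtime = zk.mtime(path)
        if mtime is None:
            # The node vanished between create and the check; retry later.
            return False
        age = zk.clock() - mtime
        if age < stale_after:
            # Someone refreshed it recently, so assume it is still live.
            return False
        log(f"WARN: {path} not updated for {age:.0f}s; deleting and recreating")
        zk.delete(path)
        zk.create(path, owner)
        return True
```

A caller would invoke register_live_node at startup, retry while it
returns False, and then call touch on its own timer. Against a real
ensemble the delete-and-recreate step would also need to guard against
another client doing the same thing at the same moment, which this
sketch ignores.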
In theory, two different Solr instances should never be trying to create
the same ephemeral znode. In environments where servers are
automatically provisioned and started, I suppose it could happen.
I've updated the ZK issue with info from this thread. I hope they can
comment on that.
Thanks,
Shawn