On 2/27/2018 10:57 AM, James Keeney wrote:
> *1 - ZK ensemble not accepting return of node*
> Currently, when a ZK node in the ensemble goes down the ensemble is able to
> do what it should do and keeps working. However when I bring the 3rd node
> back online the other two nodes reject connection requests from the 3rd
> node until I restart the nodes. The sequence is:
>
>    1. Bring 3rd node back on line
>    2. Restart follower in existing ensemble
>    3. Restart leader in existing ensemble
>
> When this is done the third node happily becomes part of the ensemble.

From what I understand, restarting the other nodes should not be
required.  If everything is configured properly, I don't think that
should be happening, but I don't have deep ZK knowledge.

> *2 - Solr nodes unable to connect*
> When setting up the cluster for the first time the ensemble rejects the
> Solr connection requests until ZK on the ensemble members is
> restarted.

<snip>

> However, we have also seen that if we have a problem with one of the Solr
> nodes that requires restarting more than one node we have to restart ZK to
> reconnect the nodes with the ensemble again.

These problems sound very strange too.  I wish I could offer a theory,
but without logs showing what kind of errors are encountered, I can't
tell what's happening.

None of these problems are in Solr code.  Solr uses the ZooKeeper client
code without modification.  All the ZK communication is done in ZK code,
initialized with the zkHost string and a few other config bits (like
zkClientTimeout) provided to Solr at startup.

If you want to share the Solr log and the ZK server logs covering the
timeframe when the problems happen, maybe we can find something useful
and at least point you towards the problem, but even then, you may have
to talk to the ZooKeeper mailing list for real help, and they'll want
the same logs.

Are you informing Solr about all three of your ZK hosts when you start
it up?  That is a requirement.  If the zkHost string you send to Solr
doesn't list all your servers, then the ZK client inside Solr will not
be able to fail over correctly.  The version of ZK that Solr includes is
not able to dynamically change the servers that it talks to, and the
version of ZK that *does* have dynamic reconfiguration is still in
beta.  Solr is not going to include ZK 3.5.x until they put out a stable
release.  I don't know when they're going to do that.  It could be soon,
or it could be several months out.  The ZK project does NOT make
frequent releases.
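If you want a quick sanity check on the zkHost string, something like
the sketch below compares it against the list of ensemble members and
reports any that are missing.  This is a hypothetical helper, not part
of Solr or ZooKeeper.  The host names zk1/zk2/zk3, port 2181, and the
/solr chroot are placeholders for your own values.  It does a literal
comparison of host:port entries, so "zk1" and "zk1:2181" would count as
different servers.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ZkHostCheck {

    // Returns the ensemble members (host:port) that do not appear in the
    // zkHost string.  An optional chroot suffix such as "/solr" is ignored.
    static List<String> missingHosts(String zkHost, List<String> ensemble) {
        String hostPart = zkHost;
        int slash = hostPart.indexOf('/');
        if (slash >= 0) {
            hostPart = hostPart.substring(0, slash); // drop the chroot suffix
        }
        // Collect the comma-separated host:port entries, trimming spaces.
        Set<String> listed = new LinkedHashSet<>();
        for (String entry : hostPart.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                listed.add(trimmed);
            }
        }
        // Any ensemble member not listed is one the ZK client can't fail over to.
        List<String> missing = new ArrayList<>();
        for (String member : ensemble) {
            if (!listed.contains(member)) {
                missing.add(member);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical three-server ensemble; substitute your own hosts.
        List<String> ensemble = List.of("zk1:2181", "zk2:2181", "zk3:2181");

        String incomplete = "zk1:2181,zk2:2181/solr";
        String complete = "zk1:2181,zk2:2181,zk3:2181/solr";

        // Prints: incomplete -> missing [zk3:2181]
        System.out.println("incomplete -> missing " + missingHosts(incomplete, ensemble));
        // Prints: complete   -> missing []
        System.out.println("complete   -> missing " + missingHosts(complete, ensemble));
    }
}
```

An empty result only means every server is named in the string; it
doesn't tell you whether those servers are healthy.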

Thanks,
Shawn
