Re: Configuration of SOLR Cluster

Shawn Heisey Wed, 28 Feb 2018 20:34:46 -0800

On 2/28/2018 6:54 AM, James Keeney wrote:

I did notice one thing in the logs:
2018-02-28 13:21:58,932 [myid:1] - INFO[/172.31.86.130:3888:QuorumCnxManager$Listener@743] - *Receivedconnection request /172.31.73.122:34804 <http://172.31.73.122:34804>*


<snip>

When the restarted node attempts to reconnect with the ensemble itlooks like it does so on a random port. Could it be that nodes in theensemble are rejecting the new request to rejoin because they are notlistening on that port? And why is it not requesting on 3888:2888?This is confusing to me.

That appears to be the source port. Which is generally going to be avery high port and semi-unpredictable.Normal TCP operation.

I have attached a ZK log and a SOLR log. You can watch the wholeprogression in the ZK log as it goes from happy to disconnected totrying to reconnect to part of the ensemble when the other nodes arerestarted. Seems like ZK holds onto a state based on the originalensemble interactions and that state prevents the node from rejoiningthe ensemble. The state is then lost with the restart which allows themembers to re-establish connection and form the new ensemble.


What timestamps correspond to the actions you took?

Lots and lots of connections refused. Unless there's somethingpreventing network access, I would only expect connections to be refusedif the software isn't running or isn't listening on the destination port.

Which ZK is that log from? The one that you shut down to begin testing,or one of the others? I see some very large time gaps in the log:


=========

2018-02-26 18:28:18,066 [myid:1] - INFO[LearnerHandler-/172.31.73.122:57652:LearnerHandler@535] - ReceivedNEWLEADER-ACK message from 32018-02-26 18:56:19,656 [myid:1] - INFO [SyncThread:1:FileTxnLog@203] -Creating new log file: log.400000001

=========

=========

2018-02-26 18:56:26,286 [myid:1] - WARN[SendWorker:3:QuorumCnxManager$SendWorker@954] - Send worker leaving thread2018-02-26 19:34:38,103 [myid:] - INFO [main:QuorumPeerConfig@134] -Reading configuration from: /opt/zookeeper/current/bin/../conf/zoo.cfg

=========

The first gap is nearly half an hour, the second is more than half an hour.

What happens after the second gap appears to be a program startup. Thethings logged at 18:56:nn *might* be program shutdown, but the logdoesn't explicitly say so. If it is a shutdown, then the program wasnot running for quite a while.

I would definitely take this problem to the ZK mailing list. Theserver-side problems don't involve Solr at all. You are having problemswith Solr, but they are completely within the ZK client code. Likelyboth problems have the same root cause, so I'd start with theserver-side issues.


Solr 6.6.2 contains ZK version 3.4.10.  Not the latest, but close.

Thanks,
Shawn

Re: Configuration of SOLR Cluster

Reply via email to