On 2/27/2018 6:42 PM, James Keeney wrote:
-DzkHost=<ZK Host internal IP 1>:2181,<ZK Host internal IP 2>:2181,<ZK Host internal IP 1>:2181

This looks correct, except that with AWS, I don't know whether you need the internal or the external IP addresses.  If all of the machines involved (both servers and clients) are able to communicate on the internal addresses, then that should be fine.  You might want to discuss the IP addressing with Amazon just to make sure.
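
One cheap sanity check on a zkHost string is to split it into host/port pairs and look for repeated entries.  The string as quoted lists "internal IP 1" twice, which may only be a typo in the email.  The following is a minimal sketch, not anything that ships with Solr: the function names are made up for illustration, the addresses in the usage note are invented, and it assumes plain hostnames or IPv4 addresses (no IPv6 literals) with an optional chroot after the host list.

```python
from collections import Counter


def parse_zk_host(zk_host, default_port=2181):
    """Split a zkHost string into ([(host, port), ...], chroot).

    Hypothetical helper for illustration; assumes hostnames or IPv4
    addresses only, so IPv6 literals are not handled.
    """
    # An optional chroot (such as "/solr") follows the host list.
    hosts_part, slash, chroot = zk_host.partition("/")
    servers = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        servers.append((host, int(port) if port else default_port))
    return servers, (slash + chroot) if slash else ""


def duplicate_servers(servers):
    """Return the (host, port) pairs that appear more than once."""
    counts = Counter(servers)
    return [server for server, n in counts.items() if n > 1]
```

For example, parse_zk_host("10.0.0.1:2181,10.0.0.2:2181,10.0.0.1:2181") yields three entries, and duplicate_servers() on that list reports ("10.0.0.1", 2181) as repeated.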

java.net.ConnectException: Connection refused

All of the logs you included look like they have this message -- connection refused.  Normally this happens when the software isn't running -- the OS refuses connections when no software is listening on a TCP port.  Sometimes firewalls can refuse connections, but more commonly they just drop the traffic silently, and the system starting the connection has to wait for a timeout and never gets any kind of response.  In this case, there IS a response -- the connection is refused.
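
If you want to see which of those two behaviors you are getting from a given machine, a plain TCP connect attempt is enough to tell them apart.  This is a rough sketch using only the Python standard library; the function name and the five-second timeout are my own choices, and it tests only whether something accepts a connection on the port, not whether that something is a healthy ZK server.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify one TCP connect attempt.

    Returns "open", "refused", "timeout", or "error: <details>".
    Hypothetical helper for illustration only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The remote OS answered, but nothing is listening on the port
        # (or a firewall actively rejected the connection).
        return "refused"
    except socket.timeout:
        # No answer at all, typical of traffic being dropped silently.
        return "timeout"
    except OSError as exc:
        # Anything else, such as name resolution failure or no route.
        return f"error: {exc}"
```

Running probe(address, 2181) from each Solr machine against each ZK address should show whether the refusal is consistent across all machines or limited to particular pairs.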

It looks like you've pasted parts of the log, but I was actually hoping for entire logfiles, or at least entire sections of logfiles, to see errors in context with non-errors, and to be sure that nothing is lost, and that the formatting isn't destroyed by inclusion in an email message.  A paste website or a file sharing website is often the best way to share that kind of information.  If you need to redact information from the files, please do so in a way that preserves the ability to decipher the log.  For IP addresses, you could just redact the first two octets and leave the last two -- although if they are private addresses, you could leave them intact.

My instinct here is to think there's either a fundamental networking issue (firewalls, other problems), or that there may be some kind of problem with ZK.  What version of ZK are you using on the servers, and what version of Solr is it?

My instincts could be wrong because of a limited understanding of how ZK functions.

My recommendation would be to run ZK version 3.4.11 on your servers.  Each new release of ZK has a very impressive list of fixed bugs.  The client ZK version will depend on the Solr version, since the ZK jar is part of Solr.

I looked at your ZK server config.  Your initLimit value is ten times the value in the default config for the embedded ZK in Solr.  Based on the comment in the embedded ZK config, that's probably not a problem, but I can't say for sure without more ZK knowledge.  The other parts of the config seem normal enough.

Are you configuring the "myid" file in each ZK server's data directory, and does the value on each server correspond to the line in the ZK config for that server?  I assume you probably have this correct, because ZK probably wouldn't work at all if it weren't right.
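
For reference, the check I have in mind could be scripted roughly as follows.  This is only a sketch under some assumptions: the config is a standard zoo.cfg with a dataDir line and server.N=host:port:port lines, the myid file holds a single number, and the function names are invented here.  It confirms that the number in myid has a matching server.N line; it does not verify that the host on that line is really the machine you ran it on, so you still need to eyeball that part.

```python
import re
from pathlib import Path


def read_zoo_cfg(cfg_path):
    """Return (dataDir, {server_id: host}) parsed from a zoo.cfg file."""
    data_dir, servers = None, {}
    for raw in Path(cfg_path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "dataDir":
            data_dir = value
        else:
            match = re.fullmatch(r"server\.(\d+)", key)
            if match:
                # The value looks like host:2888:3888; keep only the host.
                servers[int(match.group(1))] = value.split(":")[0]
    return data_dir, servers


def check_myid(cfg_path):
    """Describe whether dataDir/myid matches a server.N line."""
    data_dir, servers = read_zoo_cfg(cfg_path)
    if data_dir is None:
        return "no dataDir in config"
    myid_file = Path(data_dir) / "myid"
    if not myid_file.is_file():
        return f"missing {myid_file}"
    text = myid_file.read_text().strip()
    if not text.isdigit():
        return f"myid is not a number: {text!r}"
    if int(text) not in servers:
        return f"myid {text} has no server.{text} line"
    return f"ok: myid {text} -> {servers[int(text)]}"
```

You would run check_myid() with the path to zoo.cfg on each ZK server and compare the host it reports with that server's own address.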

I really don't know what might be going on.  With more complete logs I might spot something, but I can't be sure.

Thanks,
Shawn
