Re: Increasing Fault Tolerance of SOLR Cloud and Zookeeper

2018-12-14 Thread Erick Erickson
The only substantive change to the _code_ was changing these lines: permission javax.security.auth.kerberos.ServicePermission "zookeeper/127.0@example.com", "initiate"; permission javax.security.auth.kerberos.ServicePermission "zookeeper/127.0@example.com", "accept"; to permission javax.se

Re: Increasing Fault Tolerance of SOLR Cloud and Zookeeper

2018-12-14 Thread Stephen Lewis Bianamara
Thanks Erick, you've been very helpful. One other question: is it reasonable to upgrade ZooKeeper on an in-place SOLR? I see that 12727 appears to be verified with SOLR 7, modulo some test issues. For SOLR 6.6, would upgrading ZooKeeper to this version be advisable, or would you say that it w

Re: Increasing Fault Tolerance of SOLR Cloud and Zookeeper

2018-12-13 Thread Erick Erickson
bq. will the leader still report that there were two followers, even if one of them bounced I really can't say; I took the ZK folks at their word and upgraded. I should think that restarting your ZK nodes should reestablish that they are all talking to each other; you may need to restart your So
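
One way to see what a ZooKeeper leader currently reports about its followers is the "mntr" four-letter command, which returns tab-separated key/value lines and, on the leader, includes a zk_followers count. The following is a minimal sketch, assuming the command is enabled on the server (newer ZooKeeper releases require it to be whitelisted) and that the client port is 2181. The helper names are hypothetical and the sample text in the usage note is illustrative, not output captured from the cluster discussed in this thread.

```python
import socket


def four_letter_word(host, port=2181, command="mntr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply.

    Assumes the command is enabled on the server; port 2181 is only the
    conventional default client port.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Parse tab-separated 'key<TAB>value' lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines that carry no tab separator
            stats[key.strip()] = value.strip()
    return stats
```

For example, parse_mntr("zk_server_state\tleader\nzk_followers\t2\n") yields a dict whose "zk_followers" entry is the string "2". Running this against each node after a restart would show whether the leader's view of its followers matches what you expect.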

Re: Increasing Fault Tolerance of SOLR Cloud and Zookeeper

2018-12-13 Thread Stephen Lewis Bianamara
Thanks for the help Erick. This is an external zookeeper, running on three AWS instances separate from the instances hosting SOLR. I think I have some more insight based on the bug you sent and some more log crawling. In October we had an instance retirement, wherein the instance was aut

Re: Increasing Fault Tolerance of SOLR Cloud and Zookeeper

2018-12-13 Thread Erick Erickson
"Updates are disabled" means that at least two of your three ZK nodes are unreachable, which is worrisome. First: That error is coming from Solr, but whether it's a Solr issue or a ZK issue is ambiguous. Might be explained if the ZK nodes are under heavy load. Question: Is this an external ZK ensemb
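
The arithmetic behind the message above is the ZooKeeper majority rule: an ensemble can serve writes only while a strict majority of its nodes can talk to each other. Here is a minimal sketch of that rule. The function names are hypothetical, and it models only the majority calculation, not how Solr itself detects the condition.

```python
def quorum_size(ensemble_size):
    """Smallest number of nodes that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size, reachable):
    """True if the reachable nodes still form a majority of the ensemble."""
    return reachable >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size):
    """How many nodes can be lost before the quorum is lost."""
    return ensemble_size - quorum_size(ensemble_size)
```

With a three-node ensemble, quorum_size(3) is 2, so losing one node is tolerated but losing two is not. A five-node ensemble tolerates two failures, which is the usual reason to grow the ensemble when more fault tolerance is needed.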