Any thoughts on this?

Hoping for just a quick answer, either:
1) Yes - once ZooKeeper loses a quorum you need to restart Solr and your
SolrJ client
2) No - that's not expected behavior - Solr and SolrJ should recover -
please file a JIRA issue

Cheers!

Frank Kelly
Principal Software Engineer
Predictive Analytics Team (SCBE/HAC/CDA)

HERE 
5 Wayside Rd, Burlington, MA 01803, USA
42° 29' 7" N 71° 11' 32" W

On 3/16/16, 8:54 AM, "Kelly, Frank" <frank.ke...@here.com> wrote:

><This time without images :-) >
>
>Just wondering if my observation of SolrCloud behavior after ZooKeeper
>loses a quorum is normal or to-be-expected
>
>Version of Solr: 5.3.1
>Version of ZooKeeper: 3.4.7
>Using SolrCloud with external ZooKeeper
>Deployed on AWS
>
>Our Solr cluster has 3 nodes
>
>Our Zookeeper ensemble consists of three nodes with the same config using
>DNS names e.g.
>
>$ more ../conf/zoo.cfg
>tickTime=2000
>dataDir=/var/zookeeper
>dataLogDir=/var/log/zookeeper
>clientPort=2181
>initLimit=10
>syncLimit=5
>standaloneEnabled=false
>server.1=zookeeper1.qa.eu-west-1.mysearch.com:2888:3888
>server.2=zookeeper2.qa.eu-west-1.mysearch.com:2888:3888
>server.3=zookeeper3.qa.eu-west-1.mysearch.com:2888:3888
>
>If we terminate one of the ZooKeeper nodes we get a ZK election and (I
>think) a quorum is maintained.
>Operation continues OK; we detect the terminated instance and relaunch
>a new ZK node, which comes up fine.
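
For reference, the quorum arithmetic behind the one-node and two-node cases can be sketched as below. This is only an illustration assuming ZooKeeper's default majority quorum; the class and method names are made up for this sketch and are not part of ZooKeeper or Solr.

```java
// Hypothetical helper, not a ZooKeeper API: majority-quorum arithmetic.
class QuorumMath {
    // Smallest number of servers that forms a strict majority of the ensemble.
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many servers can be down while a majority is still reachable.
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    // True if the live servers are enough to form a majority.
    static boolean hasQuorum(int ensembleSize, int liveServers) {
        return liveServers >= majority(ensembleSize);
    }

    public static void main(String[] args) {
        int ensemble = 3; // the three server.N entries in zoo.cfg
        System.out.println("majority needed:    " + majority(ensemble));
        System.out.println("failures tolerated: " + tolerableFailures(ensemble));
        System.out.println("quorum with 2 live: " + hasQuorum(ensemble, 2));
        System.out.println("quorum with 1 live: " + hasQuorum(ensemble, 1));
    }
}
```

With three servers a majority is two, so losing one node keeps the quorum and losing two does not, which matches the two scenarios described here.
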
>
>If we terminate two of the ZK nodes we lose a quorum and then we observe
>the following
>
>1.1) Admin UI shows an error that it is unable to contact ZooKeeper:
>"Could not connect to ZooKeeper"
>
>1.2) SolrJ returns the following
>
>org.apache.solr.common.SolrException: Could not load collection from ZK:qa_eu-west-1_public_index
>at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:850)
>at org.apache.solr.common.cloud.ZkStateReader$7.get(ZkStateReader.java:515)
>at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)
>at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)
>at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)
>at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
>at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107)
>at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72)
>at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86)
>at com.here.scbe.search.solr.SolrFacadeImpl.addToSearchIndex(SolrFacadeImpl.java:112)
>Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /collections/qa_eu-west-1_public_index/state.json
>at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
>at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
>at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
>at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
>at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:841)
>... 24 more
>
>This makes sense based on our understanding.
>When our AutoScale groups launch two new ZooKeeper nodes, initialize
>them, fix the DNS etc., we regain a quorum, but at this point:
>
>2.1) Admin UI shows the shards as "GONE" (all greyed out)
>
>2.2) SolrJ returns the same error even though the ZooKeeper DNS names are
>now bound to new IP addresses
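
One way to check whether the addresses behind the ZooKeeper DNS names really changed while the client was running is sketched below. It is a diagnostic only, under the unconfirmed assumption that a long-lived client may still be using addresses it resolved at startup; nothing in this thread establishes that as the cause. `ZkDnsCheck`, `resolve`, and `changedHosts` are hypothetical names for this sketch, and only standard `java.net` and `java.util` calls are used.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical diagnostic, not part of Solr or ZooKeeper.
class ZkDnsCheck {
    // Resolves a host name to the set of IP address strings it maps to right now.
    // Returns an empty set if the name cannot be resolved.
    static Set<String> resolve(String host) {
        try {
            Set<String> ips = new TreeSet<>();
            for (InetAddress address : InetAddress.getAllByName(host)) {
                ips.add(address.getHostAddress());
            }
            return ips;
        } catch (UnknownHostException e) {
            return Collections.emptySet();
        }
    }

    // Returns the hosts whose current resolution differs from the startup snapshot.
    static List<String> changedHosts(Map<String, Set<String>> snapshot,
                                     Function<String, Set<String>> resolver) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : snapshot.entrySet()) {
            if (!entry.getValue().equals(resolver.apply(entry.getKey()))) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Pass the ZooKeeper host names as arguments; take the snapshot at startup.
        Map<String, Set<String>> snapshot = new LinkedHashMap<>();
        for (String host : args) {
            snapshot.put(host, resolve(host));
        }
        // In a real check this comparison would run later, after the nodes were replaced.
        System.out.println("changed hosts: " + changedHosts(snapshot, ZkDnsCheck::resolve));
    }
}
```

The resolver is passed in as a function so the comparison can be exercised without real DNS lookups. Note that the JVM may cache lookups, so a fresh process gives the most reliable "current" answer.
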
>
>So at this point I restart the Solr nodes. After that:
>
>3.1) Admin UI shows the collections as OK (all shards are green) - yeah
>the nodes are back!
>
>3.2) SolrJ Client still shows the same error - namely
>
>org.apache.solr.common.SolrException: Could not load collection from ZK:qa_eu-west-1_here_account
>at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:850)
>at org.apache.solr.common.cloud.ZkStateReader$7.get(ZkStateReader.java:515)
>at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)
>at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)
>at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)
>at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
>at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:825)
>at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:788)
>at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:803)
>at com.here.scbe.search.solr.SolrFacadeImpl.deleteById(SolrFacadeImpl.java:257)
>.
>.
>Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /collections/qa_eu-west-1_here_account/state.json
>at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
>at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
>at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
>at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
>at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:841)
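
Until it is clear whether the client is supposed to recover on its own, an application could approximate "restart the SolrJ client" in-process by closing and rebuilding the client after repeated failures. The sketch below is a possible workaround pattern, not something SolrJ provides: `RebuildingClient`, `Call`, and the failure threshold are hypothetical, and the wrapper is generic over any `AutoCloseable` so it runs without Solr on the classpath.

```java
import java.util.function.Supplier;

// Hypothetical wrapper: rebuilds the underlying client after N consecutive failures.
class RebuildingClient<C extends AutoCloseable> {
    // An operation against the client that may throw.
    interface Call<T, R> {
        R apply(T client) throws Exception;
    }

    private final Supplier<C> factory;
    private final int maxFailures;
    private C client;
    private int consecutiveFailures;

    RebuildingClient(Supplier<C> factory, int maxFailures) {
        this.factory = factory;
        this.maxFailures = maxFailures;
    }

    // Runs the operation; on failure counts it and, at the threshold,
    // closes the client so that the next call creates a fresh one.
    synchronized <R> R call(Call<C, R> operation) throws Exception {
        if (client == null) {
            client = factory.get();
        }
        try {
            R result = operation.apply(client);
            consecutiveFailures = 0;
            return result;
        } catch (Exception e) {
            consecutiveFailures++;
            if (consecutiveFailures >= maxFailures) {
                try {
                    client.close();
                } catch (Exception ignored) {
                    // The client is being discarded anyway.
                }
                client = null;
                consecutiveFailures = 0;
            }
            throw e; // the caller still sees the original failure
        }
    }
}
```

In this setup the factory would presumably create the `CloudSolrClient` from the ZooKeeper connection string, and calls such as `add` or `deleteById` would go through `call(...)`. Whether a freshly built client actually reconnects in the situation described here is untested.
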
>
>I have a few questions
>1) Is this behavior (lack of self-healing) a known behavior?
>2) Is this the same or similar behavior as documented here
>https://issues.apache.org/jira/browse/SOLR-5129
>3) If it is not covered by #2 should I log it in JIRA?
>
>Thanks and Best Wishes,
>
>-Frank
>
>p.s. I can add Solr log files if they will help
