We are running a small SolrCloud instance on AWS.

Solr: Version 5.3.1
ZooKeeper: Version 3.4.6

3 x ZooKeeper nodes (with higher limits and timeouts due to being on AWS)
3 x Solr nodes (8 GB of memory each; 2 collections with 3 shards per collection)

Let's call the ZooKeeper nodes A, B and C.
One of our ZooKeeper nodes (B) failed a health check and was replaced by
autoscaling. During the failover our SolrCloud cluster became unavailable:
all new connections to Solr failed with connectivity errors, and pre-existing
connections also reported errors.

These errors happened for both queries and adds:

org.apache.solr.common.SolrException: Could not load collection from ZK:qa_us-east-1_here_account
    at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
    at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)
    at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)
    at com.here.scbe.search.solr.SolrFacadeImpl.querySearchIndex(SolrFacadeImpl.java:183)
    at com.ovi.scbe.search.search.impl.SolrSearcher.searchInner(SolrSearcher.java:69)
    at com.ovi.scbe.search.search.impl.SolrSearcher.search(SolrSearcher.java:56)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
    at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)



org.apache.solr.common.SolrException: Could not load collection from ZK:qa_us-east-1_public_index
    at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86)
    at com.here.scbe.search.solr.SolrFacadeImpl.addToSearchIndex(SolrFacadeImpl.java:108)
    at com.ovi.scbe.search.index.impl.SolrIndexer.index(SolrIndexer.java:72)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
    at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)

I thought that because we had configured SolrCloud to point at all three ZK
nodes, the failure of one ZK node would be OK (since we still had a quorum).
Did I misunderstand something about SolrCloud and its relationship with ZK?
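
For reference, this is the quorum arithmetic I am relying on, written as a
minimal Java sketch. The class and method names are hypothetical and only
illustrate ZooKeeper's majority rule; they are not part of Solr or ZooKeeper.

```java
// Illustrative only: majority-quorum arithmetic for a ZooKeeper ensemble.
// ZkQuorum and its methods are hypothetical names, not a Solr/ZooKeeper API.
final class ZkQuorum {

    /** Smallest number of servers that forms a strict majority of the ensemble. */
    static int majority(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensembleSize must be >= 1");
        }
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that can be down while a majority is still up. */
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    /** True if the live servers still form a majority of the configured ensemble. */
    static boolean hasQuorum(int ensembleSize, int liveServers) {
        return liveServers >= majority(ensembleSize);
    }

    public static void main(String[] args) {
        int ensembleSize = 3; // nodes A, B and C
        System.out.println("majority = " + majority(ensembleSize));
        System.out.println("tolerable failures = " + tolerableFailures(ensembleSize));
        System.out.println("quorum with B down = " + hasQuorum(ensembleSize, 2));
    }
}
```

With 3 nodes the majority is 2, so losing B alone should still leave a quorum
of A and C, which is why the outage surprised me.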

The weird thing is that once the new ZooKeeper node (D) started up, after a
few minutes we could connect to SolrCloud again, even though we were still
only pointing to A, B and C (not D).
Any thoughts on why this happened?
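
To make "pointing to A, B and C" concrete, here is a minimal sketch of the
shape of such a ZooKeeper connection string. The host names are hypothetical
placeholders, the helper class is made up for illustration, and 2181 is
assumed only because it is ZooKeeper's default client port.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: builds a comma-separated host:port ZooKeeper connection string.
// ZkHostString is a hypothetical helper, not part of SolrJ or ZooKeeper.
final class ZkHostString {

    /** Joins each host with the client port, separated by commas. */
    static String zkHost(List<String> hosts, int clientPort) {
        StringBuilder sb = new StringBuilder();
        for (String host : hosts) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(host).append(':').append(clientPort);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder names standing in for nodes A, B and C; D is not listed.
        List<String> hosts = Arrays.asList(
                "zk-a.example.internal", "zk-b.example.internal", "zk-c.example.internal");
        System.out.println(zkHost(hosts, 2181));
    }
}
```

The point is that the string names a fixed set of hosts, so a replacement node
is only reachable through it if it ends up behind one of those names.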

Best,

-Frank

Frank Kelly
Principal Software Engineer
Predictive Analytics Team (SCBE/HAC/CDA)






HERE
5 Wayside Rd, Burlington, MA 01803, USA
42° 29' 7" N 71° 11' 32" W
