Re: SolrCloud behavior when a ZooKeeper node goes down

Erick Erickson Mon, 08 Feb 2016 13:48:28 -0800

My first guess: are all of the ZK nodes configured with each other's
addresses?
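
One rough way to check this: in ZooKeeper 3.4.x each node's zoo.cfg lists the
ensemble as server.N=host:peerPort:electionPort lines, and every node should
carry the same list. The Java sketch below parses those lines out of the config
text and compares them across nodes. The class name ZkEnsembleCheck and its
method names are made up for illustration; they are not part of ZooKeeper or
Solr.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of ZooKeeper or Solr.
final class ZkEnsembleCheck {

    // Extracts the "server.N=host:peerPort:electionPort" entries from zoo.cfg
    // text, keyed by the server id N. Other settings and comments are ignored.
    static Map<Integer, String> parseServers(String zooCfg) {
        Map<Integer, String> servers = new TreeMap<>();
        for (String raw : zooCfg.split("\\R")) {
            String line = raw.trim();
            if (!line.startsWith("server.")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue;
            }
            int id = Integer.parseInt(line.substring("server.".length(), eq).trim());
            servers.put(id, line.substring(eq + 1).trim());
        }
        return servers;
    }

    // True when every config lists exactly the same ensemble members.
    static boolean sameEnsemble(List<String> zooCfgs) {
        Map<Integer, String> first = null;
        for (String cfg : zooCfgs) {
            Map<Integer, String> servers = parseServers(cfg);
            if (first == null) {
                first = servers;
            } else if (!first.equals(servers)) {
                return false;
            }
        }
        return true;
    }
}
```

Feeding the zoo.cfg contents from A, B and C into sameEnsemble would flag a
node whose server list is missing or differs from the others.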


Or perhaps AWS is messing with your machine addresses....
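
To see what the ZK names currently resolve to from a given machine, a small
sketch like the following may help. It splits a zkHost-style connect string
(comma-separated host:port entries with an optional /chroot suffix) and
resolves each host with java.net.InetAddress. ZkHostResolver is an illustrative
name only. Keep in mind that the JVM can cache DNS lookups, so a long-running
Solr or client process may still hold an older answer than a fresh run of this
check.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SolrJ or ZooKeeper.
final class ZkHostResolver {

    // Returns the host names from a connect string such as
    // "zk-a:2181,zk-b:2181,zk-c:2181/solr". Ports and the chroot are dropped.
    // Assumes host names or IPv4 addresses; IPv6 literals are not handled.
    static List<String> hosts(String zkHost) {
        int slash = zkHost.indexOf('/');
        String hostPart = slash >= 0 ? zkHost.substring(0, slash) : zkHost;
        List<String> hosts = new ArrayList<>();
        for (String entry : hostPart.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            hosts.add(colon >= 0 ? trimmed.substring(0, colon) : trimmed);
        }
        return hosts;
    }

    // Resolves one host to an IP address, or "UNRESOLVED" if the lookup fails.
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return "UNRESOLVED";
        }
    }
}
```

Calling resolve on each entry of hosts(...) from every Solr and ZK machine, and
comparing the results with the addresses of the live ZK instances, would show
whether a replaced node came back under a different IP.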



On Mon, Feb 8, 2016 at 12:09 PM, Kelly, Frank <frank.ke...@here.com> wrote:

> We are running a small SolrCloud instance on AWS
>
> Solr : Version 5.3.1
> ZooKeeper: Version 3.4.6
>
> 3 x ZooKeeper nodes (with higher limits and timeouts due to being on AWS)
> 3 x Solr Nodes (8 GB of memory each – 2 collections with 3 shards for each
> collection)
>
> Let’s call the ZooKeeper nodes A, B and C.
> One of our ZooKeeper nodes (B) failed a health check and was replaced due
> to autoscaling, but during the failover our SolrCloud cluster became
> unavailable. New connections to Solr failed with connectivity errors,
> and preexisting connections also had errors.
>
> These errors happened for both queries and adds:
>
> org.apache.solr.common.SolrException: Could not load collection from ZK:qa_us-east-1_here_account
>     at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)
>     at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)
>     at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)
>     at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
>     at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)
>     at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)
>     at com.here.scbe.search.solr.SolrFacadeImpl.querySearchIndex(SolrFacadeImpl.java:183)
>     at com.ovi.scbe.search.search.impl.SolrSearcher.searchInner(SolrSearcher.java:69)
>     at com.ovi.scbe.search.search.impl.SolrSearcher.search(SolrSearcher.java:56)
>     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
>     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
>     at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
>
>
> org.apache.solr.common.SolrException: Could not load collection from ZK:qa_us-east-1_public_index
>     at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)
>     at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)
>     at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)
>     at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
>     at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107)
>     at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72)
>     at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86)
>     at com.here.scbe.search.solr.SolrFacadeImpl.addToSearchIndex(SolrFacadeImpl.java:108)
>     at com.ovi.scbe.search.index.impl.SolrIndexer.index(SolrIndexer.java:72)
>     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
>     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
>     at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
>
> I thought that because we had configured SolrCloud to point at all three ZK
> nodes, the failure of one ZK node would be OK (since we still had a
> quorum).
> Did I misunderstand something about SolrCloud and its relationship with
> ZK?
>
> The weird thing now is that when the new ZooKeeper node (D) started up,
> we could connect to SolrCloud again after a few minutes, even though we were
> still only pointing to A, B and C (not D).
> Any thoughts on why this also happened?
>
> Best,
>
> -Frank
>
> *Frank Kelly*
>
> Principal Software Engineer
>
> Predictive Analytics Team (SCBE/HAC/CDA)
>
>
> *HERE *
>
> 5 Wayside Rd, Burlington, MA 01803, USA
>
> *42° 29' 7" N 71° 11' 32" W*
>
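
On the quorum question in the quoted message: a ZooKeeper ensemble stays
available as long as a strict majority of its configured servers can talk to
each other, so a healthy 3-node ensemble should tolerate the loss of one node.
A tiny sketch of that arithmetic, with QuorumMath as a made-up name:

```java
// Hypothetical helper illustrating majority-quorum arithmetic.
final class QuorumMath {

    // Smallest number of servers that forms a strict majority of the ensemble.
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many servers can be down while a majority is still reachable.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }
}
```

For a 3-node ensemble, majority(3) is 2 and toleratedFailures(3) is 1. An
outage after losing a single node therefore suggests looking at configuration
or addressing rather than at the size of the ensemble.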
