Re: ClusterState says we are the leader, but locally we don't think so

Jon Drews Tue, 31 May 2016 10:52:49 -0700

I forgot to add that this is Apache Solr 5.3.1.

There are three collections: two have one shard each, and the third has
3-5 shards. There are approximately 200,000 documents across all collections.


Jon Drews
jondrews.com

On Tue, May 31, 2016 at 12:15 PM, Jon Drews <j...@jondrews.com> wrote:

> We have seen the following error on four separate instances of Solr. The
> result is that all or most shards go into "Down" state and do not recover
> on restart of Solr.
>
> I'm hoping one of you has some insight into what might be causing it as we
> haven't been able to track down the issue or reproduce it reliably.
>
> 2016-05-26 21:00:09.000 ERROR (qtp1450821318-15) [c:log s:20160526
> r:core_node4 x:log_20160526_replica1] o.a.s.c.SolrCore
> org.apache.solr.common.SolrException: ClusterState says we are the leader (
> https://localhost:8984/solr/log_20160526_replica1), but locally we don't
> think so. Request came from
> https://localhost:8984/solr/log_20160524_replica1/
>
> We were able to recover by using https://github.com/echoma/zkui/ to
> manually edit the /clusterstate.json and /collections/log/state.json to set
> shards from "Down" to "Active". After that the error subsided and
> functionality was restored.
>
> A few notes:
> - All four systems were on either Windows 7 or Windows Server 2012.
> - All four systems are on single servers with embedded zookeepers.
> - SSL was enabled in Solr, but authentication was not.
> - After the issue, we increased the zkClientTimeout and restarted; however,
> all shards were still in a Down state and the error persisted.
> - Migrating the Solr instance to a new Windows install did not solve the issue.
>
> Please let me know if you have any ideas as to why this is happening and
> possible solutions. Thanks!
>
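
For anyone hitting the same thing, the manual edit described in the quoted
message (setting entries from "Down" to "Active" in state.json) amounts to
roughly the transformation below. This is only a sketch under assumptions:
it assumes state.json is laid out as collection -> "shards" -> shard ->
"replicas" -> replica -> "state", with lowercase "down" and "active" values.
The function name and the sample document are made up for illustration, and
reading from or writing back to ZooKeeper is deliberately left out.

```python
import json


def mark_replicas_active(state_json: str) -> str:
    """Return a copy of a state.json document with "down" replicas set to "active".

    Assumed layout (not verified against a live cluster):
    {collection: {"shards": {shard: {"replicas": {node: {"state": ...}}}}}}
    """
    state = json.loads(state_json)
    for collection in state.values():
        for shard in collection.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                # Only touch replicas that are marked down; leave others alone.
                if replica.get("state") == "down":
                    replica["state"] = "active"
    return json.dumps(state, indent=2)


# Hypothetical sample document; names are illustrative only.
sample = json.dumps({
    "log": {
        "shards": {
            "20160526": {
                "state": "active",
                "replicas": {
                    "core_node4": {
                        "core": "log_20160526_replica1",
                        "state": "down",
                    },
                    "core_node5": {
                        "core": "log_20160526_replica2",
                        "state": "recovering",
                    },
                },
            }
        }
    }
})

print(mark_replicas_active(sample))
```

Take a copy of the original znode contents before pasting the result back
in, so the edit can be undone if it does not help.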
