Hmm…

Sounds like it's a defensive mechanism we have where a leader checks its 
own state, along with the zk info, to confirm that it thinks it's the leader. In 
this case its own state is not convinced of its leadership. That's just a 
volatile boolean that gets flipped on when elected.

What do the election nodes in ZooKeeper say? Who do they think the leader is?

Something is off, but I'm kind of surprised restarting the leader doesn't fix 
it. Someone else should register as the leader or the restarted node should 
reclaim its spot.

I have no idea if this is solved in 4.2 or not since I don't really know what's 
happened, but I'd love to get to the bottom of it.

Once the volatile leader boolean is set to true, the only way it goes back to 
false, other than a restart, is session expiration. In that case we do flip it 
to false, but session expiration should also cause the leader node to drop…
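A minimal Java sketch of the mechanism described above, assuming the local leader state is a single volatile flag that is set on election, cleared only on session expiration (or lost on restart), and consulted alongside the zk info before doing leader-only work. The names LeaderStateSketch, LeaderFlag, onElected, onSessionExpired, checkLeader and NotLeaderException are hypothetical illustrations, not Solr's actual classes:

```java
// Hypothetical sketch of the "am I really the leader?" guard.
// All names here are illustrative, not Solr's actual API.
public class LeaderStateSketch {

    /** Thrown when leader-only work is requested but the node does not consider itself leader. */
    static class NotLeaderException extends RuntimeException {
        NotLeaderException(String msg) { super(msg); }
    }

    static class LeaderFlag {
        // Volatile so a change made by the ZooKeeper event thread is
        // immediately visible to request-handling threads.
        private volatile boolean isLeader = false;

        /** Called once this node wins the leader election. */
        void onElected() { isLeader = true; }

        /** Apart from a restart, session expiration is the only path back to false. */
        void onSessionExpired() { isLeader = false; }

        boolean isLeader() { return isLeader; }

        /**
         * Defensive check: even if the ZooKeeper info names this node as leader,
         * refuse leader-only work unless the local flag agrees.
         */
        void checkLeader(boolean zkSaysLeader) {
            if (!zkSaysLeader || !isLeader) {
                throw new NotLeaderException("We are not the leader");
            }
        }
    }

    public static void main(String[] args) {
        LeaderFlag flag = new LeaderFlag();
        flag.onElected();
        flag.checkLeader(true); // passes: ZooKeeper and local state agree

        flag.onSessionExpired();
        try {
            flag.checkLeader(true); // ZooKeeper still says leader, local flag does not
        } catch (NotLeaderException e) {
            System.out.println(e.getMessage()); // prints "We are not the leader"
        }
    }
}
```

In this model, a node that ZooKeeper still lists as leader but whose local flag was cleared fails the check, which matches the symptom in the trace below.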


- Mark

On Mar 18, 2013, at 1:57 PM, Timothy Potter <thelabd...@gmail.com> wrote:

> Having an issue running on a nightly build of Solr 4.1 (tag -
> 4.1.0.2013.01.10.20.44.27)
> 
> I had a replica fail and when trying to bring it back online, recovery
> fails because the leader responds with "We are not the leader" (see trace
> below).
> 
> SEVERE: org.apache.solr.common.SolrException: We are not the leader
>    at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:907)
>    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:365)
>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
>    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>    ...
> 
> The worrisome part is the clusterstate.json seems to show this node (ADDR1)
> is the leader (I obfuscated addresses using ADDR1 and 2):
> 
>      "shard5":{
>        "range":"b8e30000-c71bffff",
>        "replicas":{
>          "ADDR1:8983_solr_solr_signal":{
>            "shard":"shard5",
>            "roles":null,
>            "state":"active",
>            "core":"solr_signal",
>            "collection":"solr_signal",
>            "node_name":"ADDR1:8983_solr",
>            "base_url":"http://ADDR1:8983/solr",
>            "leader":"true"},
>          "ADDR2:8983_solr_solr_signal":{
>            "shard":"shard5",
>            "roles":null,
>            "state":"recovering",
>            "core":"solr_signal",
>            "collection":"solr_signal",
>            "node_name":"ADDR2:8983_solr",
>            "base_url":"http://ADDR2:8983/solr"}}},
> 
> I assume the obvious answer is to upgrade to 4.2. I'm willing to go down
> that path but wanted to see if there was something quick I could do to get
> the leader to start thinking it is the leader again. Restarting it doesn't
> seem to do the trick.
