Hi Mark,

Thanks for responding.

Looking under /collections/solr_signal/leader_elect/shard5/election/, there
are 2 nodes:

161276082334072879-ADDR1:8983_solr_solr_signal-n_0000000053 - Mon Mar 18
17:36:41 UTC 2013
161276082334072880-ADDR2:8983_solr_solr_signal-n_0000000056 - Mon Mar 18
17:48:22 UTC 2013

So it looks like the election node for ADDR2 (the node that cannot recover)
is newer than the one for ADDR1 (the node that is still online and serving
requests).

Could I just delete that newer node from ZK?
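
(If deleting it is the way to go, here's roughly how I'd script it with the
plain ZooKeeper Java client - just a minimal sketch: the zkhost:2181 connect
string and the 30s session timeout are assumptions on my part, and the znode
name is taken from the listing above:)

    import org.apache.zookeeper.ZooKeeper;

    public class DeleteElectionNode {
        public static void main(String[] args) throws Exception {
            // Connect string and session timeout are assumptions.
            ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, null);
            String path = "/collections/solr_signal/leader_elect/shard5"
                + "/election/"
                + "161276082334072880-ADDR2:8983_solr_solr_signal-n_0000000056";
            // version -1 = delete regardless of the znode's current version
            zk.delete(path, -1);
            zk.close();
        }
    }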

Cheers,
Tim
On Mon, Mar 18, 2013 at 12:04 PM, Mark Miller <markrmil...@gmail.com> wrote:

> Hmm…
>
> Sounds like it's a defensive mechanism we have where a leader will check
> its own state about whether it thinks it's the leader against the zk info.
> In this case its own state is not convinced of its leadership. That's just
> a volatile boolean that gets flipped on when elected.
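>
> (As a minimal sketch of that pattern - the names here are hypothetical,
> not Solr's actual code:)
>
>     // Hypothetical sketch, not Solr's actual class names.
>     class LeaderFlag {
>         // The volatile boolean: flipped on when this node wins election.
>         private volatile boolean amLeader = false;
>
>         void onElectionWon() {
>             amLeader = true;
>         }
>
>         // The defensive check: ZK may say we are the leader, but if the
>         // local flag disagrees we answer "We are not the leader".
>         boolean confirmLeadership(boolean zkSaysWeAreLeader) {
>             return zkSaysWeAreLeader && amLeader;
>         }
>     }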
>
> What do the election nodes in ZooKeeper say? Who do they think the leader
> is?
>
> Something is off, but I'm kind of surprised restarting the leader doesn't
> fix it. Someone else should register as the leader or the restarted node
> should reclaim its spot.
>
> I have no idea if this is solved in 4.2 or not since I don't really know
> what's happened, but I'd love to get to the bottom of it.
>
> After setting the leader volatile boolean to true, the only way it goes
> false other than restart is session expiration. In that case we do flip to
> false - but session expiration should also cause the leader node to drop…
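>
> (For reference, a minimal sketch of reacting to expiration with the plain
> ZooKeeper client - illustrative names, not our actual code:)
>
>     import org.apache.zookeeper.WatchedEvent;
>     import org.apache.zookeeper.Watcher;
>
>     class LeaderFlagWatcher implements Watcher {
>         private volatile boolean amLeader = false;
>
>         public void process(WatchedEvent event) {
>             if (event.getState() == Event.KeeperState.Expired) {
>                 // The session is gone, so ZK has already dropped our
>                 // ephemeral election node; drop the local flag too.
>                 amLeader = false;
>             }
>         }
>     }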
>
>
> - Mark
>
> On Mar 18, 2013, at 1:57 PM, Timothy Potter <thelabd...@gmail.com> wrote:
>
> > I'm having an issue running on a nightly build of Solr 4.1 (tag -
> > 4.1.0.2013.01.10.20.44.27).
> >
> > I had a replica fail and when trying to bring it back online, recovery
> > fails because the leader responds with "We are not the leader" (see trace
> > below).
> >
> > SEVERE: org.apache.solr.common.SolrException: We are not the leader
> >     at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:907)
> >     at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >     at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:365)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
> >     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
> > ...
> >
> > The worrisome part is that clusterstate.json seems to show this node
> > (ADDR1) is the leader (I obfuscated the addresses as ADDR1 and ADDR2):
> >
> >       "shard5":{
> >         "range":"b8e30000-c71bffff",
> >         "replicas":{
> >           "ADDR1:8983_solr_solr_signal":{
> >             "shard":"shard5",
> >             "roles":null,
> >             "state":"active",
> >             "core":"solr_signal",
> >             "collection":"solr_signal",
> >             "node_name":"ADDR1:8983_solr",
> >             "base_url":"http://ADDR1:8983/solr",
> >             "leader":"true"},
> >           "ADDR2:8983_solr_solr_signal":{
> >             "shard":"shard5",
> >             "roles":null,
> >             "state":"recovering",
> >             "core":"solr_signal",
> >             "collection":"solr_signal",
> >             "node_name":"ADDR2:8983_solr",
> >             "base_url":"http://ADDR2:8983/solr"}}},
> >
> >
> > I assume the obvious answer is to upgrade to 4.2. I'm willing to go down
> > that path but wanted to see if there was something quick I could do to
> > get the leader to start thinking it is the leader again. Restarting it
> > doesn't seem to do the trick.
>
>
