First, be sure to wait at least 3 minutes before concluding the replicas are permanently down; that’s the default wait period for certain leader election fallbacks. It’s easy to conclude it’s never going to recover, but 180 seconds is an eternity ;).
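
While you wait, you can watch the replica states with the collections API CLUSTERSTATUS command. A rough example with curl (the collection name and port here are placeholders for your setup):

  curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=yourCollection"

If recovery is progressing, each replica in the response should move from "down" through "recovering" to "active".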
You can try the collections API FORCELEADER command (a rough example of the call is at the end of this mail). Assuming a leader is elected and becomes active, you _may_ have to restart the other two Solr nodes.

How did you stop the servers? You mention disaster recovery, so I’m thinking you did a “kill -9” or similar? Were you actively indexing at the time? Solr _should_ manage the recovery even in that case; I’m mostly wondering what the sequence of events that led up to this was…

Best,
Erick

> On Feb 4, 2020, at 8:38 AM, Joseph Lorenzini <jalo...@gmail.com> wrote:
>
> Hi all,
>
> I have a 3-node SolrCloud instance with a single collection. The Solr
> nodes are pointed to a 3-node ZooKeeper ensemble. I was doing some basic
> disaster recovery testing and have encountered a problem that hasn't been
> obvious to me how to fix.
>
> After I started the three Solr Java processes back up, I can see that
> they are registered again in the Solr UI. However, each replica is
> permanently in a down state. There are no logs in either Solr or
> ZooKeeper that indicate what the problem might be -- neither exceptions
> nor warnings.
>
> So is there any way to collect more diagnostics to figure out what's
> going on? And short of deleting and recreating the replicas, is there
> any way to fix this?
>
> Thanks,
> Joe
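
P.S. The FORCELEADER call mentioned above looks roughly like this with curl (collection and shard names are placeholders for your setup; FORCELEADER targets a single shard, so run it once per shard if you have more than one):

  curl "http://localhost:8983/solr/admin/collections?action=FORCELEADER&collection=yourCollection&shard=shard1"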