First, be sure to wait at least 3 minutes before concluding the replicas are permanently down; that’s the default wait period for certain leader election fallbacks. It’s easy to conclude it’s never going to recover, but 180 seconds is an eternity ;).
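
While you wait, you can watch the replica states with the collections API CLUSTERSTATUS command. A rough example with curl (the collection name and port here are placeholders for your setup):

  curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=yourCollection"

If recovery is progressing, each replica in the response should move from "down" through "recovering" to "active".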
You can try the collections API FORCELEADER command (a rough example of the call is at the end of this mail). Assuming a leader is elected and becomes active, you _may_ have to restart the other two Solr nodes.

How did you stop the servers? You mention disaster recovery, so I’m thinking you did a “kill -9” or similar? Were you actively indexing at the time? Solr _should_ manage the recovery even in that case; I’m mostly wondering what the sequence of events that led up to this was…

Best,
Erick

> On Feb 4, 2020, at 8:38 AM, Joseph Lorenzini <jalo...@gmail.com> wrote:
>
> Hi all,
>
> I have a 3-node SolrCloud instance with a single collection. The Solr
> nodes are pointed to a 3-node ZooKeeper ensemble. I was doing some basic
> disaster recovery testing and have encountered a problem that hasn't been
> obvious to me how to fix.
>
> After I started the three Solr Java processes back up, I can see that
> they are registered again in the Solr UI. However, each replica is
> permanently in a down state. There are no logs in either Solr or
> ZooKeeper that indicate what the problem might be -- neither exceptions
> nor warnings.
>
> So is there any way to collect more diagnostics to figure out what's
> going on? And short of deleting and recreating the replicas, is there
> any way to fix this?
>
> Thanks,
> Joe
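
P.S. The FORCELEADER call mentioned above looks roughly like this with curl (collection and shard names are placeholders for your setup; FORCELEADER targets a single shard, so run it once per shard if you have more than one):

  curl "http://localhost:8983/solr/admin/collections?action=FORCELEADER&collection=yourCollection&shard=shard1"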