Yes, you can manually manipulate the data in ZooKeeper, but as you say, that’s a “heroic” option. Then again, even if it’s totally messed up, you’re no worse off. You can use bin/solr zk… to copy individual znodes up and down, or use any of the various other tools that do the same, if you have them.
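
As a rough illustration of copying a znode down and back up, here is a minimal Python sketch that only builds the bin/solr zk cp command lines and prints them, so they can be reviewed before anything touches ZooKeeper. The ZooKeeper address, the collection name "sick_collection", and the local file paths are all placeholders and not from this thread. The state.json path assumes the usual per-collection layout, so check it against your own cluster first.

```python
import shlex

SOLR_BIN = "bin/solr"        # relative path as written in the message; adjust to your install
ZK_HOST = "localhost:2181"   # assumption: replace with your ZooKeeper connect string


def zk_download_cmd(znode, local_path, zk_host=ZK_HOST):
    """Build the argv that copies a single znode down to a local file."""
    return [SOLR_BIN, "zk", "cp", f"zk:{znode}", f"file:{local_path}", "-z", zk_host]


def zk_upload_cmd(local_path, znode, zk_host=ZK_HOST):
    """Build the argv that copies a local file back up over a znode."""
    return [SOLR_BIN, "zk", "cp", f"file:{local_path}", f"zk:{znode}", "-z", zk_host]


if __name__ == "__main__":
    # Hypothetical collection name and paths, for illustration only.
    znode = "/collections/sick_collection/state.json"
    # Keep an untouched copy before editing anything.
    print(shlex.join(zk_download_cmd(znode, "/tmp/state.json.bak")))
    # After hand-editing a working copy, push it back up.
    print(shlex.join(zk_upload_cmd("/tmp/state.json", znode)))
```

Printing instead of executing is deliberate: a hand-edited state.json is easy to get wrong, and keeping the downloaded backup gives you a way to restore the original znode.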
It’s also possible to shut the whole cluster down and bring up one and only one node. NOTE: there’s something like a 3 minute wait before the leader can be elected, so you can’t be impatient.

It should also be possible to create a parallel collection, leader only. By parallel I mean the same number of shards, each with only a leader. Then shut it down, copy the corresponding data directory over from the sick collection, and start the new collection back up. Assuming it comes back, either use collection aliasing to point to it or reverse the process. Take extreme care to copy from the same shard range….

In fact, it might be easiest to copy the index by using the _replication api_ to issue a fetchindex from the sick node to the new one. That’s a low-level command that bypasses SolrCloud. All it needs is an HTTP connection between the source and target machines.

Best,
Erick

> On Feb 14, 2020, at 9:49 AM, lstusr 5u93n4 <lstusr...@gmail.com> wrote:
>
> Actually I should clarify: we stop solr on one of the nodes, wait for the
> other node to become the leader, and then start solr back up on the one
> that was stopped.
>
> On Fri, 14 Feb 2020 at 09:41, lstusr 5u93n4 <lstusr...@gmail.com> wrote:
>
>> We've seen this type of deadlock pretty often. Our recourse is to restart
>> solr on only one of the nodes, this seems to force the leader election to
>> take place and it soon starts rebuilding.
>>
>> Let me know if you try that and it works... Wouldn't mind another
>> validation point that this happens to others...
>>
>> Good luck!
>>
>> On Fri, 14 Feb 2020 at 09:20, tedsolr <tsm...@sciquest.com> wrote:
>>
>>> Yes I did Erick, and that didn't do it. What about manual manipulation of
>>> the zookeeper data? Rather than telling the customer they need to rebuild
>>> from scratch, I'd prefer to attempt some last minute heroics.
>>>
>>> --
>>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
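
For reference, the fetchindex request described in the reply at the top of this message could be put together roughly as follows. This is a sketch under assumptions, not a tested recovery procedure: the host names and core names are hypothetical, and the masterUrl parameter name is the one used by Solr releases from around the time of this thread (later releases renamed it to leaderUrl, so check the reference guide for your version). The request goes to the new (target) core and names the sick (source) core as the place to pull the index from. The code only builds the URL and sends nothing.

```python
from urllib.parse import urlencode


def fetchindex_url(target_core_url, source_core_url):
    """Build a replication-handler URL asking the target core to pull
    the index from the source core. Nothing is sent by this function."""
    query = urlencode({
        "command": "fetchindex",
        # Assumption: Solr 8.x-era parameter name; newer versions use leaderUrl.
        "masterUrl": source_core_url,
    })
    return f"{target_core_url.rstrip('/')}/replication?{query}"


if __name__ == "__main__":
    # Hypothetical hosts and core names, for illustration only.
    sick = "http://sick-host:8983/solr/sick_shard1_replica_n1"
    new = "http://new-host:8983/solr/new_shard1_replica_n1"
    print(fetchindex_url(new, sick))
```

The printed URL can be issued with any HTTP client from a machine that can reach the target. As the reply notes, the target in turn needs an HTTP connection to the source, and each source core must be paired with the target core that covers the same shard range.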