Yes, you can manually manipulate the data in ZooKeeper, but as you
say, that’s a “heroic” option. Still, the collection is already broken, so
even if you mess it up completely you’re no worse off. You can use
bin/solr zk… to copy individual znodes up and down, or any of the
various ZooKeeper tools that do the same, if you have them.
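
For what it's worth, here is a rough sketch of the znode round trip,
wrapped in Python so the commands are easy to review before anything
runs. It only builds the argument list for "bin/solr zk cp"; the
collection name, local path and ZooKeeper address are placeholders I
made up, so substitute your own and keep a backup of anything you
overwrite.

```python
from typing import List


def build_zk_cp_command(src: str, dest: str, zk_host: str,
                        recursive: bool = False,
                        solr_bin: str = "bin/solr") -> List[str]:
    """Build the argument list for `bin/solr zk cp`.

    Paths that live in ZooKeeper take a "zk:" prefix; local paths can be
    given as-is. Nothing is executed here; the caller decides when to run it.
    """
    cmd = [solr_bin, "zk", "cp"]
    if recursive:
        cmd.append("-r")  # copy a znode and everything below it
    cmd += [src, dest, "-z", zk_host]
    return cmd


if __name__ == "__main__":
    zk_host = "localhost:2181"                  # placeholder ZooKeeper address
    znode = "zk:/collections/mycoll/state.json"  # "mycoll" is a made-up name
    local = "/tmp/state.json"                    # placeholder local path

    # Pull the znode down, edit the local copy by hand, then push it back up.
    download = build_zk_cp_command(znode, local, zk_host)
    upload = build_zk_cp_command(local, znode, zk_host)
    print(" ".join(download))
    print(" ".join(upload))
```

Once the printed commands look right, they can be passed to
subprocess.run(cmd, check=True) from the Solr install directory, or
simply pasted into a shell.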

It’s also possible to shut the whole cluster down and bring up one and
only one node. NOTE: there’s something like a 3 minute wait before
the leader can be elected, so you can’t be impatient.

It should also be possible to create a parallel collection. By parallel
I mean the same number of shards, with one replica (the leader) per
shard. Then shut it down, copy the corresponding data directory over
from the sick collection, and start the new collection back up. Assuming
it comes back, either use collection aliasing to point to it or reverse
the process. Take extreme care to copy from the same shard range. In
fact, it might be easiest to copy the index by using the _replication
api_ to issue a fetchindex that pulls the index from the sick node into
the new one. That’s a low-level command that bypasses SolrCloud. All it
needs is an HTTP connection between the source and target machines.
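
A minimal sketch of what that fetchindex request could look like, again
in Python so the URL can be checked before it is sent. The host names
and core names are invented for illustration. I'm also assuming the
source is named with the "masterUrl" parameter, which is what the
replication handler used in releases of this era; newer releases may
call it something else, so check the reference guide for your version.

```python
from urllib.parse import urlencode


def build_fetchindex_url(target_core_url: str, source_core_url: str) -> str:
    """Build a replication-handler URL that asks the target core to pull
    the index from the source core.

    The request goes to the *target* (the new, healthy core); the source
    (the sick core) is named in the masterUrl parameter (assumed name).
    """
    params = {
        "command": "fetchindex",
        "masterUrl": source_core_url.rstrip("/") + "/replication",
    }
    return target_core_url.rstrip("/") + "/replication?" + urlencode(params)


if __name__ == "__main__":
    # Both URLs are placeholders; use the cores that cover the SAME shard range.
    url = build_fetchindex_url(
        "http://new-host:8983/solr/healthy_shard1_replica_n1",
        "http://sick-host:8983/solr/sick_shard1_replica_n1",
    )
    print(url)
```

The printed URL can be requested with curl or urllib.request.urlopen,
once per shard, always pairing cores that cover the same shard range.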

Best,
Erick


> On Feb 14, 2020, at 9:49 AM, lstusr 5u93n4 <lstusr...@gmail.com> wrote:
> 
> Actually I should clarify: we stop solr on one of the nodes, wait for the
> other node to become the leader, and then start solr back up on the one
> that was stopped.
> 
> On Fri, 14 Feb 2020 at 09:41, lstusr 5u93n4 <lstusr...@gmail.com> wrote:
> 
>> We've seen this type of deadlock pretty often. Our recourse is to restart
>> solr on only one of the nodes; this seems to force the leader election to
>> take place, and it soon starts rebuilding.
>> 
>> Let me know if you try that and it works... Wouldn't mind another
>> validation point that this happens to others...
>> 
>> Good luck!
>> 
>> On Fri, 14 Feb 2020 at 09:20, tedsolr <tsm...@sciquest.com> wrote:
>> 
>>> Yes I did Erick, and that didn't do it. What about manual manipulation of
>>> the zookeeper data? Rather than telling the customer they need to rebuild
>>> from scratch, I'd prefer to attempt some last minute heroics.
>>> 
>>> 
>>> 
>>> --
>>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>> 
>> 
