There are a couple of options:

1> Stop all your nodes. Start one node and wait for "leader election" to occur. This can take several minutes, but eventually the replicas on that machine will become the leaders. Then start the other nodes, again one at a time, waiting for each to recover fully before starting the next.
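For option 1, one way to tell when leader election has finished is to poll the Collections API CLUSTERSTATUS action and look for a replica flagged as leader on each shard. What follows is only a rough sketch: the base URL and the collection name "mycoll" are placeholders (the original report does not give the collection name), and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def clusterstatus_url(base_url, collection):
    """Build a CLUSTERSTATUS request URL for one collection."""
    query = urllib.parse.urlencode(
        {"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"}
    )
    return f"{base_url}/admin/collections?{query}"


def shard_leaders(status, collection):
    """Map each shard name to its active leader's core name, or None."""
    shards = status["cluster"]["collections"][collection]["shards"]
    leaders = {}
    for shard_name, shard in shards.items():
        leaders[shard_name] = None
        for replica in shard.get("replicas", {}).values():
            # The flag is compared as text so "true" and True both match.
            is_leader = str(replica.get("leader", "")).lower() == "true"
            if is_leader and replica.get("state") == "active":
                leaders[shard_name] = replica.get("core")
    return leaders


def fetch_status(base_url, collection):
    """Fetch CLUSTERSTATUS from a running node (needs network access)."""
    url = clusterstatus_url(base_url, collection)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

After starting a node, something like shard_leaders(fetch_status("http://solr00:8983/solr", "mycoll"), "mycoll") could be called every few seconds; once no shard maps to None, it should be safe to start the next node.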
2> You can try the FORCELEADER Collections API option.

The leader election and retry logic has been vastly improved in 7.3+ (with some of the last improvements in 7.5).

Best,
Erick

On Sun, Dec 23, 2018 at 1:43 AM Vadim Ivanov <vadim.iva...@spb.ntk-intourist.ru> wrote:
>
> Hi!
> After restart of nodes I have situation when no leader on shard can be
> elected
> Shard rpk51_222_306 resides on 3 nodes (solr00, solr06, solr09) with
> corresponding replica names
> (rpk51_222_306_00, rpk51_222_306_06, rpk51_222_306_09)
> Logs looks like this
> PeerSync: core=rpk51_222_306_00 url=http://solr00:8983/solr Requested 26
> updates from http://solr06:8983/solr/rpk51_222_306_06/ but retrieved 25
> PeerSync: core=rpk51_222_306_06 url=http://solr06:8983/solr Requested 29
> updates from http://solr00:8983/solr/rpk51_222_306_00/ but retrieved 24
> PeerSync: core=rpk51_222_306_09 url=http://solr09:8983/solr Requested 26
> updates from http://solr06:8983/solr/rpk51_222_306_06/ but retrieved 25
>
> 00 and 09 tries to recover from 06 and fail
> 06 tries to recover from 00 and fail
>
> It goes continuously every minute and forever
>
> How to break this deadlock loop?
> --
> Vadim