Re: Replicas stuck in DOWN state

2021-04-29 Thread David Smiley
I think SolrCloud ought to make a conditional state change based on the ZooKeeper version of live_nodes. Thus a request to change the node's state would fail if the request included an old state version. In this case the client would re-fetch the state and retry or change its mind on whether it's
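
A minimal sketch of the conditional-update idea described above, using an in-memory stand-in rather than real ZooKeeper or Solr code. The `Cluster` class, `set_state`, `mark_down`, and the version counter are all hypothetical names introduced for illustration. The only assumption is that every change to live_nodes bumps a version, and that a state change carrying a stale version is rejected.

```python
class StaleVersionError(Exception):
    """Raised when a request carries an outdated live_nodes version."""


class Cluster:
    """Hypothetical in-memory model: live_nodes plus a version counter."""

    def __init__(self):
        self.live_nodes = set()
        self.live_nodes_version = 0
        self.replica_state = {}  # node name -> "ACTIVE" or "DOWN"

    def node_up(self, node):
        self.live_nodes.add(node)
        self.live_nodes_version += 1
        self.replica_state[node] = "ACTIVE"

    def node_gone(self, node):
        self.live_nodes.discard(node)
        self.live_nodes_version += 1

    def set_state(self, node, state, expected_version):
        # Conditional write: refuse if live_nodes changed since the caller looked.
        if expected_version != self.live_nodes_version:
            raise StaleVersionError(
                f"expected {expected_version}, current {self.live_nodes_version}"
            )
        self.replica_state[node] = state


def mark_down(cluster, node, observed_version):
    """Try to mark a node DOWN; on a stale version, re-fetch and reconsider."""
    try:
        cluster.set_state(node, "DOWN", observed_version)
        return "applied"
    except StaleVersionError:
        # Re-fetch: if the node is live again, the request is obsolete.
        if node in cluster.live_nodes:
            return "abandoned"
        # Otherwise retry against the current version.
        cluster.set_state(node, "DOWN", cluster.live_nodes_version)
        return "applied-after-retry"


cluster = Cluster()
cluster.node_up("n1")
cluster.node_gone("n1")
stale = cluster.live_nodes_version   # version seen when deciding to mark n1 DOWN
cluster.node_up("n1")                # n1 restarts before the request is processed
outcome = mark_down(cluster, "n1", stale)
```

In this toy run the late request is rejected and then abandoned, so `n1` keeps its ACTIVE state instead of being marked DOWN after its restart.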

Re: Replicas stuck in DOWN state

2021-04-28 Thread Ilan Ginzburg
That's possible. It does break some tests but, more importantly, will likely not cover all cases (node up during the massive ZK update). On Wed, Apr 28, 2021 at 11:08, Jan Høydahl wrote: > Could the Overseer do a simple live_nodes check before executing the DOWNNODE message? If the node has a mo

Re: Replicas stuck in DOWN state

2021-04-28 Thread Jan Høydahl
Could the Overseer do a simple live_nodes check before executing the DOWNNODE message? If the node has a more recent entry in live_nodes than the DOWNNODE msg, then drop it? Not sure if this is at all possible? Jan > On 28 Apr 2021 at 10:18, Ilan Ginzburg wrote: > > When a SolrCloud node goes do
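
The check suggested here could look roughly like the sketch below. It is not Solr's Overseer code: `DownNodeMessage`, `enqueued_at`, `should_apply`, and the idea that both the message and the live_nodes entry carry comparable sequence numbers are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DownNodeMessage:
    node: str
    enqueued_at: int  # hypothetical monotonically increasing sequence number


def should_apply(msg, live_nodes):
    """Decide whether a DOWNNODE message should still be executed.

    live_nodes maps node name -> sequence number at which the node
    (re-)registered; this mapping is an assumed structure.
    """
    registered_at = live_nodes.get(msg.node)
    if registered_at is None:
        return True  # node is not live, marking it down is still valid
    # Drop the message if the node registered after the message was queued.
    return registered_at <= msg.enqueued_at


live = {"n1": 10}
late_msg = DownNodeMessage(node="n1", enqueued_at=5)
drop_late = not should_apply(late_msg, live)
```

Here the message was queued at sequence 5 but `n1` re-registered at 10, so the message would be dropped. The check only helps if the node's new live_nodes entry is visible by the time the message is examined.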

Replicas stuck in DOWN state

2021-04-28 Thread Ilan Ginzburg
When a SolrCloud node goes down and back up in relatively rapid sequence (not unusual in Public Cloud environments), it appears possible that the DOWNNODE cluster state change message gets processed (or completes processing) after the node has restarted. This delayed execution will then mark repl