Hello, I'm investigating an 8-node Solr 7.2.1 cluster because we have a lot of problems. For example, when a node fails to import from a DB (maybe it freezes), the entire cluster goes down. Another is that the leader won't change even when it is down: all nodes detect that it is down, but no leader election is triggered. There are other similar problems too. Every few days we have to recover the cluster because it becomes unstable and goes down.
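The "leader is down but no election happens" symptom can be checked mechanically. Below is a minimal Python sketch, assuming you already have the parsed JSON body of the Collections API `CLUSTERSTATUS` action (fetching it over HTTP is left out). It assumes the Solr 7.x response layout as I understand it: `cluster.live_nodes`, plus per-replica `node_name` and a `leader` flag reported as the string `"true"`. The function name and the collection and node names in the example are hypothetical. It flags shards whose leader sits on a node that is not live, and shards with no leader at all.

```python
from typing import Any, Dict, List, Optional, Tuple

# (collection, shard, leader node_name or None if no leader is registered)
Problem = Tuple[str, str, Optional[str]]


def find_dead_leaders(status: Dict[str, Any]) -> List[Problem]:
    """Hypothetical helper: scan a parsed CLUSTERSTATUS response.

    Assumes the layout cluster -> collections -> shards -> replicas, with
    the leader replica carrying "leader": "true" (a string, not a bool).
    """
    cluster = status.get("cluster", {})
    live_nodes = set(cluster.get("live_nodes", []))
    problems: List[Problem] = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            replicas = shard.get("replicas", {}).values()
            leaders = [r for r in replicas if r.get("leader") == "true"]
            if not leaders:
                # No replica claims leadership: no election has completed.
                problems.append((coll_name, shard_name, None))
            for leader in leaders:
                # A leader registered on a node that is not in live_nodes.
                if leader.get("node_name") not in live_nodes:
                    problems.append((coll_name, shard_name, leader.get("node_name")))
    return problems


if __name__ == "__main__":
    # Hypothetical example data, shaped like a CLUSTERSTATUS response.
    sample = {"cluster": {
        "live_nodes": ["10.0.0.2:8983_solr"],
        "collections": {"products": {"shards": {
            "shard1": {"replicas": {
                "core_node1": {"node_name": "10.0.0.1:8983_solr", "leader": "true"},
                "core_node2": {"node_name": "10.0.0.2:8983_solr"}}},
            "shard2": {"replicas": {
                "core_node3": {"node_name": "10.0.0.2:8983_solr"}}},
        }}},
    }}
    for problem in find_dead_leaders(sample):
        print(problem)
```

Running the same check against the leader znodes stored in ZooKeeper would show whether the two views disagree, but the exact znode contents are not covered by this sketch.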
The latest problem I've hit is three collections that have had replicas in the "recovery" state for many hours. The log shows an error saying that the "leader node is not the leader", so I'm trying to change the leader. I shut down the "leader" (the other nodes detected it as down, and I waited about 20 minutes) and tried REBALANCELEADERS and FORCELEADER, but I'm still unable to change the leader on the cluster. That's why I started looking at ZooKeeper.

What I've found is that the leaders recorded in ZooKeeper are different from the ones shown in the Solr admin cluster info. Maybe that's why the nodes are unable to connect to the real leader and cannot finish recovery. All traffic between the Solr nodes and ZK is open (the VPC is private), so it is not a connectivity problem.

Is there any way to sync the leader info between Solr and ZK? I'd also like to know whether there is a way to force a leader change (FORCELEADER doesn't work when Solr refuses to change the leader, because it says that a leader already exists).

Thanks!

--
_________________________________________
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_________________________________________