Hi,

We recently upgraded Solr to 8.6 and implemented an automated rolling update mechanism so that we can more easily make changes to our cluster in the future.
Our process looks like this:

1. Cluster has 3 nodes.
2. Scale out to 6 nodes.
3. Protect the cluster overseer from scale in.
4. Scale in to 5 nodes.
5. Scale in to 4 nodes.
6. Expose the cluster overseer to scale in.
7. Scale in to 3 nodes.

When scaling in, the oldest nodes are removed first. Whenever we scale in or out, we wait until the cluster has the required number of active nodes and each node contains an active replica for each collection.

It appears to work quite well. We were previously scaling in more than one node at a time, but we ran into this bug: https://issues.apache.org/jira/browse/SOLR-11208. Scaling in one node at a time works around it for now.

We were wondering whether we should take more care in managing the leaders of our collections during this process. Should we move the collection leaders to the new nodes created in step 2 before we start removing the old nodes?

It looks like this is possible: Solr lets us set preferredLeader=true on the replicas and then call the REBALANCELEADERS API. Using this, we could shift the leaders to the new nodes.

A thought I had while looking at the APIs available for setting the preferredLeader property was that the BALANCESHARDUNIQUE API would be perfect for this scenario if it could be limited to a specific set of nodes. Otherwise, our option is to implement the balancing logic ourselves and call the ADDREPLICAPROP API.

https://lucene.apache.org/solr/guide/8_6/cluster-node-management.html#balanceshardunique

Cheers,
Adam
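
P.S. To make the "do the balancing ourselves" option concrete, here is a minimal sketch in Python of what that logic might look like. It is not tested against a real cluster. It assumes the cluster state has the shape returned by the CLUSTERSTATUS action (cluster -> collections -> shards -> replicas, each replica carrying node_name and state), and the function name plan_preferred_leaders and the node names are hypothetical. It only builds the ADDREPLICAPROP parameters and does not make any HTTP calls.

```python
from collections import Counter


def plan_preferred_leaders(cluster_status, new_nodes):
    """Build one ADDREPLICAPROP parameter dict per shard.

    For each shard, pick an active replica hosted on one of `new_nodes`,
    preferring the node that has been given the fewest preferred leaders
    so far. Shards with no active replica on a new node are skipped.

    Assumption: `cluster_status` follows the CLUSTERSTATUS response layout.
    """
    new_nodes = set(new_nodes)
    assigned = Counter()  # preferred leaders assigned to each node so far
    plan = []

    collections = cluster_status["cluster"]["collections"]
    for collection in sorted(collections):
        shards = collections[collection]["shards"]
        for shard in sorted(shards):
            replicas = shards[shard]["replicas"]
            candidates = [
                (name, info["node_name"])
                for name, info in sorted(replicas.items())
                if info["node_name"] in new_nodes
                and info.get("state") == "active"
            ]
            if not candidates:
                continue  # nothing suitable on the new nodes for this shard

            # Least-loaded new node wins; node name breaks ties.
            replica, node = min(
                candidates, key=lambda c: (assigned[c[1]], c[1])
            )
            assigned[node] += 1
            plan.append({
                "action": "ADDREPLICAPROP",
                "collection": collection,
                "shard": shard,
                "replica": replica,
                "property": "preferredLeader",
                "property.value": "true",
            })
    return plan
```

Each dict could then be sent as query parameters to the Collections API, followed by one REBALANCELEADERS call per collection. The least-loaded choice is just one simple way to spread leaders across the new nodes; it is my own stand-in for what BALANCESHARDUNIQUE does, not a copy of its behaviour.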