Hi,

We've recently gone through the process of upgrading Solr to 8.6 and
have implemented an automated rolling update mechanism to allow us to more
easily make changes to our cluster in the future.

Our process for this looks like this:
1. Cluster has 3 nodes.
2. Scale out to 6 nodes.
3. Protect the cluster overseer from scale in.
4. Scale in to 5 nodes.
5. Scale in to 4 nodes.
6. Expose the cluster overseer to scale in.
7. Scale in to 3 nodes.

When scaling in, the oldest nodes are removed first. Whenever we
scale in or out, we ensure that the cluster reaches a state where it has
the required number of active nodes, and each node contains an active
replica for each collection.
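
To make that check concrete, here is a rough sketch of it in Python. It
assumes the input is the parsed JSON body of a CLUSTERSTATUS call to the
Collections API; the function name and the choice to require an exact node
count are illustrative, not our actual tooling.

```python
def cluster_is_settled(status, required_nodes):
    """Return True when the live node count matches and every live node
    holds at least one active replica of every collection.

    `status` is assumed to be the parsed JSON of a CLUSTERSTATUS response.
    """
    cluster = status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))

    # Assumption: "required number" means an exact match, so a node that
    # has not finished leaving the cluster also counts as "not settled".
    if len(live_nodes) != required_nodes:
        return False

    for collection in cluster.get("collections", {}).values():
        # Collect the nodes that host an active replica of this collection.
        nodes_with_active = set()
        for shard in collection.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                if replica.get("state") == "active":
                    nodes_with_active.add(replica.get("node_name"))
        # Every live node must appear in that set.
        if not live_nodes <= nodes_with_active:
            return False
    return True
```

The automation would poll CLUSTERSTATUS and only move on to the next
scaling step once this returns True.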

It appears to work quite well. We were scaling down more than one node at a
time previously, but we ran into this bug:
https://issues.apache.org/jira/browse/SOLR-11208. Scaling down one at a
time works around this for now.

We were wondering if we should be taking more care around managing the
leaders of our collections during this process. Should we move the
collection leaders across to the new nodes that were created as part of
step 2 before we start removing the old nodes?

It looks like this is possible: Solr lets us do it by calling the
REBALANCELEADERS api after setting preferredLeader=true on the replicas.
Using this we could shift the leaders to the new nodes.
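
As a sketch of what those two calls would look like, the Python below only
builds the Collections API URLs. The base URL, collection, shard and
replica names are placeholders, and the helper names are made up for
illustration.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, params):
    """Build a Collections API URL from a dict of query parameters."""
    return f"{base_url}/admin/collections?{urlencode(params)}"


def prefer_leader_url(base_url, collection, shard, replica):
    """URL for ADDREPLICAPROP that marks one replica as preferredLeader."""
    return collections_api_url(base_url, {
        "action": "ADDREPLICAPROP",
        "collection": collection,
        "shard": shard,
        "replica": replica,  # the core node name, e.g. core_node5
        "property": "preferredLeader",
        "property.value": "true",
    })


def rebalance_leaders_url(base_url, collection):
    """URL for REBALANCELEADERS, which asks Solr to make the replicas
    flagged as preferredLeader the actual shard leaders."""
    return collections_api_url(base_url, {
        "action": "REBALANCELEADERS",
        "collection": collection,
    })
```

The idea would be to issue one ADDREPLICAPROP request per shard, then a
single REBALANCELEADERS request per collection.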

A thought I had while looking at the APIs available to set the
preferredLeader property was that the BALANCESHARDUNIQUE api would be
perfect for this scenario if it had the ability to limit the nodes to a
specific set. Otherwise our option is to do this balancing logic ourselves
and call the ADDREPLICAPROP api.

https://lucene.apache.org/solr/guide/8_6/cluster-node-management.html#balanceshardunique
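
If we did the balancing ourselves, I imagine it would look roughly like the
Python sketch below. It takes a parsed CLUSTERSTATUS response and the set
of new node names, and picks one active replica per shard on those nodes
while spreading the picks evenly. The function name and the tie-breaking
rule are my own assumptions; each result would then be passed to
ADDREPLICAPROP.

```python
def pick_preferred_leaders(status, allowed_nodes):
    """Choose one replica per shard, restricted to `allowed_nodes`,
    keeping the number of picks per node as even as possible.

    Returns a list of (collection, shard, replica_name) tuples.
    Shards with no active replica on an allowed node are skipped.
    """
    allowed = set(allowed_nodes)
    load = dict.fromkeys(allowed, 0)  # picks assigned to each node so far
    picks = []

    collections = status["cluster"]["collections"]
    for coll_name, coll in sorted(collections.items()):
        for shard_name, shard in sorted(coll["shards"].items()):
            candidates = [
                (load[replica["node_name"]], replica["node_name"], replica_name)
                for replica_name, replica in shard["replicas"].items()
                if replica.get("node_name") in allowed
                and replica.get("state") == "active"
            ]
            if not candidates:
                continue
            # Least-loaded node wins; ties break on node name for determinism.
            _, node, replica_name = min(candidates)
            load[node] += 1
            picks.append((coll_name, shard_name, replica_name))
    return picks
```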

Cheers,
Adam
