I am trying to sort out what updating a relatively simple SolrCloud 4.1 deployment (one shard, 500 collections, 2 replicas per collection) looks like. From experience and from reading other accounts, just restarting both Solr instances is a coin toss: both instances get tied up trying to recover from one another, and the only fix is to retry until it eventually works.
The best way I've found to safely and reliably get all nodes updated (configuration changes, possibly updated handlers) is to unload all cores on one node to force leadership to the other node, update and restart that node, re-create the cores on it, and then repeat the operation on the other node. I've seen other discussions on this, e.g. http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201302.mbox/%3CCAC8vSzZ1hrLapfF3bDKu-%3D4dZNUPNDGct22B-U3y1rK9BaU%2BxQ%40mail.gmail.com%3E

So I'm wondering:

- Has this changed or improved dramatically since 4.1?
- Is there a way to do this without unloading/reloading the cores (this can be time-consuming)?
- Or is this pretty much how everyone does it?

Two feature requests related to this, asking for an API to cause a leadership change without unloading/reloading, don't seem to have gone anywhere:

https://issues.apache.org/jira/browse/SOLR-4491
https://issues.apache.org/jira/browse/SOLR-4492
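
For concreteness, here is a rough sketch of the unload / restart / re-create cycle described above, written as a Python helper that only builds the CoreAdmin request URLs in order (it does not send anything). The node URL, core names, collection names and shard names are made-up placeholders, and the function names are hypothetical; the `UNLOAD` and `CREATE` actions with `core`, `name`, `collection` and `shard` parameters are the standard CoreAdmin calls, but the exact parameters needed will depend on the setup.

```python
from urllib.parse import urlencode


def core_admin_url(base_url, action, **params):
    """Build a CoreAdmin request URL, e.g. .../admin/cores?action=UNLOAD&core=x."""
    query = urlencode({"action": action, **params, "wt": "json"})
    return f"{base_url}/admin/cores?{query}"


def rolling_update_plan(node_url, cores):
    """Return the ordered steps for updating one node.

    cores is a list of (core_name, collection, shard) tuples hosted on
    node_url. All values are placeholders supplied by the caller.
    """
    plan = []
    # 1. Unload every core so leadership moves to the replica on the other node.
    for name, _collection, _shard in cores:
        plan.append(("unload", core_admin_url(node_url, "UNLOAD", core=name)))
    # 2. Manual step: push the new config/handlers and restart this node.
    plan.append(("manual", f"update and restart {node_url}"))
    # 3. Re-create the cores so they rejoin their collections as replicas.
    for name, collection, shard in cores:
        plan.append((
            "create",
            core_admin_url(node_url, "CREATE", name=name,
                           collection=collection, shard=shard),
        ))
    return plan


if __name__ == "__main__":
    # Hypothetical node and cores, for illustration only.
    example_cores = [
        ("coll1_shard1_replica1", "coll1", "shard1"),
        ("coll2_shard1_replica1", "coll2", "shard1"),
    ]
    for step, detail in rolling_update_plan("http://node1:8983/solr", example_cores):
        print(step, detail)
```

The same plan would then be generated for the second node once the first has finished recovering. With 500 collections this means roughly a thousand CoreAdmin calls per node, which is where the time goes.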