Hi all,

My team at work maintains a SolrCloud 5.3.2 cluster with multiple collections configured with sharding and replication.
We recently backed up our Solr indexes using the built-in backup functionality. After the cluster was restored from the backup, we noticed that atomic updates of documents occasionally fail with the error message 'missing required field [...]'. The exceptions are thrown on a host on which the document to be updated is not stored. From this we deduce that there is a problem with finding the right host by the hash of the uniqueKey.

Indeed, our investigations so far showed that for at least one collection in the new cluster, the shards now have different hash ranges assigned. We checked the hash ranges by querying /admin/collections?action=CLUSTERSTATUS. Below are the shard hash ranges of one collection that we debugged.

Old cluster:

  shard1_0   80000000 - aaa9ffff
  shard1_1   aaaa0000 - d554ffff
  shard2_0   d5550000 - fffeffff
  shard2_1   ffff0000 - 2aa9ffff
  shard3_0   2aaa0000 - 5554ffff
  shard3_1   55550000 - 7fffffff

New cluster:

  shard1     80000000 - aaa9ffff
  shard2     aaaa0000 - d554ffff
  shard3     d5550000 - ffffffff
  shard4     0        - 2aa9ffff
  shard5     2aaa0000 - 5554ffff
  shard6     55550000 - 7fffffff

Note that the shard names differ because the old cluster's shards were split. As you can see, the ranges of shard3 and shard4 differ from those of their counterparts in the old cluster (shard2_0 and shard2_1): the boundary between them moved from ffff0000 to 0. This change of hash ranges matches the symptoms we are currently experiencing.

We found this JIRA ticket, https://issues.apache.org/jira/browse/SOLR-5750, in which David Smiley comments:

  "shard hash ranges aren't restored; this error could be disasterous"

It seems that this is what happened to us. We would like to hear some suggestions on how we could recover from this problem.

Best,
Gary
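
P.S. In case it helps anyone reproduce the comparison, here is a rough Python sketch of how the two sets of ranges can be diffed. It is not part of Solr; the names `to_signed`, `mismatches`, `OLD` and `NEW` are made up for illustration, and the ranges are simply copied by hand from the CLUSTERSTATUS output quoted in this mail. It assumes that the hex values are signed 32-bit integers and that the old and new shards correspond to each other in the order listed.

```python
def to_signed(hex_str):
    """Interpret a hex string as a signed 32-bit integer."""
    value = int(hex_str, 16)
    return value - (1 << 32) if value >= (1 << 31) else value


# (shard name, range start, range end), copied from the CLUSTERSTATUS output.
OLD = [
    ("shard1_0", "80000000", "aaa9ffff"),
    ("shard1_1", "aaaa0000", "d554ffff"),
    ("shard2_0", "d5550000", "fffeffff"),
    ("shard2_1", "ffff0000", "2aa9ffff"),
    ("shard3_0", "2aaa0000", "5554ffff"),
    ("shard3_1", "55550000", "7fffffff"),
]
NEW = [
    ("shard1", "80000000", "aaa9ffff"),
    ("shard2", "aaaa0000", "d554ffff"),
    ("shard3", "d5550000", "ffffffff"),
    ("shard4", "0", "2aa9ffff"),
    ("shard5", "2aaa0000", "5554ffff"),
    ("shard6", "55550000", "7fffffff"),
]


def mismatches(old, new):
    """Return the shard pairs whose hash ranges differ.

    Assumption: old and new shards are paired by position in the lists.
    """
    result = []
    for (old_name, old_lo, old_hi), (new_name, new_lo, new_hi) in zip(old, new):
        old_range = (to_signed(old_lo), to_signed(old_hi))
        new_range = (to_signed(new_lo), to_signed(new_hi))
        if old_range != new_range:
            result.append((old_name, new_name, old_range, new_range))
    return result


if __name__ == "__main__":
    for old_name, new_name, old_range, new_range in mismatches(OLD, NEW):
        print(old_name, old_range, "->", new_name, new_range)
```

If the positional pairing assumption holds, only the two middle pairs are reported, which would mean that only documents whose hash falls between the old and the new boundary are being routed to a different shard than the one that stores them.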