Hi all,

My team at work maintains a SolrCloud 5.3.2 cluster with multiple
collections configured with sharding and replication.

We recently backed up our Solr indexes using the built-in backup
functionality. After the cluster was restored from the backup, we
noticed that atomic updates of documents are failing occasionally with
the error message 'missing required field [...]'. The exceptions are
thrown on a host that does not store the document being updated. From
this we deduce that updates are not being routed to the right host by
the hash of the uniqueKey. Indeed, our investigation so far has shown
that for at least one collection in the new cluster, the shards have
different hash ranges assigned now. We checked the hash ranges by
querying /admin/collections?action=CLUSTERSTATUS. Below are the shard
hash ranges of one collection that we debugged.

  Old cluster:
    shard1_0 80000000 - aaa9ffff
    shard1_1 aaaa0000 - d554ffff
    shard2_0 d5550000 - fffeffff
    shard2_1 ffff0000 - 2aa9ffff
    shard3_0 2aaa0000 - 5554ffff
    shard3_1 55550000 - 7fffffff

  New cluster:
    shard1 80000000 - aaa9ffff
    shard2 aaaa0000 - d554ffff
    shard3 d5550000 - ffffffff
    shard4 0 - 2aa9ffff
    shard5 2aaa0000 - 5554ffff
    shard6 55550000 - 7fffffff

  Note that the shard names differ because the old cluster's shards were
  split.
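
For anyone who wants to repeat the CLUSTERSTATUS check, here is a minimal
Python sketch of how the ranges can be pulled out of the response. The
base URL and collection name are placeholders, and the JSON layout
(cluster -> collections -> <name> -> shards -> <shard> -> range) is how we
understand the wt=json response; please verify it against your own
output.

```python
import json
from urllib.request import urlopen


def fetch_status(base_url, collection):
    """Fetch CLUSTERSTATUS as JSON.

    base_url is a placeholder such as "http://localhost:8983/solr";
    collection is the (hypothetical) collection name to inspect.
    """
    url = (f"{base_url}/admin/collections"
           f"?action=CLUSTERSTATUS&collection={collection}&wt=json")
    with urlopen(url) as resp:
        return json.load(resp)


def shard_ranges(status, collection):
    """Return {shard_name: range_string} from a parsed CLUSTERSTATUS response."""
    shards = status["cluster"]["collections"][collection]["shards"]
    # Assumed layout: each shard entry carries a "range" such as "80000000-aaa9ffff".
    return {name: info.get("range") for name, info in sorted(shards.items())}
```

Calling shard_ranges(fetch_status(base_url, name), name) on both clusters
gives two dictionaries that can be compared side by side.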

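As you can see, the ranges of shard3 and shard4 differ from those of the
corresponding shards in the old cluster (shard2_0 and shard2_1): hashes
in ffff0000 - ffffffff used to belong to shard2_1 and now fall into
shard3. This change of hash ranges matches the symptoms we are
currently experiencing.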
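
To make the difference explicit, here is a small Python sketch that
computes which hash values changed owner between the two layouts listed
above. It rests on two assumptions of ours: that the hex bounds are
signed 32-bit values (so ffff0000 sorts just below 0), and that shards
correspond to each other by their position in hash order, since the
names differ. The function names are ours, not Solr's.

```python
OLD = {
    "shard1_0": ("80000000", "aaa9ffff"), "shard1_1": ("aaaa0000", "d554ffff"),
    "shard2_0": ("d5550000", "fffeffff"), "shard2_1": ("ffff0000", "2aa9ffff"),
    "shard3_0": ("2aaa0000", "5554ffff"), "shard3_1": ("55550000", "7fffffff"),
}
NEW = {
    "shard1": ("80000000", "aaa9ffff"), "shard2": ("aaaa0000", "d554ffff"),
    "shard3": ("d5550000", "ffffffff"), "shard4": ("0", "2aa9ffff"),
    "shard5": ("2aaa0000", "5554ffff"), "shard6": ("55550000", "7fffffff"),
}


def to_signed(hex_str):
    """Interpret a hex string as a signed 32-bit integer (our assumption)."""
    value = int(hex_str, 16)
    return value - (1 << 32) if value >= (1 << 31) else value


def parse(layout):
    """Return (lo, hi, name) tuples sorted by signed lower bound."""
    return sorted((to_signed(lo), to_signed(hi), name)
                  for name, (lo, hi) in layout.items())


def owner(ranges, h):
    """Return (position, name) of the shard whose range contains hash h."""
    for idx, (lo, hi, name) in enumerate(ranges):
        if lo <= h <= hi:
            return idx, name
    return None, None


def moved_intervals(old, new):
    """List (start, end, old_shard, new_shard) where the owning position differs."""
    old_r, new_r = parse(old), parse(new)
    both = old_r + new_r
    # Ownership can only change at a range start or just past a range end.
    cuts = sorted({r[0] for r in both} | {r[1] + 1 for r in both})
    moved = []
    for start, nxt in zip(cuts, cuts[1:]):
        (old_idx, old_name) = owner(old_r, start)
        (new_idx, new_name) = owner(new_r, start)
        if old_idx != new_idx:
            moved.append((format(start & 0xFFFFFFFF, "08x"),
                          format((nxt - 1) & 0xFFFFFFFF, "08x"),
                          old_name, new_name))
    return moved


print(moved_intervals(OLD, NEW))
```

Under those assumptions, only documents whose uniqueKey hashes into the
reported interval would be looked up on a different shard than the one
that holds them.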

We found this JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750
in which David Smiley comments:

  shard hash ranges aren't restored; this error could be disasterous

It seems that this is what happened to us. We would like to hear some
suggestions on how we could recover from this problem.

Best,
Gary
