In essence, no. The data is, at best, in the wrong shard and at worst nowhere.
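
A minimal sketch of dumping a collection's current shard hash ranges through the /admin/collections?action=CLUSTERSTATUS call mentioned further down the thread; the Solr base URL and collection name below are placeholders, and it assumes the usual JSON layout of the CLUSTERSTATUS response (cluster -> collections -> <name> -> shards -> range):

# Sketch: print each shard's hash range for one collection via the Collections API.
# SOLR and COLLECTION are placeholders -- point them at your own cluster.
import json
import urllib.request

SOLR = "http://localhost:8983/solr"    # placeholder Solr base URL
COLLECTION = "mycollection"            # placeholder collection name

url = (SOLR + "/admin/collections?action=CLUSTERSTATUS"
       "&collection=" + COLLECTION + "&wt=json")
with urllib.request.urlopen(url) as resp:
    status = json.load(resp)

shards = status["cluster"]["collections"][COLLECTION]["shards"]
for name, info in sorted(shards.items()):
    # 'range' is the hex hash range, e.g. "80000000-aaa9ffff"
    print(name, info.get("range"))

Running this against the restored cluster and comparing the output with the ranges recorded before the backup shows exactly which shards' ranges moved.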
Sent from my phone

On Jun 16, 2016 8:26 AM, "Gary Yao" <gary....@zalando.de> wrote:
> Hi Erick,
>
> I should add that our Solr cluster is in production and new documents
> are constantly indexed. The new cluster has been up for three weeks now.
> The problem was discovered only now because in our use case Atomic
> Updates and RealTime Gets are mostly performed on new documents. With
> almost absolute certainty there are already documents in the index that
> were distributed to the shards according to the new hash ranges. If we
> just changed the hash ranges in ZooKeeper, the index would still be in
> an inconsistent state.
>
> Is there any way to recover from this without having to re-index all
> documents?
>
> Best,
> Gary
>
> 2016-06-15 19:23 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:
> > Simplest, though a bit risky is to manually edit the znode and
> > correct the znode entry. There are various tools out there, including
> > one that ships with Zookeeper (see the ZK documentation).
> >
> > Or you can use the zkcli scripts (the Zookeeper ones) to get the znode
> > down to your local machine, edit it there and then push it back up to ZK.
> >
> > I'd do all this with my Solr nodes shut down, then insure that my ZK
> > ensemble was consistent after the update etc....
> >
> > Best,
> > Erick
> >
> > On Wed, Jun 15, 2016 at 8:36 AM, Gary Yao <gary....@zalando.de> wrote:
> >> Hi all,
> >>
> >> My team at work maintains a SolrCloud 5.3.2 cluster with multiple
> >> collections configured with sharding and replication.
> >>
> >> We recently backed up our Solr indexes using the built-in backup
> >> functionality. After the cluster was restored from the backup, we
> >> noticed that atomic updates of documents are failing occasionally with
> >> the error message 'missing required field [...]'. The exceptions are
> >> thrown on a host on which the document to be updated is not stored. From
> >> this we are deducing that there is a problem with finding the right host
> >> by the hash of the uniqueKey. Indeed, our investigations so far showed
> >> that for at least one collection in the new cluster, the shards have
> >> different hash ranges assigned now. We checked the hash ranges by
> >> querying /admin/collections?action=CLUSTERSTATUS. Find below the shard
> >> hash ranges of one collection that we debugged.
> >>
> >> Old cluster:
> >> shard1_0 80000000 - aaa9ffff
> >> shard1_1 aaaa0000 - d554ffff
> >> shard2_0 d5550000 - fffeffff
> >> shard2_1 ffff0000 - 2aa9ffff
> >> shard3_0 2aaa0000 - 5554ffff
> >> shard3_1 55550000 - 7fffffff
> >>
> >> New cluster:
> >> shard1 80000000 - aaa9ffff
> >> shard2 aaaa0000 - d554ffff
> >> shard3 d5550000 - ffffffff
> >> shard4 0 - 2aa9ffff
> >> shard5 2aaa0000 - 5554ffff
> >> shard6 55550000 - 7fffffff
> >>
> >> Note that the shard names differ because the old cluster's shards were
> >> split.
> >>
> >> As you can see, the ranges of shard3 and shard4 differ from the old
> >> cluster. This change of hash ranges matches with the symptoms we are
> >> currently experiencing.
> >>
> >> We found this JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750
> >> in which David Smiley comments:
> >>
> >>     shard hash ranges aren't restored; this error could be disasterous
> >>
> >> It seems that this is what happened to us. We would like to hear some
> >> suggestions on how we could recover from this problem.
> >>
> >> Best,
> >> Gary
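
For reference, a rough sketch of the znode round trip Erick describes above (fetch the state, edit it locally, push it back), here using the kazoo Python client rather than the bundled zkcli scripts. It assumes the collection stores its state under /collections/<name>/state.json (state format 2); older-style collections keep it in the shared /clusterstate.json instead. The ZooKeeper hosts and collection name are placeholders, and as noted above this should only be done with the Solr nodes shut down:

# Sketch: inspect (and optionally correct) a collection's shard ranges directly in ZK.
# ZK_HOSTS and COLLECTION are placeholders; the znode path assumes state format 2.
import json
from kazoo.client import KazooClient

ZK_HOSTS = "zk1:2181,zk2:2181,zk3:2181"   # placeholder ZooKeeper ensemble
COLLECTION = "mycollection"               # placeholder collection name
PATH = "/collections/" + COLLECTION + "/state.json"

zk = KazooClient(hosts=ZK_HOSTS)
zk.start()
try:
    raw, stat = zk.get(PATH)
    state = json.loads(raw)

    # Print what is currently stored before changing anything.
    for shard, info in state[COLLECTION]["shards"].items():
        print(shard, info.get("range"))

    # After editing `state` (e.g. restoring the old hash ranges), write it back.
    # Passing stat.version guards against a concurrent update of the znode.
    # zk.set(PATH, json.dumps(state).encode("utf-8"), version=stat.version)
finally:
    zk.stop()

Even with the ranges corrected in ZooKeeper, the point of the reply above stands: documents indexed under the new ranges still sit in shards the old ranges do not map them to, so they have to be re-indexed or moved.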