In essence, no. The data is, at best, in the wrong shard and at worst
nowhere.

Sent from my phone
On Jun 16, 2016 8:26 AM, "Gary Yao" <gary....@zalando.de> wrote:

> Hi Erick,
>
> I should add that our Solr cluster is in production and new documents
> are constantly indexed. The new cluster has been up for three weeks now.
> The problem was only discovered now because, in our use case, Atomic
> Updates and RealTime Gets are mostly performed on new documents. It is
> almost certain that the index already contains documents that were
> distributed to the shards according to the new hash ranges. If we just
> changed the hash ranges in ZooKeeper, the index would still be in an
> inconsistent state.
>
> Is there any way to recover from this without having to re-index all
> documents?
>
> Best,
> Gary
>
> 2016-06-15 19:23 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:
> > Simplest, though a bit risky, is to manually edit the znode and correct
> > the entry. There are various tools out there, including one that ships
> > with ZooKeeper (see the ZK documentation).
> >
> > Or you can use the zkcli scripts (the ZooKeeper ones) to get the znode
> > down to your local machine, edit it there, and then push it back up to ZK.
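> >
> > A minimal sketch of that round trip, using Solr's bundled
> > cloud-scripts zkcli.sh (ZooKeeper's own zkCli.sh get/set would work as
> > well). The zkhost address, collection name and local path below are
> > placeholders, and the collection state may live in the legacy
> > /clusterstate.json rather than a per-collection state.json:
> >
> >   # pull the collection state down to a local file
> >   server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 \
> >     -cmd getfile /collections/mycollection/state.json /tmp/state.json
> >
> >   # fix the "range" value of each shard in /tmp/state.json, then push it back
> >   server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 \
> >     -cmd putfile /collections/mycollection/state.json /tmp/state.json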
> >
> > I'd do all this with my Solr nodes shut down, then ensure that my ZK
> > ensemble was consistent after the update, etc.
> >
> > Best,
> > Erick
> >
> > On Wed, Jun 15, 2016 at 8:36 AM, Gary Yao <gary....@zalando.de> wrote:
> >> Hi all,
> >>
> >> My team at work maintains a SolrCloud 5.3.2 cluster with multiple
> >> collections configured with sharding and replication.
> >>
> >> We recently backed up our Solr indexes using the built-in backup
> >> functionality. After the cluster was restored from the backup, we
> >> noticed that atomic updates of documents occasionally fail with the
> >> error message 'missing required field [...]'. The exceptions are thrown
> >> on hosts that do not store the document being updated. From this we
> >> deduce that the right host is no longer found by the hash of the
> >> uniqueKey. Indeed, our investigation so far has shown that for at least
> >> one collection in the new cluster, the shards now have different hash
> >> ranges assigned. We checked the hash ranges by querying
> >> /admin/collections?action=CLUSTERSTATUS. Below are the shard hash
> >> ranges of one collection that we debugged.
> >>
> >>   Old cluster:
> >>     shard1_0 80000000 - aaa9ffff
> >>     shard1_1 aaaa0000 - d554ffff
> >>     shard2_0 d5550000 - fffeffff
> >>     shard2_1 ffff0000 - 2aa9ffff
> >>     shard3_0 2aaa0000 - 5554ffff
> >>     shard3_1 55550000 - 7fffffff
> >>
> >>   New cluster:
> >>     shard1 80000000 - aaa9ffff
> >>     shard2 aaaa0000 - d554ffff
> >>     shard3 d5550000 - ffffffff
> >>     shard4 0 - 2aa9ffff
> >>     shard5 2aaa0000 - 5554ffff
> >>     shard6 55550000 - 7fffffff
> >>
> >>   Note that the shard names differ because the old cluster's shards were
> >>   split.
> >>
> >> As you can see, the ranges of shard3 and shard4 differ from the old
> >> cluster. This change of hash ranges matches with the symptoms we are
> >> currently experiencing.
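> >>
> >> For reference, this is roughly the query we used to read the ranges
> >> (host, port and collection name are placeholders):
> >>
> >>   curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=mycollection&wt=json'
> >>
> >> The per-shard "range" fields in the response contain the hex ranges
> >> listed above.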
> >>
> >> We found this JIRA ticket, https://issues.apache.org/jira/browse/SOLR-5750,
> >> in which David Smiley comments:
> >>
> >>   shard hash ranges aren't restored; this error could be disastrous
> >>
> >> It seems that this is what happened to us. We would like to hear some
> >> suggestions on how we could recover from this problem.
> >>
> >> Best,
> >> Gary
>
