On 9/25/2017 7:25 AM, Vikas Mehra wrote:
> Cluster has 1 zookeeper node and 3 solr nodes. There is only one collection
> with 3 shards. Data is continuously indexed using SolrJ API. System is
> running on AWS and I am taking backup on EFS (Elastic File System).
>
> Observed behavior:
> If indexing is not in progress, I take a backup of cluster using collection
> API, backup succeeds and restore works as expected.
>
> snapshotscli.sh works as expected if I first take snapshot of index while
> indexing is in progress and then take backup. There is no error during
> restore.

I was completely unaware of the snapshotscli.sh script.  I just found
where it was added to Solr:

https://issues.apache.org/jira/browse/SOLR-9688

> However, I get error most of the time if I try to restore collection from
> the backup taken using collection API when indexing was still in progress.
> Error is always missing segment and I can see that segment it's trying to
> read during restore does not exist in the backup shard directory.

My best guess: When you manually create a snapshot, the BACKUP feature
in the Collections API finds that snapshot and backs it up.  When you
don't create a snapshot, perhaps it only copies from the live index,
which can change if there is indexing underway.

When there are no snapshots, I think that the BACKUP feature should
create one, then delete it once the backup is done.  Or it could use a
Lucene feature called a commit point to ensure that files cannot
disappear during the backup, and delete that when the backup is done.
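
To make that sequence concrete, here is a rough sketch of the three
Collections API requests I have in mind: create a snapshot, back up
against it, then delete it.  It only builds the request URLs and sends
nothing.  The helper name and the example values are made up for
illustration, and passing commitName to BACKUP is an assumption that
should be checked against the Solr version in use.

```python
from urllib.parse import urlencode


def snapshot_backup_urls(base_url, collection, backup_name, location, commit_name):
    """Build the Collections API URLs for snapshot -> backup -> cleanup.

    Hypothetical helper: it only constructs URLs, in the order they
    would be requested, and performs no network I/O.
    """
    def url(action, **params):
        query = {"action": action, "collection": collection, **params}
        return f"{base_url}/admin/collections?{urlencode(query)}"

    return [
        # 1. Pin the current commit so its files cannot disappear.
        url("CREATESNAPSHOT", commitName=commit_name),
        # 2. Back up; commitName here is an assumption, verify it is honoured.
        url("BACKUP", name=backup_name, location=location, commitName=commit_name),
        # 3. Release the snapshot once the backup is done.
        url("DELETESNAPSHOT", commitName=commit_name),
    ]


if __name__ == "__main__":
    for u in snapshot_backup_urls(
        "http://localhost:8983/solr", "mycoll", "bk1", "/mnt/efs/backups", "snap1"
    ):
        print(u)
```

Each request would need to finish successfully before the next one is
sent, and the snapshot should still be deleted if the backup fails.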

I've always found the Collections API code hard to follow, and I can
never figure out exactly where the work is being done.  I've poked
around a bit, but I cannot see where the BACKUP action is actually
handled, so I have no idea whether I've even found the right code to
look at.

Thanks,
Shawn
