hgromer commented on PR #6370: URL: https://github.com/apache/hbase/pull/6370#issuecomment-2448105733
> Instead of targeting the old region partitions, which we know have changed (i.e., due to increased data volume), what if the restore process targets the region partitions of the most recent incremental backup -- G, H, I, J, K, L, from your example. I guess the splitting process is really challenging and slow?

I thought it would make the most sense to target the splits taken at the time of the full backup, for a couple of reasons:

* Theoretically, the full backups will be larger than the incremental backups, so their splits are more representative of what makes sense for the dataset.
* Splits are derived from the actual backed-up files, so the splits from an incremental snapshot give a less holistic view of the table's overall region layout. Imagine, in an extreme case, that we only issue a single update: the incremental backup would then have a single split covering just that update.
* There's no process to generate region splits for incremental backups, so we'd need to add an extra step to do that.

> Whichever strategy we pursue, should the restore system use that flag to take a lock on the region boundaries before it starts rewriting data?

Agreed, it makes sense to lock down region boundaries on the target table.
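The sketch below is a rough, hypothetical illustration of why file-derived splits from a sparse incremental backup make a poor restore target. It assumes that split points are computed by walking the backed-up row keys in sorted order and cutting a new region every N bytes. That is an assumption for illustration, not necessarily how the backup code computes splits. Under that model, a full backup yields many boundaries, while an incremental backup containing a single update yields none, i.e. one region. `SplitPointSketch`, `computeSplitPoints`, the row keys, and the 100-byte target are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: derive region split points from the row keys (and their
 * sizes) found in a backup's files. Not the HBase backup implementation.
 */
public class SplitPointSketch {

  /**
   * Walks keys in sorted order and starts a new region once the running size
   * has reached targetRegionBytes. Returns the first row key of every region
   * after the first one, i.e. the split points.
   */
  static List<String> computeSplitPoints(TreeMap<String, Long> keySizes, long targetRegionBytes) {
    List<String> splits = new ArrayList<>();
    long running = 0;
    for (Map.Entry<String, Long> e : keySizes.entrySet()) {
      if (running >= targetRegionBytes) {
        // The previous region is "full", so this key opens a new region.
        splits.add(e.getKey());
        running = 0;
      }
      running += e.getValue();
    }
    return splits;
  }

  public static void main(String[] args) {
    long target = 100L;

    // Full backup: every row in the table, 26 rows of 40 bytes each.
    TreeMap<String, Long> full = new TreeMap<>();
    for (char c = 'a'; c <= 'z'; c++) {
      full.put("row-" + c, 40L);
    }

    // Incremental backup after a single update: only one row is present.
    TreeMap<String, Long> incremental = new TreeMap<>();
    incremental.put("row-m", 40L);

    System.out.println("full:        " + computeSplitPoints(full, target));
    System.out.println("incremental: " + computeSplitPoints(incremental, target));
  }
}
```

Running it prints `[row-d, row-g, row-j, row-m, row-p, row-s, row-v, row-y]` for the full backup (nine regions) and `[]` for the incremental. In other words, the incremental backup alone says almost nothing about how the whole table should be partitioned.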