hgromer commented on PR #6370:
URL: https://github.com/apache/hbase/pull/6370#issuecomment-2448105733

   > Instead of targeting the old region partitions, which we know have changed (i.e., due to increased data volume), what if the restore process targets the region partitions of the most recent incremental backup -- G, H, I, J, K, L, from your example. I guess the splitting process is really challenging and slow?
   
   I thought it would make the most sense to target the splits taken at the time of the full backup, for a few reasons:
   
   * In theory, the full backup will be larger than the incremental backups, so its splits are more representative of the dataset as a whole.
   * Splits are derived from the actual backup files, so the splits from an incremental snapshot give a much less complete picture of the table's overall region boundaries. Imagine, in an extreme case, that we issue only a single update: the incremental backup will then have a single split that pertains only to that update.
   * There's currently no process that generates region splits for incremental backups, so we'd need to add an extra step to do that.
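
To make the second point concrete, here's a rough, self-contained sketch. It assumes boundaries are inferred from each backed-up file's first and last row key: overlapping key ranges are merged, and the start of every merged range except the first becomes a split point. `FileRange`, `inferBoundaries`, and the single-letter string row keys are hypothetical simplifications for illustration, not the actual backup/restore code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class SplitInference {

  /** Hypothetical stand-in for a backed-up file: only its first and last row key. */
  record FileRange(String firstKey, String lastKey) {}

  /**
   * Infers split points from file key ranges. Overlapping ranges are merged
   * into one cluster; the start key of every cluster except the first is
   * returned as a split point.
   */
  static List<String> inferBoundaries(List<FileRange> files) {
    SortedMap<String, Integer> edges = new TreeMap<>();
    for (FileRange f : files) {
      edges.merge(f.firstKey(), 1, Integer::sum);  // a file's range opens here
      edges.merge(f.lastKey(), -1, Integer::sum);  // a file's range closes here
    }
    List<String> boundaries = new ArrayList<>();
    int open = 0;                 // number of file ranges currently open
    String clusterStart = null;   // first key of the current cluster
    boolean firstCluster = true;
    for (Map.Entry<String, Integer> e : edges.entrySet()) {
      if (open == 0) {
        clusterStart = e.getKey();
      }
      open += e.getValue();
      if (open == 0) {            // every range in this cluster has closed
        if (!firstCluster) {
          boundaries.add(clusterStart);
        }
        firstCluster = false;
      }
    }
    return boundaries;
  }

  public static void main(String[] args) {
    // Full backup: several files spread across the key space.
    List<FileRange> full = List.of(
        new FileRange("a", "c"), new FileRange("d", "f"),
        new FileRange("g", "i"), new FileRange("j", "m"));
    // Incremental backup after a single update: one file holding one row.
    List<FileRange> incremental = List.of(new FileRange("e", "e"));

    System.out.println("full:        " + inferBoundaries(full));
    System.out.println("incremental: " + inferBoundaries(incremental));
  }
}
```

With these made-up inputs, the full backup yields `[d, g, j]` (four regions), while the single-update incremental yields `[]`, i.e. one region spanning the whole table, which tells us nothing about how the data is actually distributed.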
   
   > Whichever strategy we pursue, should the restore system use that flag to take a lock on the region boundaries before it starts rewriting data?
   
   Agreed, it makes sense to lock down the region boundaries on the target table before the restore starts rewriting data.
