Re: [PR] [HBASE-29520] Utilize Backed-up Bulkloaded Files in Incremental Backup [hbase]

via GitHub Mon, 08 Sep 2025 12:46:27 -0700


ankitsol commented on code in PR #7246:
URL: https://github.com/apache/hbase/pull/7246#discussion_r2331204207



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java:
##########
@@ -166,6 +167,26 @@ protected List<BulkLoad> handleBulkLoad(List<TableName> tablesToBackup) throws I
       Path tblDir = CommonFSUtils.getTableDir(rootdir, srcTable);
       Path p = new Path(tblDir, regionName + Path.SEPARATOR + fam + Path.SEPARATOR + filename);
 
+      // For continuous backup: bulkload files are copied from backup directory defined by
+      // CONF_CONTINUOUS_BACKUP_WAL_DIR instead of source cluster.
+      String backupRootDir = conf.get(CONF_CONTINUOUS_BACKUP_WAL_DIR);
+      if (backupInfo.isContinuousBackupEnabled() && !Strings.isNullOrEmpty(backupRootDir)) {
+        String dayDirectoryName = BackupUtils.formatToDateString(bulkLoad.getTimestamp());
+        Path bulkLoadBackupPath =
+          new Path(backupRootDir, BULKLOAD_FILES_DIR + Path.SEPARATOR + dayDirectoryName);
+        Path bulkLoadDir = new Path(bulkLoadBackupPath,
+          srcTable.getNamespaceAsString() + Path.SEPARATOR + srcTable.getNameAsString());
+        FileSystem backupFs = FileSystem.get(bulkLoadDir.toUri(), conf);
+        Path fullBulkLoadBackupPath =
+          new Path(bulkLoadDir, regionName + Path.SEPARATOR + fam + Path.SEPARATOR + filename);
+        if (backupFs.exists(fullBulkLoadBackupPath)) {
+          LOG.debug("Backup bulkload file found {}", fullBulkLoadBackupPath);
+          p = fullBulkLoadBackupPath;
+        } else {
+          LOG.warn("Backup bulkload file not found {}", fullBulkLoadBackupPath);

Review Comment:
   In the non-continuous incremental backup approach, bulkload files are copied
   directly from the source cluster to the backup location.
   
   In the continuous backup approach, these files are instead copied from the
   bulkload backup location. I’ve added a warning here because, if the required
   bulkload backup files are missing, they are copied from the source cluster
   instead (as a fallback mechanism).
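
   To make the fallback concrete, here is a minimal standalone sketch of the
   lookup order: prefer the backed-up copy, otherwise warn and use the source
   copy. It uses `java.nio.file` rather than the Hadoop `FileSystem` API so it
   runs on its own. The class name `BulkLoadSourceResolver`, the `resolve`
   method, and the `"bulk-load-files"` value are illustrative assumptions, not
   code from this PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the PR: illustrates the lookup order only.
class BulkLoadSourceResolver {
  // Placeholder standing in for BULKLOAD_FILES_DIR; the real value is not shown in the diff.
  static final String BULKLOAD_FILES_DIR = "bulk-load-files";

  /** Returns the backed-up copy if it exists, otherwise the source-cluster copy. */
  static Path resolve(Path sourceFile, Path backupRoot, String day, String namespace,
      String table, String region, String family, String filename) {
    if (backupRoot == null) {
      // Continuous backup not configured: always read from the source cluster.
      return sourceFile;
    }
    // Layout mirrors the diff: <root>/<bulkload dir>/<day>/<namespace>/<table>/<region>/<family>/<file>
    Path backedUp = backupRoot.resolve(BULKLOAD_FILES_DIR).resolve(day).resolve(namespace)
        .resolve(table).resolve(region).resolve(family).resolve(filename);
    if (Files.exists(backedUp)) {
      return backedUp;
    }
    // Fallback: the backed-up copy is missing, so warn and use the source copy.
    System.err.println("Backup bulkload file not found " + backedUp);
    return sourceFile;
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempDirectory("bulkload-demo");
    Path source = Files.createFile(tmp.resolve("source-hfile"));
    Path backupRoot = tmp.resolve("backup");

    // No backed-up copy yet: resolves to the source file.
    System.out.println(resolve(source, backupRoot, "2025-09-08", "default", "t1", "r1", "cf", "hfile"));

    // Create the backed-up copy: the same call now resolves to it.
    Path backedUp = backupRoot.resolve(BULKLOAD_FILES_DIR).resolve("2025-09-08")
        .resolve("default").resolve("t1").resolve("r1").resolve("cf").resolve("hfile");
    Files.createDirectories(backedUp.getParent());
    Files.createFile(backedUp);
    System.out.println(resolve(source, backupRoot, "2025-09-08", "default", "t1", "r1", "cf", "hfile"));
  }
}
```

   The warning (rather than an error) reflects that a missing backed-up file is
   recoverable as long as the source copy still exists.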
   
   Source cluster files are only deleted after a successful full or incremental
   backup (in both the non-continuous and continuous approaches).
   
   Ideally, with continuous backups, once a bulkload file has been backed up by 
the replication endpoint, it could be safely deleted from the source cluster. 
However, doing so would significantly complicate the checkpoint logic, since it 
would then depend on both WAL flushes and bulkload backups.



