ankitsol commented on code in PR #7246:
URL: https://github.com/apache/hbase/pull/7246#discussion_r2331204207
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java:
##########
@@ -166,6 +167,26 @@ protected List<BulkLoad> handleBulkLoad(List<TableName> tablesToBackup) throws I
Path tblDir = CommonFSUtils.getTableDir(rootdir, srcTable);
Path p = new Path(tblDir, regionName + Path.SEPARATOR + fam + Path.SEPARATOR + filename);
+ // For continuous backup: bulkload files are copied from backup directory defined by
+ // CONF_CONTINUOUS_BACKUP_WAL_DIR instead of source cluster.
+ String backupRootDir = conf.get(CONF_CONTINUOUS_BACKUP_WAL_DIR);
+ if (backupInfo.isContinuousBackupEnabled() && !Strings.isNullOrEmpty(backupRootDir)) {
+ String dayDirectoryName = BackupUtils.formatToDateString(bulkLoad.getTimestamp());
+ Path bulkLoadBackupPath =
+ new Path(backupRootDir, BULKLOAD_FILES_DIR + Path.SEPARATOR + dayDirectoryName);
+ Path bulkLoadDir = new Path(bulkLoadBackupPath,
+ srcTable.getNamespaceAsString() + Path.SEPARATOR + srcTable.getNameAsString());
+ FileSystem backupFs = FileSystem.get(bulkLoadDir.toUri(), conf);
+ Path fullBulkLoadBackupPath =
+ new Path(bulkLoadDir, regionName + Path.SEPARATOR + fam + Path.SEPARATOR + filename);
+ if (backupFs.exists(fullBulkLoadBackupPath)) {
+ LOG.debug("Backup bulkload file found {}", fullBulkLoadBackupPath);
+ p = fullBulkLoadBackupPath;
+ } else {
+ LOG.warn("Backup bulkload file not found {}", fullBulkLoadBackupPath);
Review Comment:
In the non-continuous incremental backup approach, bulkload files are copied
directly from the source cluster to the backup location.
In the continuous backup approach, these files are instead copied from the
bulkload backup location. I’ve added a warning here because, if a required
bulkload backup file is missing, it is copied from the source cluster instead
(as a fallback mechanism).
Source cluster files are deleted only after a successful full or incremental
backup (in both the non-continuous and continuous approaches).
Ideally, with continuous backups, a bulkload file could be safely deleted from
the source cluster as soon as the replication endpoint has backed it up.
However, doing so would significantly complicate the checkpoint logic, since it
would then depend on both WAL flushes and bulkload backups.
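
For readers following the thread, the lookup order described in this comment can be sketched roughly as below. This is only an illustration and not the code in the PR: it uses `java.nio.file` instead of the Hadoop `FileSystem` API so it can run standalone, and the class name `BulkLoadSourceResolver`, the `resolve` method, the `bulk-load-files` directory name, and the `yyyy-MM-dd` day format are all assumptions made for the example (the real values of `BULKLOAD_FILES_DIR` and `BackupUtils.formatToDateString` may differ).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class BulkLoadSourceResolver {
  // Assumed directory name; stands in for BULKLOAD_FILES_DIR, whose real value is not shown here.
  private static final String BULKLOAD_DIR = "bulk-load-files";

  // Assumed day-directory format; stands in for BackupUtils.formatToDateString.
  private static final DateTimeFormatter DAY =
    DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

  /**
   * Returns the path the bulkload file should be copied from.
   * With continuous backup enabled, the copy under backupRoot is preferred;
   * if it is missing, the source cluster path is returned as a fallback.
   */
  public static Path resolve(Path sourceFile, Path backupRoot, boolean continuousBackup,
    long timestampMs, String namespace, String table, String region, String family,
    String fileName) {
    if (!continuousBackup || backupRoot == null) {
      // Non-continuous backup: always copy from the source cluster.
      return sourceFile;
    }
    String day = DAY.format(Instant.ofEpochMilli(timestampMs));
    Path candidate = backupRoot.resolve(BULKLOAD_DIR).resolve(day).resolve(namespace)
      .resolve(table).resolve(region).resolve(family).resolve(fileName);
    if (Files.exists(candidate)) {
      return candidate;
    }
    // Fallback: the file was not backed up, so read it from the source cluster.
    System.err.println("Backup bulkload file not found " + candidate);
    return sourceFile;
  }
}
```

The fallback only works because source cluster files are kept until a full or incremental backup succeeds, which is the point made above about not deleting them earlier.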