DieterDP-ng commented on code in PR #7582:
URL: https://github.com/apache/hbase/pull/7582#discussion_r2658268536


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########


Review Comment:
   Hi @hgromer, I agree HBASE-29776 is an issue (sorry for not responding sooner there, I was on vacation the past few weeks), but I'm not yet convinced this is the right approach to fix it. It feels very complex to reason about, so I wonder if there isn't a simpler approach. I already wanted to give some intermediate feedback while I think a bit more about it.
   
   - Since `newTimestamps` is never pruned, the entry in the backup table will keep growing over time.
   - `newTimestamps` ends up being written in `BackupSystemTable#createPutForWriteRegionServerLogTimestamp`, but these changes no longer match the javadoc of that method.



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########
@@ -83,12 +82,48 @@ public Map<String, Long> getIncrBackupLogFileMap() throws IOException {
     LOG.info("Execute roll log procedure for incremental backup ...");
     BackupUtils.logRoll(conn, backupInfo.getBackupRootDir(), conf);
 
-    newTimestamps = readRegionServerLastLogRollResult();
+    Map<String, Long> newTimestamps = readRegionServerLastLogRollResult();

Review Comment:
   This method does an unnecessary scan, since you overwrite all entries in the code you add below.
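   
   A minimal sketch of the alternative (assuming, as noted above, that the loop below writes an entry for every host in `latestLogRollByHost`):
   ```
       // No scan of the backup system table needed here: the loop that follows
       // puts an entry for every host anyway, so start from an empty map.
       Map<String, Long> newTimestamps = new HashMap<>();
   ```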



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########


Review Comment:
   My suggestion is to simplify all the changes in this file to the snippet below (the fix for the excluded log files is also needed):
   
   ```
       LOG.info("Execute roll log procedure for incremental backup ...");
       long rollStartTs = EnvironmentEdgeManager.currentTime();
       BackupUtils.logRoll(conn, backupInfo.getBackupRootDir(), conf);
   
       Map<String, Long> rollTimestamps = readRegionServerLastLogRollResult();
   
       Map<String, Long> newTimestamps =
         rollTimestamps.entrySet().stream()
           // Region servers that are offline since the last backup will have old roll timestamps,
           // prune their information here, as it is not relevant to be stored or used for finding
           // the relevant logs.
           .filter(entry -> entry.getValue() > rollStartTs)
           .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
   
       // This method needs to be adjusted to use "rollStartTs" if an entry is not found in newTimestamps.
       // Or alternatively: getLogFilesForNewBackup(previousTimestampMins,
       //     DefaultedMap.defaultedMap(newTimestamps, rollStartTs), conf, savedStartCode);
       logList = getLogFilesForNewBackup(previousTimestampMins, newTimestamps, conf, savedStartCode);
   ```
   
   Then when finding which logs to include, these are the options (see the sketch after this list):
   - server found in both previous and newTimestamps: a region server that is unchanged, include logs newer than the previous timestamp and not newer than the newTimestamps entry
   - server found in only previous: a region server that has gone offline, all its logs will be older than rollStartTs and should be included
   - server found in only newTimestamps: a new region server, include all logs that are older than the corresponding newTimestamp
   - server found in neither: a region server that was started and went back offline in between the previous and current backup, all its logs will be older than rollStartTs and should be included
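   
   As a rough sketch, the per-file selection inside `getLogFilesForNewBackup` could then handle the four cases like this (illustrative only; `host`, `currentLogTS`, `currentLogFile` and `resultLogFiles` stand for the per-file loop variables, and the exact inclusive/exclusive boundary comparisons would need to match the current semantics):
   ```
       Long oldTimestamp = olderTimestamps.get(host);   // boundary of the previous backup
       Long newTimestamp = newestTimestamps.get(host);  // boundary from this log roll
   
       if (oldTimestamp != null && newTimestamp != null) {
         // Unchanged region server: include logs in between the two boundaries.
         if (currentLogTS > oldTimestamp && currentLogTS <= newTimestamp) {
           resultLogFiles.add(currentLogFile);
         }
       } else if (oldTimestamp != null) {
         // Server gone offline: everything newer than the old boundary predates
         // rollStartTs anyway, so include it all.
         if (currentLogTS > oldTimestamp) {
           resultLogFiles.add(currentLogFile);
         }
       } else if (newTimestamp != null) {
         // New region server: include everything up to its roll timestamp.
         if (currentLogTS <= newTimestamp) {
           resultLogFiles.add(currentLogFile);
         }
       } else {
         // Started and stopped in between backups: all logs predate rollStartTs.
         resultLogFiles.add(currentLogFile);
       }
   ```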
   
   This approach will keep `newTimestamps` limited to the relevant entries. We could consider cleaning up the entries for `readRegionServerLastLogRollResult` as well, but I left that out of scope for now.
   
   Similar code suffices in the `FullTableBackupClient`.



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########
@@ -228,15 +263,6 @@ private List<String> getLogFilesForNewBackup(Map<String, Long> olderTimestamps,
       } else if (currentLogTS > oldTimeStamp) {
         resultLogFiles.add(currentLogFile);
       }
-
-      // It is possible that a host in .oldlogs is an obsolete region server

Review Comment:
   I think removing this block entirely is wrong. I believe the semantics of 
`newestTimestamps` is "ensure we have everything backed up to this timestamp". 
So if `currentLogTS > newTimestamp` is true, we should indeed skip this file.
   
   So I think this block should be kept, but adjusted to:
   ```
         if (newTimestamp != null && currentLogTS > newTimestamp) {
           newestLogs.add(currentLogFile);
         }
   ```
   
   I also think a similar issue is present for the `.logs` files in this method.



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########
@@ -83,12 +82,48 @@ public Map<String, Long> getIncrBackupLogFileMap() throws IOException {
     LOG.info("Execute roll log procedure for incremental backup ...");
     BackupUtils.logRoll(conn, backupInfo.getBackupRootDir(), conf);
 
-    newTimestamps = readRegionServerLastLogRollResult();
+    Map<String, Long> newTimestamps = readRegionServerLastLogRollResult();
+
+    Map<String, Long> latestLogRollByHost = readRegionServerLastLogRollResult();
+    for (Map.Entry<String, Long> entry : latestLogRollByHost.entrySet()) {
+      String host = entry.getKey();
+      long latestLogRoll = entry.getValue();
+      Long earliestTimestampToIncludeInBackup = previousTimestampMins.get(host);
+
+      boolean isInactive = earliestTimestampToIncludeInBackup != null
+        && earliestTimestampToIncludeInBackup > latestLogRoll;
+
+      long latestTimestampToIncludeInBackup;
+      if (isInactive) {
+        LOG.debug("Avoided resetting latest timestamp boundary for {} from {} to {}", host,
+          earliestTimestampToIncludeInBackup, latestLogRoll);
+        latestTimestampToIncludeInBackup = earliestTimestampToIncludeInBackup;
+      } else {
+        latestTimestampToIncludeInBackup = latestLogRoll;
+      }
+      newTimestamps.put(host, latestTimestampToIncludeInBackup);
+    }
 
     logList = getLogFilesForNewBackup(previousTimestampMins, newTimestamps, conf, savedStartCode);
     logList = excludeProcV2WALs(logList);
     backupInfo.setIncrBackupFileList(logList);
 
+    // Update boundaries based on WALs that will be backed up

Review Comment:
   For my understanding, is this code block an optimization, or a necessary fix 
for a specific case of appearing/disappearing region servers?



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########
@@ -83,12 +82,48 @@ public Map<String, Long> getIncrBackupLogFileMap() throws IOException {
     LOG.info("Execute roll log procedure for incremental backup ...");
     BackupUtils.logRoll(conn, backupInfo.getBackupRootDir(), conf);
 
-    newTimestamps = readRegionServerLastLogRollResult();
+    Map<String, Long> newTimestamps = readRegionServerLastLogRollResult();
+
+    Map<String, Long> latestLogRollByHost = readRegionServerLastLogRollResult();
+    for (Map.Entry<String, Long> entry : latestLogRollByHost.entrySet()) {
+      String host = entry.getKey();
+      long latestLogRoll = entry.getValue();
+      Long earliestTimestampToIncludeInBackup = previousTimestampMins.get(host);
+
+      boolean isInactive = earliestTimestampToIncludeInBackup != null
+        && earliestTimestampToIncludeInBackup > latestLogRoll;
+
+      long latestTimestampToIncludeInBackup;
+      if (isInactive) {
+        LOG.debug("Avoided resetting latest timestamp boundary for {} from {} to {}", host,
+          earliestTimestampToIncludeInBackup, latestLogRoll);
+        latestTimestampToIncludeInBackup = earliestTimestampToIncludeInBackup;
+      } else {
+        latestTimestampToIncludeInBackup = latestLogRoll;
+      }
+      newTimestamps.put(host, latestTimestampToIncludeInBackup);
+    }
 
     logList = getLogFilesForNewBackup(previousTimestampMins, newTimestamps, conf, savedStartCode);
     logList = excludeProcV2WALs(logList);
     backupInfo.setIncrBackupFileList(logList);
 
+    // Update boundaries based on WALs that will be backed up
+    for (String logFile : logList) {
+      Path logPath = new Path(logFile);
+      String logHost = BackupUtils.parseHostFromOldLog(logPath);
+      if (logHost == null) {
+        logHost = BackupUtils.parseHostNameFromLogFile(logPath.getParent());

Review Comment:
   This method seems to support parsing old log names as well; is it possible to merge it with the parsing two lines above? Though I am confused as to why the former uses `logPath` and the latter `logPath.getParent()`.
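   
   As a sketch, the two-step lookup could be extracted into a small helper (hypothetical wrapper; `parseHostFromOldLog` and `parseHostNameFromLogFile` are the existing `BackupUtils` methods, and the `logPath` vs `logPath.getParent()` distinction is copied from the PR as-is):
   ```
   // Hypothetical helper mirroring the PR's two-step host parsing.
   private static String parseHostFromWalPath(Path logPath) {
     // Archived WALs encode the host in the file name itself.
     String host = BackupUtils.parseHostFromOldLog(logPath);
     if (host == null) {
       // Otherwise fall back to parsing the per-server WAL directory name.
       host = BackupUtils.parseHostNameFromLogFile(logPath.getParent());
     }
     return host;
   }
   ```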



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########
@@ -83,12 +82,48 @@ public Map<String, Long> getIncrBackupLogFileMap() throws IOException {
     LOG.info("Execute roll log procedure for incremental backup ...");
     BackupUtils.logRoll(conn, backupInfo.getBackupRootDir(), conf);
 
-    newTimestamps = readRegionServerLastLogRollResult();
+    Map<String, Long> newTimestamps = readRegionServerLastLogRollResult();
+
+    Map<String, Long> latestLogRollByHost = readRegionServerLastLogRollResult();
+    for (Map.Entry<String, Long> entry : latestLogRollByHost.entrySet()) {
+      String host = entry.getKey();
+      long latestLogRoll = entry.getValue();
+      Long earliestTimestampToIncludeInBackup = previousTimestampMins.get(host);
+
+      boolean isInactive = earliestTimestampToIncludeInBackup != null
+        && earliestTimestampToIncludeInBackup > latestLogRoll;
+
+      long latestTimestampToIncludeInBackup;
+      if (isInactive) {
+        LOG.debug("Avoided resetting latest timestamp boundary for {} from {} to {}", host,
+          earliestTimestampToIncludeInBackup, latestLogRoll);
+        latestTimestampToIncludeInBackup = earliestTimestampToIncludeInBackup;
+      } else {
+        latestTimestampToIncludeInBackup = latestLogRoll;
+      }
+      newTimestamps.put(host, latestTimestampToIncludeInBackup);
+    }
 
     logList = getLogFilesForNewBackup(previousTimestampMins, newTimestamps, conf, savedStartCode);
     logList = excludeProcV2WALs(logList);
     backupInfo.setIncrBackupFileList(logList);
 
+    // Update boundaries based on WALs that will be backed up
+    for (String logFile : logList) {
+      Path logPath = new Path(logFile);
+      String logHost = BackupUtils.parseHostFromOldLog(logPath);
+      if (logHost == null) {
+        logHost = BackupUtils.parseHostNameFromLogFile(logPath.getParent());
+      }
+      if (logHost != null) {
+        long logTs = BackupUtils.getCreationTime(logPath);
+        Long latestTimestampToIncludeInBackup = newTimestamps.get(logHost);
+        if (latestTimestampToIncludeInBackup == null || logTs > latestTimestampToIncludeInBackup) {
+          LOG.info("Updating backup boundary for inactive host {}: timestamp={}", logHost, logTs);

Review Comment:
   Is this log line correct? Are we dealing with an inactive host if 
`latestTimestampToIncludeInBackup != null`?



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/FullTableBackupClient.java:
##########


Review Comment:
   I haven't gotten around to processing the changes in this file yet, but can you sketch why they are needed? Your original ticket only discusses an issue with incremental backups.



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########
@@ -228,15 +263,6 @@ private List<String> getLogFilesForNewBackup(Map<String, Long> olderTimestamps,
       } else if (currentLogTS > oldTimeStamp) {
         resultLogFiles.add(currentLogFile);
       }
-
-      // It is possible that a host in .oldlogs is an obsolete region server

Review Comment:
   From your comment in HBASE-29776:
   
   > newTimestamp represents the last time a backup rolled the WAL on the RS. If the RegionServer isn't running and therefore isn't able to roll the WAL again, then this timestamp will be in the past, and we end up filtering out all WAL files that were updated since then. Why would we filter out oldWALs that have been created since then? That seems wrong as well
   
   Your comment is correct, but I think the better fix is to ensure the `newTimestamps` entries are correctly updated (as you do in your other changes). Removing this block would result in too many logs being included in the backup.



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########
@@ -83,12 +82,48 @@ public Map<String, Long> getIncrBackupLogFileMap() throws IOException {
     LOG.info("Execute roll log procedure for incremental backup ...");
    BackupUtils.logRoll(conn, backupInfo.getBackupRootDir(), conf);
 
-    newTimestamps = readRegionServerLastLogRollResult();
+    Map<String, Long> newTimestamps = readRegionServerLastLogRollResult();
+
+    Map<String, Long> latestLogRollByHost = readRegionServerLastLogRollResult();
+    for (Map.Entry<String, Long> entry : latestLogRollByHost.entrySet()) {
+      String host = entry.getKey();
+      long latestLogRoll = entry.getValue();
+      Long earliestTimestampToIncludeInBackup = previousTimestampMins.get(host);
+
+      boolean isInactive = earliestTimestampToIncludeInBackup != null
+        && earliestTimestampToIncludeInBackup > latestLogRoll;
+
+      long latestTimestampToIncludeInBackup;
+      if (isInactive) {
+        LOG.debug("Avoided resetting latest timestamp boundary for {} from {} to {}", host,
+          earliestTimestampToIncludeInBackup, latestLogRoll);
+        latestTimestampToIncludeInBackup = earliestTimestampToIncludeInBackup;
+      } else {
+        latestTimestampToIncludeInBackup = latestLogRoll;
+      }
+      newTimestamps.put(host, latestTimestampToIncludeInBackup);
+    }
 
     logList = getLogFilesForNewBackup(previousTimestampMins, newTimestamps, conf, savedStartCode);
     logList = excludeProcV2WALs(logList);
     backupInfo.setIncrBackupFileList(logList);
 
+    // Update boundaries based on WALs that will be backed up

Review Comment:
   Figured it out: it updates the `newTimestamps` entries for region servers that have since gone offline, but whose logs are now backed up.


