DieterDP-ng commented on code in PR #7523:
URL: https://github.com/apache/hbase/pull/7523#discussion_r2593823522


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java:
##########


Review Comment:
   Method `serverToPreservationBoundaryTs` has become unused, so it should be 
removed.



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java:
##########


Review Comment:
   You've simplified the logic to determine what can be deleted, but I can see 
this backfiring in the following case.
   Imagine a cluster with 2 backup roots (though multi-root doesn't work 
properly yet, assume it does). Root A has a backup of a single table, and is 
effectively not used. Root B is actually used and gets a new backup every day.
   Because the timestamp of A will remain unchanged, it will prohibit the 
deletion of WALs that are only relevant to B.
   
   Hence, I'd prefer the old logic path (with a fix for the case of missing 
timestamps).
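   To make the concern concrete (made-up numbers, and assuming `oldestStartCode` 
ends up as the minimum start code across all roots):
   
```java
// Made-up numbers, only to illustrate the single-global-boundary concern.
long rootALatestStart = 1_000L;  // root A: one stale backup, never updated again
long rootBLatestStart = 9_000L;  // root B: a fresh backup every day
long oldestStartCode = Math.min(rootALatestStart, rootBLatestStart); // = 1_000
// Every WAL written after ts 1_000 is now kept forever, even though, looking at
// root B alone, everything older than 9_000 could be deleted.
```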



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java:
##########
@@ -248,6 +241,7 @@ protected static boolean canDeleteFile(Map<Address, Long> addressToBoundaryTs, P
   private static boolean isHMasterWAL(Path path) {
     String fn = path.getName();
     return fn.startsWith(WALProcedureStore.LOG_PREFIX)
-      || fn.endsWith(MasterRegionFactory.ARCHIVED_WAL_SUFFIX);
+      || fn.endsWith(MasterRegionFactory.ARCHIVED_WAL_SUFFIX)
+      || path.toString().contains(MasterRegionFactory.MASTER_STORE_DIR);

Review Comment:
   Do you know the format of the `Path`s entering here? Is there a chance that 
they contain a table name? In that case a table called "MyMasterData" would be 
misclassified as a master WAL.
   It might be safer to add path boundaries: `.contains("/" + 
MasterRegionFactory.MASTER_STORE_DIR + "/")`
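   I.e. something along these lines (untested sketch; it assumes `Path.SEPARATOR` 
matches the separator that actually appears in these `Path`s):
   
```java
// Untested sketch of the boundary-aware check; the constants are the ones already
// used in this method, only the separator wrapping is new.
private static boolean isHMasterWAL(Path path) {
  String fn = path.getName();
  String masterStoreDir =
    Path.SEPARATOR + MasterRegionFactory.MASTER_STORE_DIR + Path.SEPARATOR;
  return fn.startsWith(WALProcedureStore.LOG_PREFIX)
    || fn.endsWith(MasterRegionFactory.ARCHIVED_WAL_SUFFIX)
    // Wrapping the directory name in separators prevents a user table such as
    // "MyMasterData" from matching on a substring of its name.
    || path.toString().contains(masterStoreDir);
}
```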



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java:
##########
@@ -248,6 +241,7 @@ protected static boolean canDeleteFile(Map<Address, Long> addressToBoundaryTs, P
   private static boolean isHMasterWAL(Path path) {
     String fn = path.getName();
     return fn.startsWith(WALProcedureStore.LOG_PREFIX)
-      || fn.endsWith(MasterRegionFactory.ARCHIVED_WAL_SUFFIX);
+      || fn.endsWith(MasterRegionFactory.ARCHIVED_WAL_SUFFIX)
+      || path.toString().contains(MasterRegionFactory.MASTER_STORE_DIR);

Review Comment:
   Add a test for this.
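   Something along these lines, perhaps (rough sketch; it assumes `isHMasterWAL` 
is opened up to package-private for testing, and the usual HBase test-class 
boilerplate is omitted):
   
```java
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.master.region.MasterRegionFactory;
import org.junit.Test;

// Rough sketch only; assumes the test lives in the same package as BackupLogCleaner,
// that isHMasterWAL is made package-private, and omits the usual class rule /
// category annotations.
public class TestBackupLogCleanerMasterWALDetection {

  @Test
  public void testMasterStoreDirRequiresPathBoundaries() {
    // A file under the master store directory is a master WAL.
    assertTrue(BackupLogCleaner.isHMasterWAL(
      new Path("/hbase/" + MasterRegionFactory.MASTER_STORE_DIR + "/WALs/wal.1")));
    // A WAL of a user table whose name merely contains the directory name must not match.
    assertFalse(BackupLogCleaner.isHMasterWAL(
      new Path("/hbase/oldWALs/My" + MasterRegionFactory.MASTER_STORE_DIR + ".1695000000000")));
  }
}
```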



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java:
##########


Review Comment:
   Another interesting case: if I were to take a backup of a single table, 
that shouldn't affect the cleanup of WALs belonging to tables not covered by 
the backup.
   
   Do we have a way to know whether a file offered to the cleaner belongs to a 
table covered by backups? The format of the logs and the link to regions/tables 
is still somewhat unclear to me.



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java:
##########
@@ -153,19 +159,34 @@ public Iterable<FileStatus> getDeletableFiles(Iterable<FileStatus> files) {
       return files;
     }
 
-    Map<Address, Long> serverToPreservationBoundaryTs;
+    long oldestStartCode = Long.MAX_VALUE;
     try {
-      try (BackupManager backupManager = new BackupManager(conn, getConf())) {
-        serverToPreservationBoundaryTs =
-          serverToPreservationBoundaryTs(backupManager.getBackupHistory(true));
+      try (BackupSystemTable sysTable = new BackupSystemTable(conn)) {
+        Set<String> roots = sysTable.getBackupHistory(true).stream()
+          .map(BackupInfo::getBackupRootDir).collect(Collectors.toSet());
+        if (roots.isEmpty()) {
+          LOG.info("No backups found, can delete all files");

Review Comment:
   This will spam the logs; suggest changing it to debug logging.
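   I.e. just:
   
```java
// Same message, just at debug level so it doesn't flood the logs on every cleaner run.
LOG.debug("No backups found, can delete all files");
```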



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
