hgromer commented on code in PR #5408:
URL: https://github.com/apache/hbase/pull/5408#discussion_r2362698949


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/util/BackupUtils.java:
##########
@@ -770,4 +773,52 @@ public static String findMostRecentBackupId(String[] 
backupIds) {
     return BackupRestoreConstants.BACKUPID_PREFIX + recentTimestamp;
   }
 
+  /**
+   * roll WAL writer for all region servers and record the newest log roll 
result
+   */
+  public static void logRoll(Connection conn, String backupRootDir, 
Configuration conf)
+    throws IOException {
+    boolean legacy = conf.getBoolean("hbase.backup.logroll.legacy.used", 
false);
+    if (legacy) {
+      logRollV1(conn, backupRootDir);
+    } else {
+      logRollV2(conn, backupRootDir);
+    }
+  }
+
+  private static void logRollV1(Connection conn, String backupRootDir) throws 
IOException {
+    try (Admin admin = conn.getAdmin()) {
+      
admin.execProcedure(LogRollMasterProcedureManager.ROLLLOG_PROCEDURE_SIGNATURE,
+        LogRollMasterProcedureManager.ROLLLOG_PROCEDURE_NAME,
+        ImmutableMap.of("backupRoot", backupRootDir));
+    }
+  }
+
+  private static void logRollV2(Connection conn, String backupRootDir) throws 
IOException {

Review Comment:
   > and I don't think any additional functionality should be introduced or 
existing functionality should be reduced.
   
   I don't necessarily agree here, though I do agree v1 isn't doing this 
cleanup either. To clarify, this PR doesn't introduce any issues, but thanks to 
your changes I think we have a good way to solve the issue I've presented. This 
is a beta feature, and I feel that we can iterate on the behavior of the 
system, esepcially if it means we're improving the system's efficiency by 
reducing storage overhead.
   
   1. Yes; the oldWALs can take up a non-trivial amount of space and should be 
cleaned up when they are no longer necessary. We're seeing cases where we are 
storing terabytes of unused, deletable data, which is expensive 
   2. I'd like to talk this out, and make sure my logic makes sense. There are 
two types of backups. For full backups, it makes sense that we don't lose any 
data. We roll the WAL files, and then take a snapshot of all the HFiles on the 
cluster, so those WAL files are backed up. For incremental backups, we roll the 
WAL files then backup all WAL files from [<old_start_code>, newTimestamps). For 
both cases, I do not think there's any possibility of a data loss. I think as 
long as we delete entries from the system table after the backup completes, we 
should be okay
   3. It'd be nice, but I think it'd be a lot harder b/c logRollV1 never 
updates the backup system table with the newer timestamps as far as I can tell. 
Given this feature is still in beta, I'm happy to move forward with the v2 
functionality and mark v1 as deprecated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to