hgromer commented on code in PR #6370:
URL: https://github.com/apache/hbase/pull/6370#discussion_r1805131650


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java:
##########
@@ -197,55 +202,56 @@ protected List<byte[]> handleBulkLoad(List<TableName> sTableList) throws IOExcep
           }
         }
       }
+      mergeSplitBulkloads(activeFiles, archiveFiles, srcTable);
+      incrementalCopyBulkloadHFiles(tgtFs, srcTable);
     }
-
-    copyBulkLoadedFiles(activeFiles, archiveFiles);
-
     return pair.getSecond();
   }
 
-  private void copyBulkLoadedFiles(List<String> activeFiles, List<String> archiveFiles)
-    throws IOException {
-    try {
-      // Enable special mode of BackupDistCp
-      conf.setInt(MapReduceBackupCopyJob.NUMBER_OF_LEVELS_TO_PRESERVE_KEY, 5);
-      // Copy active files
-      String tgtDest = backupInfo.getBackupRootDir() + Path.SEPARATOR + backupInfo.getBackupId();
-      int attempt = 1;
-      while (activeFiles.size() > 0) {
-        LOG.info("Copy " + activeFiles.size() + " active bulk loaded files. Attempt =" + attempt++);
-        String[] toCopy = new String[activeFiles.size()];
-        activeFiles.toArray(toCopy);
-        // Active file can be archived during copy operation,
-        // we need to handle this properly
-        try {
-          incrementalCopyHFiles(toCopy, tgtDest);
-          break;
-        } catch (IOException e) {
-          // Check if some files got archived
-          // Update active and archived lists
-          // When file is being moved from active to archive
-          // directory, the number of active files decreases
-          int numOfActive = activeFiles.size();
-          updateFileLists(activeFiles, archiveFiles);
-          if (activeFiles.size() < numOfActive) {
-            continue;
-          }
-          // if not - throw exception
-          throw e;
+  private void mergeSplitBulkloads(List<String> activeFiles, List<String> archiveFiles,

Review Comment:
   The process is a bit different for bulkloads now. Rather than directly copying them to the backup directory, we first run them through MapReduceHFileSplitterJob, which re-splits the HFiles based on the SnapshotRegionLocator. We then copy the output of that job to the backup directory.
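
   For context, here is a minimal sketch of the new re-split step, assuming the job is driven via ToolRunner with the bulkloaded HFile paths and the table name, and that its output directory (set via BULK_OUTPUT_CONF_KEY) is what later gets copied to the backup directory. The class, helper name, and error handling below are illustrative, not the actual patch:

   ```java
   import java.io.IOException;
   import java.util.List;

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hbase.TableName;
   import org.apache.hadoop.hbase.backup.mapreduce.MapReduceHFileSplitterJob;
   import org.apache.hadoop.util.ToolRunner;

   public class BulkloadResplitSketch {
     /**
      * Re-split bulkloaded HFiles so their boundaries line up with the target
      * table's snapshot regions. The region-aligned output written under
      * bulkOutputDir is what the caller then copies into the backup directory.
      */
     static void splitBulkloads(Configuration conf, List<String> bulkloadPaths,
       TableName srcTable, String bulkOutputDir) throws Exception {
       // Tell the splitter job where to write the re-split HFiles.
       conf.set(MapReduceHFileSplitterJob.BULK_OUTPUT_CONF_KEY, bulkOutputDir);
       // Job args: comma-separated HFile input paths, then the table name.
       String[] args = { String.join(",", bulkloadPaths), srcTable.getNameAsString() };
       if (ToolRunner.run(conf, new MapReduceHFileSplitterJob(), args) != 0) {
         throw new IOException("HFile splitter job failed for " + srcTable);
       }
     }
   }
   ```

   The upside, presumably, is that the copied HFiles already match the snapshot's region boundaries, sparing the restore path from having to re-split them.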



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
