hgromer commented on code in PR #6506:
URL: https://github.com/apache/hbase/pull/6506#discussion_r2001071881
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/FullTableBackupClient.java:
##########
@@ -192,6 +198,9 @@ public void execute() throws IOException {
BackupUtils.getMinValue(BackupUtils.getRSLogTimestampMins(newTableSetTimestampMap));
backupManager.writeBackupStartCode(newStartCode);
+ backupManager
+   .deleteBulkLoadedRows(bulkLoadsToDelete.stream().map(BulkLoad::getRowKey).toList());
Review Comment:
At my company, we have a similar patch applied to our fork, and we've run
into issues with batch sizes that cause backup failures. This seems to happen
when there are too many rows to delete; you end up with something like
```
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: Rejecting large batch operation for current batch with firstRegionName: backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested Number of Rows: 2048 , Size Threshold: 1500
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkBatchSizeAndLogLargeSize(RSRpcServices.java:2721)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2757)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:43520)
	at org.apache.ha
```
It might be worth splitting this delete call into smaller batches when the
number of rows is exceptionally large.
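Just as a rough sketch of what I mean (the batch size and variable names here
are placeholders, not a concrete proposal), chunking the row keys before
calling `deleteBulkLoadedRows` would keep each multi RPC under the region
server's size threshold:
```
// Sketch only: batchSize is a hypothetical value chosen to stay below the
// 1500-row threshold seen in the stack trace above.
List<byte[]> rowKeysToDelete =
  bulkLoadsToDelete.stream().map(BulkLoad::getRowKey).toList();
int batchSize = 1000;
for (int i = 0; i < rowKeysToDelete.size(); i += batchSize) {
  int end = Math.min(i + batchSize, rowKeysToDelete.size());
  // Each call now deletes at most batchSize rows from backup:system_bulk.
  backupManager.deleteBulkLoadedRows(rowKeysToDelete.subList(i, end));
}
```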
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]