[
https://issues.apache.org/jira/browse/HBASE-29197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hernan Gelaf-Romer reassigned HBASE-29197:
------------------------------------------
Assignee: Hernan Gelaf-Romer
> Deleting bulk loaded rows from the backup system table can result in large
> batch rejections failures
> ----------------------------------------------------------------------------------------------------
>
> Key: HBASE-29197
> URL: https://issues.apache.org/jira/browse/HBASE-29197
> Project: HBase
> Issue Type: Bug
> Components: backup&restore
> Reporter: Hernan Gelaf-Romer
> Assignee: Hernan Gelaf-Romer
> Priority: Major
>
> At my company, we're experimenting with the new incremental backup system.
> We've experienced issues deleting a large number of bulk loaded rows from the
> system table when the deletes exceed the batch limit:
> {quote}
> 2025-03-18 13:03:01.208 [htable-pool-6] WARN o.a.h.h.c.AsyncRequestFutureImpl
> - id=10, table=backup:system_bulk, attempt=15/13, failureCount=2048ops, last
> exception=java.io.IOException: java.io.IOException: Rejecting large batch
> operation for current batch with firstRegionName:
> backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. ,
> Requested Number of Rows: 2048 , Size Threshold: 1500
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:511)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>     at org.apache.hadoop.hbase.ipc.CallRunnerWithContext.run(CallRunnerWithContext.java:103)
>     at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:105)
>     at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:85)
> Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException:
> Rejecting large batch operation for current batch with firstRegionName:
> backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. ,
> Requested Number of Rows: 2048 , Size Threshold: 1500
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkBatchSizeAndLogLargeSize(RSRpcServices.java:2721)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2757)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:43520)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443)
>     ... 4 more
> on na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259,
> tracking started Tue Mar 18 13:01:12 UTC 2025; NOT retrying, failed=2048 –
> final attempt!
> 2025-03-18 13:03:01.275 [pool-116-thread-1] ERROR
> o.a.h.h.b.impl.TableBackupClient - Unexpected BackupException : Failed 75776
> actions: IOException: 75776 times, servers with issues:
> na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177,
> na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
> 75776 actions: IOException: 75776 times, servers with issues:
> na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177,
> na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
>     at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
>     at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
>     at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:209)
>     at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:431)
>     at org.apache.hadoop.hbase.backup.impl.BackupManager.deleteBulkLoadedRows(BackupManager.java:362)
>     at org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.execute(FullTableBackupClient.java:201)
>     at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:594)
>     at com.hubspot.hbase.recovery.core.factories.HBaseBackupAdminFactory$HBaseBackupAdmin.backupTables(HBaseBackupAdminFactory.java:92)
>     at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.lambda$runTableBackup$2(BackupManager.java:524)
>     at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.lambda$doAs$9(HadoopAuthHelper.java:590)
>     at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.doAs(HadoopAuthHelper.java:603)
>     at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.runTableBackup(BackupManager.java:521)
>     at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.run(BackupManager.java:449)
>     at com.hubspot.hbase.recovery.core.backup.BackupManager.runBackups(BackupManager.java:103)
>     at com.hubspot.hbase.recovery.jobs.BackupJob.takeBackups(BackupJob.java:166)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>     at java.base/java.lang.Thread.run(Thread.java:1583)
>     Suppressed:
>     org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
>     6144 actions: IOException: 6144 times, servers with issues:
>     na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
>         at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
>         at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
>         at org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:246)
>         at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:424)
> {quote}
> We should split these deletes into batches to avoid this failure.
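
The batching proposed above could look roughly like the sketch below. It is
not the actual patch for this issue: DeleteBatcher, partition and
forEachBatch are hypothetical names, and the code uses only the Java standard
library so it runs on its own. The batch size of 1500 is taken from the
"Size Threshold" in the log and would really need to come from the server's
configured limit. In HBase, each batch would presumably be handed to
something like Table.delete(List<Delete>), which is an assumption about how
the fix would be wired in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not part of HBase): splits a large list into bounded batches. */
final class DeleteBatcher {
  private DeleteBatcher() {}

  /** Returns consecutive copies of at most maxBatchSize elements each, in order. */
  static <T> List<List<T>> partition(List<T> items, int maxBatchSize) {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException("maxBatchSize must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += maxBatchSize) {
      int end = Math.min(start + maxBatchSize, items.size());
      // Copy the view so the consumer may modify a batch without touching the input.
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }

  /** Hands each batch to sink; with HBase the sink would issue one delete call per batch. */
  static <T> void forEachBatch(List<T> items, int maxBatchSize, Consumer<List<T>> sink) {
    for (List<T> batch : partition(items, maxBatchSize)) {
      sink.accept(batch);
    }
  }

  public static void main(String[] args) {
    // Stand-in for 2048 Delete mutations, the count rejected in the log.
    List<Integer> rows = new ArrayList<>();
    for (int i = 0; i < 2048; i++) {
      rows.add(i);
    }
    List<Integer> sizes = new ArrayList<>();
    forEachBatch(rows, 1500, batch -> sizes.add(batch.size()));
    System.out.println(sizes); // prints [1500, 548]
  }
}
```

With the 2048 rows from the log and a limit of 1500, this yields two requests
of 1500 and 548 rows, each within the threshold the region server enforced.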
--
This message was sent by Atlassian Jira
(v8.20.10#820010)