Daniel Roudnitsky created HBASE-29470:
-----------------------------------------

             Summary: Client swallows interrupts during location resolution
                 Key: HBASE-29470
                 URL: https://issues.apache.org/jira/browse/HBASE-29470
             Project: HBase
          Issue Type: Bug
          Components: Client
    Affects Versions: 2.5.12, 2.6.3
            Reporter: Daniel Roudnitsky
            Assignee: Daniel Roudnitsky

+Problem+

When the 2.x sync client executes a batch request, it swallows any interrupt that arrives during region location resolution. The sync client resolves the region location of each action in the batch sequentially. If the calling thread is interrupted during this process, the client treats the interrupt as a location error for whichever action was being resolved at that moment, continues resolving locations for the remaining actions, and then executes them.

Only once the rest of the batch has been processed (however long that takes) does the client throw a RetriesExhaustedWithDetailsException, because the action that was being resolved at the time of the interrupt could not be executed. By then, every other action in the batch has been processed and has returned a result.

For example, a batch call with 10 actions that is interrupted almost immediately after execution starts does not return promptly on interrupt. It runs for however long the other 9 actions take, and ultimately yields 1 interrupted exception and 9 successful results.
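
The behavior described above can be modeled without HBase. The following is a minimal sketch, not the client code: `resolveLocation` and `runBatch` are hypothetical stand-ins, and the interrupt is simulated by setting the thread's interrupt flag before the batch starts. It assumes the lookup surfaces the interrupt as an IOException subtype and that the batch loop records every IOException as a per-action error.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.ArrayList;
import java.util.List;

class SwallowedInterruptDemo {

  // Hypothetical stand-in for a blocking location lookup. If the thread has
  // been interrupted, the interrupt is surfaced as an IOException subtype;
  // Thread.interrupted() also clears the interrupt flag.
  static String resolveLocation(int action) throws IOException {
    if (Thread.interrupted()) {
      throw new InterruptedIOException("interrupted while locating action " + action);
    }
    return "region-for-action-" + action;
  }

  // Models the reported behavior: every IOException is recorded as a
  // per-action location error and the loop moves on to the next action.
  // Returns the indexes of the actions that failed.
  static List<Integer> runBatch(int actionCount, List<String> resolved) {
    List<Integer> failed = new ArrayList<>();
    for (int action = 0; action < actionCount; action++) {
      try {
        resolved.add(resolveLocation(action));
      } catch (IOException e) {
        // The interrupt is now indistinguishable from a location error.
        failed.add(action);
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    // Simulate an interrupt arriving right as the batch starts.
    Thread.currentThread().interrupt();
    List<String> resolved = new ArrayList<>();
    List<Integer> failed = runBatch(10, resolved);
    System.out.println("failed actions: " + failed);
    System.out.println("resolved actions: " + resolved.size());
    System.out.println("interrupt flag still set: " + Thread.currentThread().isInterrupted());
  }
}
```

In this model only action 0 fails, the other 9 actions are still resolved, and the caller's interrupt flag is no longer set when the batch finishes.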

+Root cause and solution+

In locateRegionInMeta, where the meta lookup happens, [we rethrow InterruptedException as an IOException|https://github.com/apache/hbase/blob/d79070d14054195ab38644b3d5c9332073c47455/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L1124]. In [findAllLocationsOrFail we treat any IOException as a location error|https://github.com/apache/hbase/blob/d79070d14054195ab38644b3d5c9332073c47455/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L555-L557] and set the error for the action that was being processed. After that, [groupAndSendMulti proceeds as usual|https://github.com/apache/hbase/blob/d79070d14054195ab38644b3d5c9332073c47455/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L460-L467] and continues to process the rest of the batch.

We need special handling for the interrupted exception in groupAndSendMulti so that the entire batch fails fast with an InterruptedIOException.
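
One possible shape for the proposed fast-fail handling is sketched below. This is an illustration under stated assumptions, not the actual patch: `resolveLocation`, `isInterrupt`, and `resolveAllOrFailFast` are hypothetical names, the interrupt is assumed to arrive as an IOException whose cause is an InterruptedException, and restoring the thread's interrupt flag before rethrowing is a design choice of this sketch rather than something the issue specifies.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.ArrayList;
import java.util.List;

class FastFailOnInterruptDemo {

  // Hypothetical lookup: an interrupt is rethrown wrapped in an IOException.
  static String resolveLocation(int action) throws IOException {
    if (Thread.interrupted()) {
      throw new IOException("lookup failed for action " + action, new InterruptedException());
    }
    return "region-for-action-" + action;
  }

  // Hypothetical helper: distinguishes interrupts from genuine location errors.
  static boolean isInterrupt(IOException e) {
    return e instanceof InterruptedIOException || e.getCause() instanceof InterruptedException;
  }

  // Resolves every action, but aborts the whole batch on interrupt instead of
  // recording the interrupt as a per-action location error.
  static List<String> resolveAllOrFailFast(int actionCount, List<Integer> failed)
      throws InterruptedIOException {
    List<String> resolved = new ArrayList<>();
    for (int action = 0; action < actionCount; action++) {
      try {
        resolved.add(resolveLocation(action));
      } catch (IOException e) {
        if (isInterrupt(e)) {
          // Assumption of this sketch: restore the flag so callers can observe it.
          Thread.currentThread().interrupt();
          InterruptedIOException iioe =
              new InterruptedIOException("batch interrupted while locating action " + action);
          iioe.initCause(e);
          throw iioe;
        }
        failed.add(action); // genuine location error: keep per-action handling
      }
    }
    return resolved;
  }

  public static void main(String[] args) {
    Thread.currentThread().interrupt(); // simulate an interrupt at batch start
    List<Integer> failed = new ArrayList<>();
    try {
      resolveAllOrFailFast(10, failed);
      System.out.println("batch completed");
    } catch (InterruptedIOException e) {
      System.out.println("batch aborted: " + e.getMessage());
    }
  }
}
```

With this handling, the simulated interrupt aborts the batch at the first action, and no per-action error is recorded for it.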