[ https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Connell updated HBASE-28589:
------------------------------------
    Fix Version/s: 3.0.0-beta-2
                   2.6.4
                   2.5.13

> Server side DoNotRetryException not propagated to client
> --------------------------------------------------------
>
>                 Key: HBASE-28589
>                 URL: https://issues.apache.org/jira/browse/HBASE-28589
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 2.0.0, 2.4.0, 2.5.0, 2.6.0, 3.0.0
>            Reporter: ZhenyuLi
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.0.0-beta-2, 2.6.4, 2.5.13
>
>
> When an IOException occurs during response creation in
> ServerCall.setResponse(), the method catches the IOException, logs a
> warning, and sets the response to null. The client then receives no
> response or sees connection problems, with no indication of what went
> wrong on the server side.
> The flaw in the fix for HBASE-14598 is one example of this behavior.
> The original fix for HBASE-14598 addressed two aspects:
> # When a Scan/Get RPC attempts to allocate an excessively large array that
> could trigger an OutOfMemoryError (OOM), it checks the array size before
> allocation and throws a BufferOverflowException to prevent the OOM.
> # The fix intended to stop client retries for such failures by throwing a
> DoNotRetryIOException when a BufferOverflowException occurs, because
> retrying cannot resolve the underlying issue.
> *The Problem:* The DoNotRetryIOException is never propagated to the client
> side.
> Here's the issue flow:
> # ByteBufferOutputStream.checkSizeAndGrow() throws BufferOverflowException
> # The exception propagates through the call stack:
> ** ByteBufferOutputStream.checkSizeAndGrow()
> ** encoder.write()
> ** encodeCellsTo() (catches the BufferOverflowException and converts it
> into a DoNotRetryIOException)
> ** this.cellBlockBuilder.buildCellBlockStream()
> ** call.setResponse()
> # The DoNotRetryIOException is ultimately caught in call.setResponse(),
> where it is logged but never sent back to the client
> # As a result, the response is null, the Netty connection is closed, and
> the client keeps retrying indefinitely.
> *Current Status:* In the latest branches (3.0 and 2.6), this issue still
> exists. In ServerCall.java, when ALLOCATOR_POOL_ENABLED_KEY
> (hbase.ipc.server.reservoir.enabled) is set to false, the setResponse()
> method follows the same problematic path. If a DoNotRetryIOException is
> thrown in {{ByteBuffer b =
> this.cellBlockBuilder.buildCellBlock(this.connection.codec,
> this.connection.compressionCodec, cells);}}, it is swallowed by the
> setResponse() catch block and never reaches the client.
> *Steps to Reproduce:*
> # Set up a 3-node HBase cluster with 3 RegionServers
> # Set hbase.ipc.server.reservoir.enabled to false to use
> ByteBufferOutputStream
> # Inject a BufferOverflowException at
> ByteBufferOutputStream.checkSizeAndGrow() to simulate an OOM condition
> # Send a scan request
> # Observe endless client retries
> *Expected Behavior:* The DoNotRetryIOException should be propagated to the
> client so that it stops retrying.
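
The issue flow in this report can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not HBase code and not the committed fix: DoNotRetrySketchException, SetResponseSketch, Response, and both setResponse* methods are hypothetical names, and the size/limit check only stands in for ByteBufferOutputStream.checkSizeAndGrow(). It contrasts a catch block that logs and returns null with one that carries the exception back to the caller.

```java
import java.io.IOException;
import java.nio.BufferOverflowException;

// Hypothetical stand-in for HBase's DoNotRetryIOException, defined here
// so the sketch compiles without HBase on the classpath.
class DoNotRetrySketchException extends IOException {
    DoNotRetrySketchException(String msg, Throwable cause) {
        super(msg, cause);
    }
}

final class SetResponseSketch {
    // Hypothetical response holder: a payload on success, an error otherwise.
    static final class Response {
        final byte[] payload;
        final IOException error;

        Response(byte[] payload, IOException error) {
            this.payload = payload;
            this.error = error;
        }
    }

    // Simulates cell block building. An oversized request raises
    // BufferOverflowException, which is converted into a do-not-retry
    // exception, as the report describes for encodeCellsTo().
    static byte[] buildCellBlock(int size, int limit) throws IOException {
        try {
            if (size > limit) {
                throw new BufferOverflowException();
            }
            return new byte[size];
        } catch (BufferOverflowException e) {
            throw new DoNotRetrySketchException(
                "cell block of " + size + " bytes exceeds limit " + limit, e);
        }
    }

    // Behavior described in the report: the exception is logged and the
    // response ends up null, so the caller learns nothing about the failure.
    static Response setResponseSwallowing(int size, int limit) {
        try {
            return new Response(buildCellBlock(size, limit), null);
        } catch (IOException e) {
            System.err.println("WARN: exception while creating response: " + e);
            return null;
        }
    }

    // One possible alternative (an assumption, not the actual patch):
    // put the exception into the response so the caller can stop retrying.
    static Response setResponsePropagating(int size, int limit) {
        try {
            return new Response(buildCellBlock(size, limit), null);
        } catch (IOException e) {
            return new Response(null, e);
        }
    }

    public static void main(String[] args) {
        Response swallowed = setResponseSwallowing(2048, 1024);
        System.out.println("swallowing variant returned null: " + (swallowed == null));

        Response propagated = setResponsePropagating(2048, 1024);
        System.out.println("propagating variant carries: " + propagated.error);
    }
}
```

With the swallowing variant the caller cannot tell a do-not-retry failure from a lost response, which matches the endless retries in the reproduction steps. With the propagating variant the caller can inspect the error type and give up immediately.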