[ https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ZhenyuLi updated HBASE-28589:
-----------------------------
Summary: Server side DoNotRetryException not propagated to client (was: DoNotRetryException not propagated to client)

Server side DoNotRetryException not propagated to client
--------------------------------------------------------

Key: HBASE-28589
URL: https://issues.apache.org/jira/browse/HBASE-28589
Project: HBase
Issue Type: Bug
Components: IPC/RPC
Affects Versions: 2.0.0, 2.4.0, 2.5.0, 2.6.0, 3.0.0
Reporter: ZhenyuLi
Priority: Major

I have found that the fix for HBASE-14598 does not completely resolve the issue; the problem persists in the latest branches (3.0 and 2.6).

The original fix for HBASE-14598 addressed two aspects:
# When a Scan/Get RPC would allocate an excessively large array that could trigger an OutOfMemoryError (OOM), the array size is checked before allocation and a {{BufferOverflowException}} is thrown instead.
# Because retrying cannot resolve the underlying issue, the fix was also meant to stop client retries for such failures by converting the {{BufferOverflowException}} into a {{DoNotRetryIOException}}.

*The Problem:* The {{DoNotRetryIOException}} is never propagated to the client side. The failure flows as follows:
# {{ByteBufferOutputStream.checkSizeAndGrow()}} throws {{BufferOverflowException}}.
# The exception propagates up the call stack:
** {{ByteBufferOutputStream.checkSizeAndGrow()}}
** {{encoder.write()}}
** {{encodeCellsTo()}} (catches the {{BufferOverflowException}} and rethrows it as a {{DoNotRetryIOException}})
** {{this.cellBlockBuilder.buildCellBlockStream()}}
** {{call.setResponse()}}
# The {{DoNotRetryIOException}} is ultimately caught in {{call.setResponse()}}, where it is only logged and never sent back to the client.
# Because the response stays null and the Netty connection is closed, the client keeps retrying indefinitely.
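The failure path above can be modeled in a few lines of plain Java. This is a hypothetical sketch, not HBase code: {{ResponseSwallowSketch}}, its nested {{Call}} class, the {{MAX_ARRAY_SIZE}} cap, and the growth policy are illustrative assumptions, and {{DoNotRetryIOException}} is a local stand-in for {{org.apache.hadoop.hbase.DoNotRetryIOException}} so the example compiles on its own. It shows how a size check that throws {{BufferOverflowException}}, wrapped into a do-not-retry exception one level up, still never reaches the caller when the outermost catch only logs.

```java
import java.io.IOException;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

/** Hypothetical model of the HBASE-28589 failure path; names only mirror the issue's stack. */
public class ResponseSwallowSketch {

  /** Local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException. */
  static class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(Throwable cause) { super(cause); }
  }

  /** Assumed allocation cap used by the size check (illustrative value). */
  static final long MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

  /** Like checkSizeAndGrow(): check the size BEFORE allocating, grow otherwise. */
  static ByteBuffer checkSizeAndGrow(ByteBuffer buf, long extra) {
    long needed = buf.position() + extra;
    if (needed <= buf.limit()) {
      return buf; // enough room already
    }
    if (needed > MAX_ARRAY_SIZE) {
      throw new BufferOverflowException(); // refuse instead of risking an OOM
    }
    // Double the capacity (capped), but never below what is needed.
    long next = Math.max(Math.min(buf.capacity() * 2L, MAX_ARRAY_SIZE), needed);
    ByteBuffer grown = ByteBuffer.allocate((int) next);
    buf.flip();
    grown.put(buf); // copy existing contents into the larger buffer
    return grown;
  }

  /** Like encodeCellsTo(): turn the overflow into a do-not-retry error. */
  static ByteBuffer encodeCellsTo(long cellBytes) throws IOException {
    try {
      return checkSizeAndGrow(ByteBuffer.allocate(16), cellBytes);
    } catch (BufferOverflowException e) {
      throw new DoNotRetryIOException(e);
    }
  }

  /** Like the problematic setResponse(): the exception is logged, then dropped. */
  static final class Call {
    ByteBuffer response;      // stays null when encoding fails
    boolean connectionClosed; // models the reported connection close

    void setResponse(long cellBytes) {
      try {
        response = encodeCellsTo(cellBytes);
      } catch (IOException e) {
        System.err.println("Exception while creating response " + e); // only logged
        connectionClosed = true; // no error is sent back to the client
      }
    }
  }

  public static void main(String[] args) {
    Call small = new Call();
    small.setResponse(64);
    System.out.println("small response is null: " + (small.response == null));
    Call huge = new Call();
    huge.setResponse(Long.MAX_VALUE / 2); // simulates the oversized result
    System.out.println("huge response is null: " + (huge.response == null));
  }
}
```

The key point the sketch isolates is that wrapping the exception in {{encodeCellsTo()}} is not enough on its own: whatever catches it last decides whether the client ever sees it.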
*Current Status:* In the latest branches (3.0 and 2.6), this issue still exists. In {{ServerCall.java}}, when {{ALLOCATOR_POOL_ENABLED_KEY}} ({{hbase.ipc.server.reservoir.enabled}}) is set to {{false}}, the {{setResponse()}} method follows the same problematic path. If a {{DoNotRetryIOException}} is thrown in
{{ByteBuffer b = this.cellBlockBuilder.buildCellBlock(this.connection.codec, this.connection.compressionCodec, cells);}}
it is swallowed by the {{setResponse()}} catch block and never reaches the client.

*Steps to Reproduce:*
# Set up a 3-node HBase cluster with 3 RegionServers.
# Set {{hbase.ipc.server.reservoir.enabled}} to {{false}} so that {{ByteBufferOutputStream}} is used.
# Inject a {{BufferOverflowException}} at {{ByteBufferOutputStream.checkSizeAndGrow()}} to simulate an OOM condition.
# Send a scan request.
# Observe endless client retries.

*Expected Behavior:* The {{DoNotRetryIOException}} should be propagated to the client so that the client stops retrying.
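The expected behavior can be sketched as two small changes: the server turns a response-building failure into an error response instead of only logging it, and the client stops as soon as it sees a do-not-retry error. Everything here is hypothetical: {{PropagateErrorSketch}}, {{Response}}, {{CellBlockBuilder}}, and {{callWithRetries}} are illustrative names, not HBase's RPC classes, and {{DoNotRetryIOException}} is again a local stand-in. This is one possible shape of the fix under those assumptions, not the actual patch.

```java
import java.io.IOException;

/** Hypothetical sketch of propagating a do-not-retry error instead of swallowing it. */
public class PropagateErrorSketch {

  /** Local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException. */
  static class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String msg) { super(msg); }
  }

  /** A response carries either a payload or an error (hypothetical shape). */
  static final class Response {
    final String payload;
    final IOException error;

    Response(String payload, IOException error) {
      this.payload = payload;
      this.error = error;
    }
  }

  /** Stand-in for building the cell block; may throw, like buildCellBlock(). */
  interface CellBlockBuilder {
    String build() throws IOException;
  }

  /** setResponse() variant: an encoding failure becomes an error response. */
  static Response setResponse(CellBlockBuilder builder) {
    try {
      return new Response(builder.build(), null);
    } catch (IOException e) {
      // Ship the failure back to the caller instead of only logging it.
      return new Response(null, e);
    }
  }

  /** Client loop: retry other failures, give up at once on do-not-retry. */
  static int callWithRetries(CellBlockBuilder builder, int maxAttempts) {
    int attempts = 0;
    while (attempts < maxAttempts) {
      attempts++;
      Response r = setResponse(builder);
      if (r.error == null || r.error instanceof DoNotRetryIOException) {
        break; // success, or a failure that retrying cannot fix
      }
    }
    return attempts;
  }

  public static void main(String[] args) {
    CellBlockBuilder tooLarge = () -> {
      throw new DoNotRetryIOException("cell block too large");
    };
    System.out.println("attempts: " + callWithRetries(tooLarge, 10)); // prints "attempts: 1"
  }
}
```

With the error carried in the response, the retry decision moves to the client, where a single type check is enough to end the loop after the first attempt.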