[ 
https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Connell updated HBASE-28589:
------------------------------------
    Fix Version/s: 3.0.0-beta-2
                   2.6.4
                   2.5.13

> Server side DoNotRetryException not propagated to client
> --------------------------------------------------------
>
>                 Key: HBASE-28589
>                 URL: https://issues.apache.org/jira/browse/HBASE-28589
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 2.0.0, 2.4.0, 2.5.0, 2.6.0, 3.0.0
>            Reporter: ZhenyuLi
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.0.0-beta-2, 2.6.4, 2.5.13
>
>
> When an IOException occurs during response creation in 
> ServerCall.setResponse(), the method catches it, logs a warning, and sets 
> the response to null. The client then receives no response, or sees 
> connection errors, with no indication of what went wrong on the server 
> side.
> One example of this swallowed exception is a flaw in the fix for 
> HBASE-14598.
> The original fix for HBASE-14598 addressed two aspects:
>  # When a Scan/Get RPC attempts to allocate an excessively large array that 
> could trigger an OutOfMemoryError (OOM), the array size is checked before 
> allocation and a BufferOverflowException is thrown to prevent the OOM.
>  # The fix intended to stop client retries for such failures by throwing a 
> DoNotRetryIOException when a BufferOverflowException occurs, since retrying 
> cannot resolve the underlying issue.
> *The Problem:* The DoNotRetryIOException is never propagated to the client 
> side. The failure flows as follows:
>  # ByteBufferOutputStream.checkSizeAndGrow() throws BufferOverflowException
>  # The exception propagates through the call stack:
>  ** ByteBufferOutputStream.checkSizeAndGrow()
>  ** encoder.write()
>  ** encodeCellsTo() (Catches BufferOverflowException and turns it into 
> DoNotRetryIOException)
>  ** this.cellBlockBuilder.buildCellBlockStream()
>  ** call.setResponse()
>  # The DoNotRetryIOException is ultimately caught in call.setResponse(), 
> where it is only logged and never sent back to the client
>  # Because the response is null and the Netty connection is closed, the 
> client keeps retrying indefinitely.
> *Current Status:* In the latest branches (3.0 and 2.6), this issue still 
> exists. In ServerCall.java, when ALLOCATOR_POOL_ENABLED_KEY 
> (hbase.ipc.server.reservoir.enabled) is set to false, the setResponse() 
> method follows the same problematic path. If a DoNotRetryIOException is 
> thrown in {{ByteBuffer b = 
> this.cellBlockBuilder.buildCellBlock(this.connection.codec, 
> this.connection.compressionCodec, cells);}}, it is swallowed by the 
> setResponse() catch block and never reaches the client.
> *Steps to Reproduce:*
>  # Set up a 3-node HBase cluster with 3 RegionServers
>  # Set hbase.ipc.server.reservoir.enabled to false to use 
> ByteBufferOutputStream
>  # Inject a BufferOverflowException at 
> ByteBufferOutputStream.checkSizeAndGrow() to simulate an oversized 
> allocation that would otherwise risk an OOM
>  # Send a scan request
>  # Observe endless client retries
> *Expected Behavior:* The DoNotRetryIOException should be propagated to the 
> client so that it stops retrying.
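
The flow described in the report can be illustrated with a minimal, self-contained Java sketch. It is not the HBase code and not the committed fix: the class SetResponseSketch, the nested DoNotRetryIOException stand-in, the MAX_ARRAY_SIZE cap, and the string "responses" are all hypothetical simplifications. It only contrasts a setResponse() that swallows the exception with one that reports it to the caller.

```java
import java.io.IOException;
import java.nio.BufferOverflowException;

/** Hypothetical, simplified stand-ins; none of this is the real HBase code. */
class SetResponseSketch {

  /** Local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException. */
  static class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(Throwable cause) {
      super(cause);
    }
  }

  /** Assumed size cap, chosen only for this sketch. */
  static final int MAX_ARRAY_SIZE = 1024;

  /** Mimics the size check: refuse to grow past the cap rather than risk an OOM. */
  static void checkSizeAndGrow(int requestedSize) {
    if (requestedSize > MAX_ARRAY_SIZE) {
      throw new BufferOverflowException();
    }
  }

  /** Mimics the encoding step: turns the overflow into a non-retryable IOException. */
  static byte[] buildCellBlock(int cellBytes) throws IOException {
    try {
      checkSizeAndGrow(cellBytes);
      return new byte[cellBytes];
    } catch (BufferOverflowException e) {
      throw new DoNotRetryIOException(e);
    }
  }

  /** Reported behavior: the exception is logged and the response stays null. */
  static String setResponseSwallowing(int cellBytes) {
    try {
      return "OK:" + buildCellBlock(cellBytes).length;
    } catch (IOException e) {
      System.err.println("WARN: exception while creating response: " + e);
      return null; // the client learns nothing and retries
    }
  }

  /** Expected behavior: the error, including its do-not-retry nature, is sent back. */
  static String setResponsePropagating(int cellBytes) {
    try {
      return "OK:" + buildCellBlock(cellBytes).length;
    } catch (IOException e) {
      boolean doNotRetry = e instanceof DoNotRetryIOException;
      return "ERROR:" + e.getClass().getSimpleName() + ":doNotRetry=" + doNotRetry;
    }
  }

  public static void main(String[] args) {
    System.out.println(setResponseSwallowing(4096));  // oversized request, swallowed
    System.out.println(setResponsePropagating(4096)); // oversized request, reported
    System.out.println(setResponsePropagating(16));   // normal request
  }
}
```

In the swallowing variant the caller cannot distinguish a non-retryable failure from a dropped connection, which is the condition under which the report observes endless retries. The propagating variant carries the do-not-retry signal in the response itself.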


