[jira] [Updated] (HBASE-28589) Server side DoNotRetryException not propagated to client

ZhenyuLi (Jira) Tue, 15 Jul 2025 17:12:32 -0700


     [ 
https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ZhenyuLi updated HBASE-28589:
-----------------------------
    Description: 
When an IOException occurs during response creation in 
ServerCall.setResponse(), the method catches it, logs a warning, and sets the 
response to null. The client therefore receives no response, or sees the 
connection drop, without learning what went wrong on the server side.

The incomplete fix for {-}HBASE-14598{-} is one example of 
ServerCall.setResponse() swallowing such an exception.

The original fix for -HBASE-14598- addressed two aspects:
 # When a Scan/Get RPC attempts to allocate an excessively large array that 
could trigger an OutOfMemoryError (OOM), the server checks the required size 
before allocating and throws a BufferOverflowException to prevent the OOM.
 # The fix intended to stop client retries for such failures by throwing a 
DoNotRetryIOException when a BufferOverflowException occurs, since retrying 
cannot resolve the underlying issue.
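To make the size check in the first point concrete, here is a minimal, hypothetical sketch of a capped-growth buffer. The class CappedBuffer and the 16-byte cap are illustrative assumptions, not HBase code. The sketch only mirrors the idea behind ByteBufferOutputStream.checkSizeAndGrow(): compute the required capacity as a long and throw BufferOverflowException before allocating anything past the cap.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical demo entry point for the sketch below.
class CappedGrowthDemo {
  public static void main(String[] args) {
    CappedBuffer buf = new CappedBuffer(4, 16);
    buf.write(new byte[10]); // doubling 4 -> 8 is not enough, so it grows to 10
    System.out.println(buf.capacity()); // prints 10
    try {
      buf.write(new byte[7]); // 10 + 7 = 17 exceeds the 16-byte cap
    } catch (BufferOverflowException e) {
      System.out.println("overflow"); // refused before any allocation
    }
  }
}

/**
 * Hypothetical growable buffer, loosely modeled on the idea behind
 * ByteBufferOutputStream.checkSizeAndGrow(): fail fast at a hard cap
 * instead of attempting an allocation that could trigger an OOM.
 */
class CappedBuffer {
  private final long maxSize;
  private ByteBuffer buf;

  CappedBuffer(int initialCapacity, long maxSize) {
    this.maxSize = maxSize;
    this.buf = ByteBuffer.allocate(initialCapacity);
  }

  int capacity() {
    return buf.capacity();
  }

  void write(byte[] bytes) {
    checkSizeAndGrow(bytes.length);
    buf.put(bytes);
  }

  private void checkSizeAndGrow(int extra) {
    // Use long arithmetic so position + extra cannot overflow int.
    long needed = buf.position() + (long) extra;
    if (needed <= buf.limit()) {
      return; // enough room already
    }
    if (needed > maxSize) {
      throw new BufferOverflowException(); // refuse before allocating
    }
    // Double up to the cap, but never below what this write needs.
    long next = Math.max(Math.min(buf.capacity() * 2L, maxSize), needed);
    ByteBuffer bigger = ByteBuffer.allocate((int) next);
    buf.flip(); // switch to reading the bytes written so far
    bigger.put(buf);
    buf = bigger;
  }
}
```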

*The Problem:* The DoNotRetryIOException is never propagated to the client 
side. Here's the issue flow:
 # ByteBufferOutputStream.checkSizeAndGrow() throws BufferOverflowException.
 # The exception propagates through the call stack:
 ** ByteBufferOutputStream.checkSizeAndGrow()
 ** encoder.write()
 ** encodeCellsTo() (catches the BufferOverflowException and turns it into a 
DoNotRetryIOException)
 ** this.cellBlockBuilder.buildCellBlockStream()
 ** call.setResponse()
 # The DoNotRetryIOException is ultimately caught in call.setResponse(), where 
it is only logged and never sent back to the client.
 # Because the response is null and the Netty connection is closed, the client 
keeps retrying indefinitely.
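The flow above can be modeled in a few lines. This is a hypothetical, heavily simplified sketch: SketchServerCall, its static "frames", and the local DoNotRetryIOException stand-in are illustrative assumptions, and the unconditional BufferOverflowException mimics the fault injection from the reproduction steps. It shows the key point: the catch block in setResponse() does receive the DoNotRetryIOException, but only logs it, so the response ends up null.

```java
import java.io.IOException;
import java.nio.BufferOverflowException;

// Hypothetical demo entry point for the sketch below.
class SwallowDemo {
  public static void main(String[] args) {
    SketchServerCall call = new SketchServerCall();
    call.setResponse(new byte[][] {new byte[8]});
    // Nothing client-visible says "do not retry".
    System.out.println("response = " + call.response); // prints "response = null"
  }
}

/** Hypothetical stand-in for org.apache.hadoop.hbase.DoNotRetryIOException. */
class DoNotRetryIOException extends IOException {
  DoNotRetryIOException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Hypothetical, heavily simplified model of the call stack listed above. */
class SketchServerCall {
  byte[] response = new byte[0];
  IOException swallowed; // what the described code only writes to the log

  // Innermost frame: simulates the injected overflow.
  static void checkSizeAndGrow(int extra) {
    throw new BufferOverflowException();
  }

  // The codec's encoder writing one cell into the stream.
  static void encoderWrite(byte[] cell) {
    checkSizeAndGrow(cell.length);
  }

  // Converts the overflow into a "do not retry" failure.
  static byte[] encodeCellsTo(byte[][] cells) throws IOException {
    try {
      for (byte[] cell : cells) {
        encoderWrite(cell);
      }
      return new byte[0];
    } catch (BufferOverflowException e) {
      throw new DoNotRetryIOException("cell block too large", e);
    }
  }

  // Builds the cell block for the response.
  static byte[] buildCellBlockStream(byte[][] cells) throws IOException {
    return encodeCellsTo(cells);
  }

  // The problematic catch: every IOException is logged and dropped.
  void setResponse(byte[][] cells) {
    try {
      response = buildCellBlockStream(cells);
    } catch (IOException e) {
      System.err.println("WARN: exception while creating response: " + e);
      swallowed = e;
      response = null;
    }
  }
}
```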

*Current Status:* This issue still exists in the latest branches (3.0 and 
2.6). In ServerCall.java, when ALLOCATOR_POOL_ENABLED_KEY 
(hbase.ipc.server.reservoir.enabled) is set to false, the setResponse() method 
follows the same problematic path. If a DoNotRetryIOException is thrown in 
{{ByteBuffer b = this.cellBlockBuilder.buildCellBlock(this.connection.codec, 
this.connection.compressionCodec, cells);}}, it is swallowed by the 
setResponse() catch block and never reaches the client.

*Steps to Reproduce:*
 # Set up a 3-node HBase cluster with 3 RegionServers
 # Set hbase.ipc.server.reservoir.enabled to false so that the server uses 
ByteBufferOutputStream
 # Inject a BufferOverflowException at 
ByteBufferOutputStream.checkSizeAndGrow() to simulate an OOM condition
 # Send a scan request
 # Observe endless client retries

*Expected Behavior:* The DoNotRetryIOException should be propagated to the 
client so that it stops retrying.
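One possible direction, sketched under stated assumptions and not the actual HBase fix: instead of dropping the failure, setResponse() returns an error response that carries the exception class name and a do-not-retry flag, and the client stops as soon as it sees that flag. Response, CellBlockSupplier, FixedServerCall, and SketchClient are hypothetical names (the record needs Java 16+). Real HBase RPC serialization and client retry logic are more involved than this.

```java
import java.io.IOException;

// Hypothetical demo entry point for the sketch below.
class PropagateDemo {
  public static void main(String[] args) {
    FixedServerCall server = new FixedServerCall();
    SketchClient client = new SketchClient(5);
    int attempts = client.call(server, () -> {
      throw new DoNotRetryIOException("cell block too large");
    });
    System.out.println("gave up after " + attempts + " attempt(s)"); // prints "gave up after 1 attempt(s)"
  }
}

/** Hypothetical stand-in for org.apache.hadoop.hbase.DoNotRetryIOException. */
class DoNotRetryIOException extends IOException {
  DoNotRetryIOException(String message) {
    super(message);
  }
}

/** Hypothetical producer of a response payload, e.g. an encoded cell block. */
@FunctionalInterface
interface CellBlockSupplier {
  byte[] get() throws IOException;
}

/** Hypothetical response: a payload on success, an error description otherwise. */
record Response(byte[] payload, String errorClass, String errorMessage, boolean doNotRetry) {
  static Response ok(byte[] payload) {
    return new Response(payload, null, null, false);
  }

  static Response error(IOException e) {
    return new Response(null, e.getClass().getName(), e.getMessage(),
        e instanceof DoNotRetryIOException);
  }
}

/** Server side: report the failure instead of logging it and sending nothing. */
class FixedServerCall {
  Response setResponse(CellBlockSupplier cellBlock) {
    try {
      return Response.ok(cellBlock.get());
    } catch (IOException e) {
      return Response.error(e);
    }
  }
}

/** Client side: retry transient errors, stop at once on a do-not-retry error. */
class SketchClient {
  private final int maxAttempts;

  SketchClient(int maxAttempts) {
    this.maxAttempts = maxAttempts;
  }

  /** Returns the number of attempts made before succeeding or giving up. */
  int call(FixedServerCall server, CellBlockSupplier cellBlock) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Response r = server.setResponse(cellBlock);
      if (r.errorClass() == null || r.doNotRetry()) {
        return attempt; // success, or a failure that retrying cannot fix
      }
    }
    return maxAttempts; // retry budget exhausted on transient errors
  }
}
```

The design keeps retry decisions on the client but gives it the information it needs. A transient IOException still uses the full retry budget, while a DoNotRetryIOException ends the call after one attempt.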

  was:
I have discovered that the fix for HBASE-14598 does not completely resolve the 
issue, and the problem persists in the latest branches (3.0 and 2.6).

The original fix for HBASE-14598 addressed two aspects:
 # When a Scan/Get RPC attempts to allocate an excessively large array that 
could trigger an OutOfMemoryError (OOM), it checks the array size before 
allocation and throws a {{BufferOverflowException}} to prevent OOM.
 # The fix intended to stop client retries for such failures by throwing a 
{{DoNotRetryException}} when a {{BufferOverflowException}} occurs, as retrying 
cannot resolve the underlying issue.

*The Problem:* The {{DoNotRetryException}} is never propagated to the client 
side. Here's the issue flow:
 # {{ByteBufferOutputStream.checkSizeAndGrow()}} throws 
{{BufferOverflowException}}
 # The exception propagates through the call stack:
 ** {{ByteBufferOutputStream.checkSizeAndGrow()}}
 ** {{encoder.write()}}
 ** {{encodeCellsTo()}} (catches {{BufferOverflowException}} and turns it into 
{{DoNotRetryIOException}})
 ** {{this.cellBlockBuilder.buildCellBlockStream()}}
 ** {{call.setResponse()}}
 # The {{DoNotRetryException}} is ultimately caught in call.setResponse, where 
it is merely logged but not sent back to the client
 # As a result, the client continues retrying indefinitely since the response 
is null and the Netty connection will be closed.

*Current Status:* In the latest branches (3.0 and 2.6), this issue still 
exists. In {{ServerCall.java}}, when {{ALLOCATOR_POOL_ENABLED_KEY}} 
({{hbase.ipc.server.reservoir.enabled}}) is set to {{false}}, the 
{{setResponse()}} method follows the same problematic path. If a 
{{DoNotRetryException}} is thrown in 
{{ByteBuffer b = this.cellBlockBuilder.buildCellBlock(this.connection.codec, 
this.connection.compressionCodec, cells);}}, it gets swallowed in the 
{{setResponse()}} catch block and never reaches the client.

*Steps to Reproduce:*
 # Set up a 3-node HBase cluster with 3 RegionServers
 # Set {{hbase.ipc.server.reservoir.enabled}} to {{false}} to use 
{{ByteBufferOutputStream}}
 # Inject a {{BufferOverflowException}} at 
{{ByteBufferOutputStream.checkSizeAndGrow()}} to simulate an OOM condition
 # Send a scan request
 # Observe endless client retries

*Expected Behavior:* The {{DoNotRetryException}} should be properly propagated 
to the client to prevent retry attempts.


> Server side DoNotRetryException not propagated to client
> --------------------------------------------------------
>
>                 Key: HBASE-28589
>                 URL: https://issues.apache.org/jira/browse/HBASE-28589
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 2.0.0, 2.4.0, 2.5.0, 2.6.0, 3.0.0
>            Reporter: ZhenyuLi
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
