[jira] [Commented] (HBASE-29265) RetriesExhaustedWithDetailsException can create a pathological feedback loop with multigets

Daniel Roudnitsky (Jira) Tue, 22 Apr 2025 14:38:25 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-29265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946554#comment-17946554
 ]


Daniel Roudnitsky commented on HBASE-29265:
-------------------------------------------

When a batch operation is running, it's possible for callables inside the 
batch to keep throwing meta cache clearing exceptions for the duration of the 
operation as they go through 
[callable retries|https://github.com/apache/hbase/blob/a8ff965536fda48bbb6d1f77b53a55e43b8d9461/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L766-L807]. 
At the end of the ordeal you can end up with a top-level (batch-level) 
exception that is either a RetriesExhaustedWithDetailsException, or a 
SocketTimeoutException if a callable in the batch did not respect the overall 
operation timeout. 

> I think that SocketTimeoutExceptions can manifest to the client, even if we 
> could throw an OperationTimeoutException

I think it's possible for this to happen if you hit HBASE-28730. HBASE-28358 
is very relevant here and describes the problem well. I have a patch locally 
for HBASE-28730, but I am blocked there on HBASE-27781, for which I have been 
having difficulty finding a reviewer. 

> RetriesExhaustedWithDetailsException can create a pathological feedback loop 
> with multigets
> -------------------------------------------------------------------------------------------
>
>                 Key: HBASE-29265
>                 URL: https://issues.apache.org/jira/browse/HBASE-29265
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Hernan Gelaf-Romer
>            Assignee: Hernan Gelaf-Romer
>            Priority: Major
>
> Similar to https://issues.apache.org/jira/browse/HBASE-27487
>  
> RetriesExhaustedWithDetailsException currently obscures that the underlying 
> exception(s) may be OperationTimeoutExceededException. Because of this, we 
> can still run into situations where slow requests trigger a flood of meta 
> cache clearing exceptions and hotspot the meta table. 
>  
> We should update our exception handling logic to special-case these 
> exceptions, and explicitly check whether the underlying root cause of the 
> request failures was an operation timeout. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
