[
https://issues.apache.org/jira/browse/HBASE-29265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946554#comment-17946554
]
Daniel Roudnitsky commented on HBASE-29265:
-------------------------------------------
When a batch operation is running, it's possible for callables inside the
batch to throw meta cache clearing exceptions throughout the duration of the
batch operation as the callables are retried
(https://github.com/apache/hbase/blob/a8ff965536fda48bbb6d1f77b53a55e43b8d9461/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L766-L807),
and at the end of the ordeal you can end up with a top-level (batch-level)
exception that is either a RetriesExhaustedWithDetailsException, or a
SocketTimeoutException if a callable in the batch did not respect the overall
operation timeout.
> I think that SocketTimeoutExceptions can manifest to the client, even if we
> could throw an OperationTimeoutException
I think it's possible for this to happen if you hit HBASE-28730. HBASE-28358
is very relevant here and describes the problem well. I have a patch locally
for HBASE-28730, but I am blocked there on HBASE-27781, for which I have been
having difficulty finding a reviewer.
> RetriesExhaustedWithDetailsException can create a pathological feedback loop
> with multigets
> -------------------------------------------------------------------------------------------
>
> Key: HBASE-29265
> URL: https://issues.apache.org/jira/browse/HBASE-29265
> Project: HBase
> Issue Type: Improvement
> Reporter: Hernan Gelaf-Romer
> Assignee: Hernan Gelaf-Romer
> Priority: Major
>
> Similar to https://issues.apache.org/jira/browse/HBASE-27487
>
> RetriesExhaustedWithDetailsException currently obscures that the underlying
> exception(s) may be OperationTimeoutExceededException. Because of this, we
> can still run into situations where slow requests can trigger a flood of meta
> cache clearing exceptions, and hotspot the meta table.
>
> We should update our exception handling logic to special case these
> exceptions, and explicitly check to see if the underlying root cause for the
> request failures was due to an operation timeout.
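
To illustrate the check proposed in the description above, here is a minimal
Java sketch. It is not the HBase client code: OperationTimeoutStandIn and
BatchFailureStandIn are hypothetical stand-ins for the operation timeout
exception and for a batch-level exception that carries per-action causes, and
MetaCacheClearPolicy is a made-up name. The policy shown (skip the meta cache
clear only when every underlying cause is an operation timeout, and clear for
everything else) is an assumption for illustration; the real client is more
selective about which exceptions clear the cache.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch only: stand-in types, not the real HBase client classes. */
final class MetaCacheClearPolicy {

    /** Hypothetical stand-in for an operation timeout exception. */
    static class OperationTimeoutStandIn extends RuntimeException {
        OperationTimeoutStandIn(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a batch-level exception with per-action causes. */
    static class BatchFailureStandIn extends RuntimeException {
        private final List<Throwable> causes;

        BatchFailureStandIn(List<Throwable> causes) {
            super("batch failed with " + causes.size() + " action failure(s)");
            this.causes = Collections.unmodifiableList(new ArrayList<>(causes));
        }

        List<Throwable> getCauses() {
            return causes;
        }
    }

    /** Walks the cause chain (depth-bounded to guard against cycles). */
    static boolean isOperationTimeout(Throwable t) {
        for (int depth = 0; t != null && depth < 32; depth++) {
            if (t instanceof OperationTimeoutStandIn) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    /**
     * Assumed policy: do not clear the meta cache when the failure is purely
     * an operation timeout, including when the timeouts are hidden inside a
     * batch-level exception. Any other failure is treated as cache-clearing.
     */
    static boolean shouldClearMetaCache(Throwable t) {
        if (t instanceof BatchFailureStandIn) {
            List<Throwable> causes = ((BatchFailureStandIn) t).getCauses();
            if (causes.isEmpty()) {
                return true; // nothing to inspect, keep the default behaviour
            }
            for (Throwable cause : causes) {
                if (!isOperationTimeout(cause)) {
                    return true; // at least one non-timeout failure
                }
            }
            return false; // every underlying cause was an operation timeout
        }
        return !isOperationTimeout(t);
    }
}
```

The point of unwrapping the batch-level exception is that a timeout says
nothing about region locations being stale, so a slow request on its own
should not trigger extra meta lookups.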