[
https://issues.apache.org/jira/browse/HBASE-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Roudnitsky updated HBASE-27781:
--------------------------------------
Description:
In AsyncRequestFutureImpl we fail fast when the operation timeout is exceeded
during location resolution
[here|https://github.com/apache/hbase/blob/branch-2.5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L460-L462].
In that handling, we loop over all actions and set them as failed. The problem
is that some actions may have already finished by the time we get to this spot.
actionsInProgress has already been decremented for those, and now we decrement
it again by the total number of actions. The counter goes negative, which
triggers an AssertionError
[here|https://github.com/apache/hbase/blob/branch-2.5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L1197].
The HBase client therefore throws an unchecked {{AssertionError}} to the user
application layer for an operation that should simply have timed out. This can
kill the caller thread/application that invoked the operation, because the user
application layer should not be catching {{Error}} and its subclasses like
{{AssertionError}}.
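To illustrate why the error reaches the application, here is a minimal sketch, assuming a caller that follows the usual practice of catching only {{Exception}}. The class and method names are hypothetical and are not part of the HBase client.

```java
// Hypothetical illustration only; this is not HBase client code.
final class ErrorEscapesSketch {
  // Runs the given call the way a typical caller would (catching Exception
  // only) and reports which handler, if any, saw the throwable.
  static String run(Runnable clientCall) {
    try {
      try {
        clientCall.run();
        return "completed";
      } catch (Exception e) {
        // Typical application-level handling: checked and runtime exceptions.
        return "caught Exception";
      }
    } catch (Error e) {
      // AssertionError extends Error, not Exception, so it skips the inner
      // handler. The outer catch exists only so the sketch can observe it.
      return "escaped as Error";
    }
  }
}
```

With this sketch, a call that throws {{AssertionError}} bypasses the {{catch (Exception e)}} handler, while an ordinary runtime exception is handled by it.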
We still want to fail all remaining/incomplete actions being processed in
groupAndSendMulti, because none will be executed after location resolution i.
But we need special handling to avoid this case. Maybe don't bother
decrementing actionsInProgress at all, and instead set it to 0.
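The double decrement and the "set to 0" idea can be sketched as follows. This is a simplified stand-in under stated assumptions, not the actual AsyncRequestFutureImpl code: the class {{InProgressCounterSketch}} and its methods are hypothetical, and only the counter name actionsInProgress is taken from the description.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the in-progress bookkeeping described above.
// It is NOT the real AsyncRequestFutureImpl implementation.
final class InProgressCounterSketch {
  private final AtomicLong actionsInProgress;

  InProgressCounterSketch(int totalActions) {
    this.actionsInProgress = new AtomicLong(totalActions);
  }

  // Normal completion path: one action has finished.
  void completeOne() {
    decrementBy(1);
  }

  // Problematic timeout path: decrement by the total number of actions,
  // even though some of them were already counted as finished.
  void failAllByDecrement(int totalActions) {
    decrementBy(totalActions);
  }

  // Suggested alternative: set the counter to 0 instead of decrementing.
  // Returns how many actions were still outstanding at that moment.
  long failAllByReset() {
    return actionsInProgress.getAndSet(0);
  }

  long inProgress() {
    return actionsInProgress.get();
  }

  private void decrementBy(int n) {
    long remaining = actionsInProgress.addAndGet(-n);
    if (remaining < 0) {
      throw new AssertionError("actionsInProgress went negative: " + remaining);
    }
  }
}
```

With 5 actions of which 2 have completed, failAllByDecrement(5) drives the counter to -2 and throws, whereas failAllByReset() leaves it at 0 and reports 3 outstanding actions. Note that in this sketch a completion arriving after the reset would still push the counter negative, so a real fix would also need to handle late completions.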
was:
In AsyncFutureRequestImpl we fail fast when operation timeout is exceeded
during location resolution
[here|https://github.com/apache/hbase/blob/branch-2.5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L460-L462].
In that handling, we loop all actions and set them as failed. The problem is,
some number of actions may already finished when we get to this spot. So the
actionsInProgress would have been decremented for those already, and now we're
going to decrement by all actions. This causes an assertion error since we go
negative
[here|https://github.com/apache/hbase/blob/branch-2.5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L1197],
causing the HBase client to throw an unchecked exception which can kill the
caller thread that invoked the operation which should have timed out, as
callers of the client should not be catching {{Error}} and its subclasses like
{{AssertionError}}.
We still want to fail all actions, because none will be executed. But we need
special handling to avoid this case. Maybe don't bother decrementing the
actionsInProgress at all, instead set to 0.
> AssertionError in AsyncRequestFutureImpl when timing out during location
> resolution
> -----------------------------------------------------------------------------------
>
> Key: HBASE-27781
> URL: https://issues.apache.org/jira/browse/HBASE-27781
> Project: HBase
> Issue Type: Bug
> Components: asyncclient
> Reporter: Bryan Beaudreault
> Assignee: Daniel Roudnitsky
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.6.3