[ https://issues.apache.org/jira/browse/HBASE-28941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17975480#comment-17975480 ]
Daniel Roudnitsky commented on HBASE-28941:
-------------------------------------------
The changes are a bit involved. I have left one comment, but unfortunately I do
not have enough familiarity with the async client to review the full change set
with confidence. At a high level I agree: for a failed server, it makes sense to
clear the meta cache for that server. Given that you are taking the approach in
HBASE-27593, do you want to assign that one to yourself, link your PR to that
issue, and mark this jira as a duplicate?
> Clear all meta caches of the server on which hardware failure related
> exceptions occurred
> -----------------------------------------------------------------------------------------
>
> Key: HBASE-28941
> URL: https://issues.apache.org/jira/browse/HBASE-28941
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.4.18, 2.5.10
> Reporter: Eungsop Yoo
> Assignee: Eungsop Yoo
> Priority: Major
> Labels: pull-request-available
>
> CallTimeoutException and ConnectException might be caused by a network or
> hardware issue on the target server, in which case we might not be able to
> connect to that server for a while. So we should clear all meta cache entries
> for the server on which hardware-failure-related exceptions occurred. If we
> don't clear them, we might get the same exception once for every cached
> region location that points to that server.
>
> https://issues.apache.org/jira/browse/HBASE-7590
> https://issues.apache.org/jira/browse/HBASE-22261
> We already have the ClusterStatusPublisher/Listener feature, but it is not
> possible to use this feature in some infrastructure environments like mine.
>
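To illustrate the difference the description is pointing at, here is a minimal
sketch of per-region versus per-server invalidation in a toy location cache.
None of this is HBase's actual client code: MetaCacheSketch, clearRegion,
clearServer, looksLikeServerFailure and onFailure are hypothetical names, and
java.net.SocketTimeoutException stands in for HBase's CallTimeoutException so
the example compiles on its own.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for a client-side region location cache (hypothetical, not HBase code).
public class MetaCacheSketch {
    // region name -> server name ("host:port")
    private final Map<String, String> regionToServer = new ConcurrentHashMap<>();

    public void cache(String region, String server) {
        regionToServer.put(region, server);
    }

    // Per-region invalidation: only the region whose call failed is dropped.
    public void clearRegion(String region) {
        regionToServer.remove(region);
    }

    // Per-server invalidation: drop every cached region that points to the server.
    public void clearServer(String server) {
        regionToServer.values().removeIf(server::equals);
    }

    public int size() {
        return regionToServer.size();
    }

    // Assumption: connect failures and timeouts are treated as server-level signals.
    public static boolean looksLikeServerFailure(Throwable t) {
        return t instanceof ConnectException || t instanceof SocketTimeoutException;
    }

    // On failure, pick the invalidation scope based on the exception type.
    public void onFailure(String region, String server, Throwable t) {
        if (looksLikeServerFailure(t)) {
            clearServer(server);
        } else {
            clearRegion(region);
        }
    }
}
```

With per-region invalidation only, a client holding N cached locations for an
unreachable server would have to fail up to N times before all of them are
evicted; the per-server path evicts them after the first failure.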