[ 
https://issues.apache.org/jira/browse/HBASE-27593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eungsop Yoo reassigned HBASE-27593:
-----------------------------------

    Assignee: Eungsop Yoo

> Clear meta cache for full server when handling FailedServerException
> --------------------------------------------------------------------
>
>                 Key: HBASE-27593
>                 URL: https://issues.apache.org/jira/browse/HBASE-27593
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Assignee: Eungsop Yoo
>            Priority: Major
>
> Currently we prefer to clear the meta cache only for the individual region 
> that fails. This is the right choice in most cases, because clearing the 
> cache for an entire server is much more expensive. If a server hosts 100 
> regions, unnecessarily clearing the cache for the entire server would cause 
> 100 meta requests per client.
> However, when a client fails to connect to a regionserver, that server gets 
> added to the FailedServers list. Subsequent requests to that server are 
> fast-failed, throwing a FailedServerException.
> This is a pretty clear indicator that there's a problem with a specific 
> server. In this case I think we should clear the cache for that full server.
> We had a production incident recently where a server completely hung. We 
> did see "Clear Region" calls, but the server hosted many regions, so the 
> meta clears continued for longer than necessary. Adding a "Clear Server" 
> call triggered by FailedServers would have mitigated this issue much more 
> quickly.
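
The proposal above can be sketched roughly as follows. This is a minimal,
self-contained illustration and not HBase's actual client code: the class
MetaCacheSketch, the exception FailedServerSketchException, and the methods
clearRegion, clearServer, and onFailure are all hypothetical names invented
for this example. It only shows the intended decision: a region-level clear
for ordinary failures, and a server-level clear when the failure says the
whole server is on the failed list.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for a client-side region location cache.
// This is NOT HBase's real meta cache implementation.
class MetaCacheSketch {

    // Hypothetical stand-in for the fast-fail exception described in the issue.
    static class FailedServerSketchException extends RuntimeException {
        final String server;

        FailedServerSketchException(String server) {
            super("server is in the failed servers list: " + server);
            this.server = server;
        }
    }

    // Region name -> server currently believed to host that region.
    private final Map<String, String> regionToServer = new HashMap<>();

    void put(String region, String server) {
        regionToServer.put(region, server);
    }

    int size() {
        return regionToServer.size();
    }

    // "Clear Region": drop a single cached location.
    void clearRegion(String region) {
        regionToServer.remove(region);
    }

    // "Clear Server": drop every cached location on the given server.
    // Returns the number of entries removed.
    int clearServer(String server) {
        int removed = 0;
        Iterator<Map.Entry<String, String>> it = regionToServer.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().equals(server)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Decide how much of the cache to invalidate based on the failure type.
    void onFailure(String region, Throwable error) {
        if (error instanceof FailedServerSketchException) {
            // The whole server is known to be bad: clear all of its regions at once.
            clearServer(((FailedServerSketchException) error).server);
        } else {
            // Ordinary failure: keep the cheaper, region-scoped clear.
            clearRegion(region);
        }
    }
}
```

In this sketch, an ordinary error on one of 100 cached regions removes a
single entry, while a FailedServerSketchException for the same server removes
all 100 in one step instead of one per failed request.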



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
