[ 
https://issues.apache.org/jira/browse/HBASE-29006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-29006:
-----------------------------------
    Labels: pull-request-available  (was: )

> The region assignment retry logic is unconstrained and may cause workload 
> amplification
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-29006
>                 URL: https://issues.apache.org/jira/browse/HBASE-29006
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Shangshu Qian
>            Priority: Major
>              Labels: pull-request-available
>
> We found a potential feedback loop in the region assignment process that may 
> overload the RegionServer (RS).
> `AssignmentManager.processAssignmentPlans()` retries the assignment 
> whenever an HBaseIOException (HIOE) is thrown. For example, 
> `FavoredNodeAssignmentHelper.canPlaceFavoredNodes` may throw an HIOE when 
> fewer than three nodes are available. The HIOE is caught by the catch 
> block here:
> {code:java}
>   private void processAssignmentPlans(final HashMap<RegionInfo, RegionStateNode> regions,
>     final HashMap<RegionInfo, ServerName> retainMap, final List<RegionInfo> hris,
>     final List<ServerName> servers) {
>     boolean isTraceEnabled = LOG.isTraceEnabled();
>     if (isTraceEnabled) {
>       LOG.trace("Available servers count=" + servers.size() + ": " + servers);
>     }
>
>     final LoadBalancer balancer = getBalancer();
>     // ask the balancer where to place regions
>     if (retainMap != null && !retainMap.isEmpty()) {
>       if (isTraceEnabled) {
>         LOG.trace("retain assign regions=" + retainMap);
>       }
>       try {
>         acceptPlan(regions, balancer.retainAssignment(retainMap, servers));
>       } catch (HBaseIOException e) {
>         LOG.warn("unable to retain assignment", e);
>         addToPendingAssignment(regions, retainMap.keySet());
>       }
>     }
> {code}
> The failed regions are simply put back into the pending queue, so the 
> assignment is retried with no bound on the number of attempts.
> This is a problem when the assignment fails because the RS is overloaded: 
> every additional retry adds load and makes the overload worse.
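
For illustration only, one common way to constrain a retry loop like the one described above is to cap the number of attempts and back off exponentially between them. The sketch below is not HBase code and is not taken from the pull request: the class `BoundedRetry`, its method names, and its parameters are all hypothetical, and it uses only the Java standard library.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of HBase): runs a task at most maxAttempts
 * times, sleeping with capped exponential backoff between failed attempts.
 */
final class BoundedRetry {
  private final int maxAttempts;
  private final long baseDelayMs;
  private final long maxDelayMs;

  BoundedRetry(int maxAttempts, long baseDelayMs, long maxDelayMs) {
    if (maxAttempts < 1 || baseDelayMs < 0 || maxDelayMs < baseDelayMs) {
      throw new IllegalArgumentException("invalid retry settings");
    }
    this.maxAttempts = maxAttempts;
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
  }

  /** Delay after failed attempt number `attempt` (1-based): base * 2^(attempt-1), capped. */
  long delayMs(int attempt) {
    long delay = baseDelayMs;
    for (int i = 1; i < attempt; i++) {
      // Stop doubling once the next step would exceed the cap (also avoids overflow).
      delay = (delay > maxDelayMs / 2) ? maxDelayMs : delay * 2;
    }
    return Math.min(delay, maxDelayMs);
  }

  /** Runs the task; rethrows the last failure once the attempt budget is spent. */
  <T> T call(Callable<T> task) throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (InterruptedException e) {
        throw e; // never retry an interruption
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(delayMs(attempt)); // back off before the next attempt
        }
      }
    }
    throw last; // non-null here because maxAttempts >= 1
  }
}
```

With `new BoundedRetry(3, 1, 4)`, a task that always fails is invoked three times and the last exception is rethrown, instead of being requeued indefinitely. How such a policy would be wired into the `catch (HBaseIOException e)` block quoted above (for example, deciding when to stop calling `addToPendingAssignment`) is an open design question and is not shown here.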



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
