Shangshu Qian created HBASE-29006:
-------------------------------------

             Summary: The region assignment retry logic is unconstrained and 
may cause workload amplification
                 Key: HBASE-29006
                 URL: https://issues.apache.org/jira/browse/HBASE-29006
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Shangshu Qian


We found a potential feedback loop in the region assignment process that may 
overload the RegionServer (RS).

`AssignmentManager.processAssignmentPlans()` retries the assignment whenever 
an HBaseIOException (HIOE) is thrown. For example, 
`FavoredNodeAssignmentHelper.canPlaceFavoredNodes` may throw an HIOE when 
fewer than three nodes are available. The HIOE is caught by the catch block 
here:
{code:java}
  private void processAssignmentPlans(final HashMap<RegionInfo, RegionStateNode> regions,
    final HashMap<RegionInfo, ServerName> retainMap, final List<RegionInfo> hris,
    final List<ServerName> servers) {
    boolean isTraceEnabled = LOG.isTraceEnabled();
    if (isTraceEnabled) {
      LOG.trace("Available servers count=" + servers.size() + ": " + servers);
    }

    final LoadBalancer balancer = getBalancer();
    // ask the balancer where to place regions
    if (retainMap != null && !retainMap.isEmpty()) {
      if (isTraceEnabled) {
        LOG.trace("retain assign regions=" + retainMap);
      }
      try {
        acceptPlan(regions, balancer.retainAssignment(retainMap, servers));
      } catch (HBaseIOException e) {
        LOG.warn("unable to retain assignment", e);
        addToPendingAssignment(regions, retainMap.keySet());
      }
    }
{code}
The failed regions are simply added back to the pending assignment queue, and 
the number of retries is not bounded.

This can cause problems when the assignment fails because the RS is overloaded. 
More retries in the region assignment can make the overloading worse.
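
One possible way to bound the retries is sketched below, assuming a per-region 
retry budget with capped exponential backoff. The `BoundedRetryPolicy` class 
and its parameters are hypothetical and are not part of HBase; this only 
illustrates the idea and is not a proposed patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/**
 * Hypothetical helper (not an HBase class): caps the number of retries per
 * region and spaces them out with capped exponential backoff.
 */
final class BoundedRetryPolicy {
  private final int maxRetries;
  private final long baseDelayMillis;
  private final long maxDelayMillis;
  // Number of failures recorded so far, keyed by region name.
  private final Map<String, Integer> failures = new HashMap<>();

  BoundedRetryPolicy(int maxRetries, long baseDelayMillis, long maxDelayMillis) {
    if (maxRetries < 1 || baseDelayMillis < 0 || maxDelayMillis < baseDelayMillis) {
      throw new IllegalArgumentException("invalid retry policy parameters");
    }
    this.maxRetries = maxRetries;
    this.baseDelayMillis = baseDelayMillis;
    this.maxDelayMillis = maxDelayMillis;
  }

  /**
   * Records one failed attempt for the region. Returns the delay to wait
   * before the next retry, or an empty value once the budget is exhausted.
   */
  synchronized OptionalLong nextDelayMillis(String regionName) {
    int count = failures.merge(regionName, 1, Integer::sum);
    if (count > maxRetries) {
      return OptionalLong.empty();
    }
    // Double the delay for each earlier failure, never exceeding the cap.
    long delay = baseDelayMillis;
    for (int i = 1; i < count && delay < maxDelayMillis; i++) {
      delay = (delay > maxDelayMillis / 2) ? maxDelayMillis : delay * 2;
    }
    return OptionalLong.of(Math.min(delay, maxDelayMillis));
  }

  /** Clears the failure count, for example after a successful assignment. */
  synchronized void reset(String regionName) {
    failures.remove(regionName);
  }
}
```

With such a policy, the catch block could consult `nextDelayMillis` before 
re-queuing a region, and give up or escalate once it returns an empty value, 
so that an overloaded RS is not hit with an unbounded stream of retries.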



--
This message was sent by Atlassian Jira
(v8.20.10#820010)