[jira] [Created] (HBASE-29006) The region assignment retry logic is unconstrained and may cause workload amplification

Shangshu Qian (Jira) Fri, 29 Nov 2024 21:00:54 -0800

Shangshu Qian created HBASE-29006:
-------------------------------------

             Summary: The region assignment retry logic is unconstrained and may cause workload amplification
                 Key: HBASE-29006
                 URL: https://issues.apache.org/jira/browse/HBASE-29006
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Shangshu Qian



We found a potential feedback loop in the region assignment process that may 
overload the RegionServer (RS).

`AssignmentManager.processAssignmentPlans()` retries the assignment whenever 
an HBaseIOException (HIOE) is thrown. For example, 
`FavoredNodeAssignmentHelper.canPlaceFavoredNodes` may throw an HIOE when 
fewer than three nodes are available. The HIOE is caught by the catch block 
here:
{code:java}
  private void processAssignmentPlans(final HashMap<RegionInfo, RegionStateNode> regions,
    final HashMap<RegionInfo, ServerName> retainMap, final List<RegionInfo> hris,
    final List<ServerName> servers) {
    boolean isTraceEnabled = LOG.isTraceEnabled();
    if (isTraceEnabled) {
      LOG.trace("Available servers count=" + servers.size() + ": " + servers);
    }

    final LoadBalancer balancer = getBalancer();
    // ask the balancer where to place regions
    if (retainMap != null && !retainMap.isEmpty()) {
      if (isTraceEnabled) {
        LOG.trace("retain assign regions=" + retainMap);
      }
      try {
        acceptPlan(regions, balancer.retainAssignment(retainMap, servers));
      } catch (HBaseIOException e) {
        LOG.warn("unable to retain assignment", e);
        addToPendingAssignment(regions, retainMap.keySet());
      }
    }
{code}
The failed regions are simply put back into the pending assignment queue by 
`addToPendingAssignment`, and the number of retries is not bounded.

This can cause problems when the assignment fails because the RS is already 
overloaded. Each retry adds more load to the RS, which makes further failures, 
and therefore further retries, more likely.
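
One possible way to constrain the retries is to cap the number of attempts per 
region and to space the attempts out with exponential backoff. The following is 
only a minimal sketch under stated assumptions, not a proposed patch: 
`BoundedRetryTracker`, its methods, and the three settings (`maxAttempts`, 
`baseDelayMs`, `maxDelayMs`) are hypothetical names that do not exist in HBase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical helper, not part of HBase: caps retries per key and spaces them out. */
class BoundedRetryTracker<K> {
  private final int maxAttempts;   // total attempts allowed per key (assumed setting)
  private final long baseDelayMs;  // delay after the first failure (assumed setting)
  private final long maxDelayMs;   // upper bound on any single delay (assumed setting)
  private final Map<K, Integer> failures = new HashMap<>();

  BoundedRetryTracker(int maxAttempts, long baseDelayMs, long maxDelayMs) {
    if (maxAttempts < 1 || baseDelayMs < 0 || maxDelayMs < baseDelayMs) {
      throw new IllegalArgumentException("invalid retry settings");
    }
    this.maxAttempts = maxAttempts;
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
  }

  /**
   * Records one failed attempt for the key. Returns the delay to wait before the
   * next attempt, or empty when the retry budget is used up and the caller should
   * give up and surface the failure instead of re-queueing.
   */
  synchronized OptionalLong onFailure(K key) {
    int failed = failures.merge(key, 1, Integer::sum);
    if (failed >= maxAttempts) {
      failures.remove(key);
      return OptionalLong.empty();
    }
    return OptionalLong.of(delayAfter(failed));
  }

  /** Clears the failure count once the key has been handled successfully. */
  synchronized void onSuccess(K key) {
    failures.remove(key);
  }

  /** Doubles the base delay for each failure after the first, capped at maxDelayMs. */
  private long delayAfter(int failedAttempts) {
    long delay = baseDelayMs;
    for (int i = 1; i < failedAttempts && delay < maxDelayMs; i++) {
      // Guard against overflow: jump straight to the cap when doubling would exceed it.
      delay = (delay > maxDelayMs / 2) ? maxDelayMs : delay * 2;
    }
    return Math.min(delay, maxDelayMs);
  }
}
```

With such a tracker, the catch block shown above could ask `onFailure` for each 
region before re-queueing it, re-queue only after the returned delay, and stop 
retrying when the result is empty. Whether a delay, a cap, or both is the right 
fix for this issue is a design decision for the HBase maintainers.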



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
