Ke Han created HBASE-28105:
------------------------------

             Summary: NPE is thrown in QuotaCache.java when running HBase-2.4.17
                 Key: HBASE-28105
                 URL: https://issues.apache.org/jira/browse/HBASE-28105
             Project: HBase
          Issue Type: Bug
          Components: Quotas
    Affects Versions: 2.5.5, 2.4.17
            Reporter: Ke Han
         Attachments: 0001-avoid-NPE.patch

When running HBase-2.4.17, I encountered an NPE in the RegionServer log.
h1. Reproduce

Configure an HBase cluster with 1 HMaster, 2 RegionServers (RS), and Hadoop 2.10.2.

Execute the following commands on the HMaster node using the hbase shell:

 
{code:java}
create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid46ddc3d3557e413e915e2393ae72c082', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid449de028da6b4d35be0f187ebec6c3be'
drop 'uuiddeb610fded9744889840ecd03dd18739'
put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
'uuid552e42ade4c14099a1d8643bea1616d4', 
'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
The following exception is then thrown on either RS1 or RS2:

 

 
{code:java}
2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
handler.AssignRegionHandler: Opened 
uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
hbase.ScheduledChore: Caught error
java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
        at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){code}
 
h1. Root Cause

The NPE is thrown in QuotaCache$QuotaRefresherChore.updateQuotaFactors():

 
{code:java}
private void updateQuotaFactors() {
  // Update machine quota factor
  ClusterMetrics clusterMetrics;
  try {
    clusterMetrics = rsServices.getConnection().getAdmin()
      .getClusterMetrics(EnumSet.of(Option.SERVERS_NAME, Option.TABLE_TO_REGIONS_COUNT));
  } catch (IOException e) {
    LOG.warn("Failed to get cluster metrics needed for updating quotas", e);
    return;
  }

  int rsSize = clusterMetrics.getServersName().size();
  if (rsSize != 0) {
    // TODO if use rs group, the cluster limit should be shared by the rs group
    machineQuotaFactor = 1.0 / rsSize;
  }

  Map<TableName, RegionStatesCount> tableRegionStatesCount =
    clusterMetrics.getTableRegionStatesCount();

  // Update table machine quota factors
  for (TableName tableName : tableQuotaCache.keySet()) {
    double factor = 1;
    try {
      long regionSize = tableRegionStatesCount.get(tableName).getOpenRegions();
      if (regionSize == 0) {
        factor = 0;
      } else {
        int localRegionSize = rsServices.getRegions(tableName).size();
        factor = 1.0 * localRegionSize / regionSize;
      }
    } catch (IOException e) {
      LOG.warn("Get table regions failed: {}", tableName, e);
    }
    tableMachineQuotaFactors.put(tableName, factor);
  }
} {code}
 

This function recomputes the machine quota factor for every table in 
tableQuotaCache. At line 378, tableRegionStatesCount.get(tableName) returns 
null, so the chained getOpenRegions() call throws the NPE.

The tableName that triggers the NPE is '{*}uuidd9efa97f93a442b686adae6d9f7bb2e9{*}', 
which the user had already disabled and dropped.
{code:java}
long regionSize = tableRegionStatesCount.get(tableName).getOpenRegions(); {code}
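The root cause is that the refresh iterates over tableQuotaCache, which may 
still contain a table that has been dropped, while the cluster metrics no 
longer return an entry for it. A possible fix is to check that the table still 
exists (that is, that the lookup is non-null) before dereferencing the result.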
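
To make the suggested check concrete, here is a minimal standalone sketch. It is not the attached patch and it does not use HBase classes: QuotaFactorSketch, computeFactors, and the String/Long maps are hypothetical stand-ins for tableQuotaCache.keySet(), the tableRegionStatesCount map, and rsServices.getRegions(tableName).size(). Skipping a table that is missing from the metrics is an assumption about the desired behavior, not a confirmed design.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the per-table loop in updateQuotaFactors().
class QuotaFactorSketch {

  // cachedTables       ~ tableQuotaCache.keySet()
  // openRegionsByTable ~ open-region counts taken from the cluster metrics
  // localRegionsByTable ~ number of regions of each table on this server
  static Map<String, Double> computeFactors(Set<String> cachedTables,
      Map<String, Long> openRegionsByTable, Map<String, Integer> localRegionsByTable) {
    Map<String, Double> factors = new HashMap<>();
    for (String table : cachedTables) {
      Long openRegions = openRegionsByTable.get(table);
      if (openRegions == null) {
        // The table is cached but absent from the metrics (for example, it was
        // dropped). Assumption: skip it instead of dereferencing null.
        continue;
      }
      double factor;
      if (openRegions == 0) {
        factor = 0;
      } else {
        int localRegions = localRegionsByTable.getOrDefault(table, 0);
        factor = 1.0 * localRegions / openRegions;
      }
      factors.put(table, factor);
    }
    return factors;
  }

  public static void main(String[] args) {
    Set<String> cached = new TreeSet<>();
    cached.add("live_table");
    cached.add("dropped_table"); // still cached, but no longer in the metrics

    Map<String, Long> openRegions = new HashMap<>();
    openRegions.put("live_table", 4L);

    Map<String, Integer> localRegions = new HashMap<>();
    localRegions.put("live_table", 2);

    // No NullPointerException is thrown for dropped_table.
    System.out.println(computeFactors(cached, openRegions, localRegions));
  }
}
```

Whether a missing table should be skipped, given a factor of 0, or also evicted from the cache is a separate decision that this sketch does not settle.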

This bug should also affect *2.5.5*, since the related code in QuotaCache 
is unchanged (though I only tested 2.4.17).

I attached a simple fix, but I am not sure whether it also works for other cases.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
