Ke Han created HBASE-28105:
------------------------------

Summary: NPE is thrown in QuotaCache.java when running HBase-2.4.17
Key: HBASE-28105
URL: https://issues.apache.org/jira/browse/HBASE-28105
Project: HBase
Issue Type: Bug
Components: Quotas
Affects Versions: 2.5.5, 2.4.17
Reporter: Ke Han
Attachments: 0001-avoid-NPE.patch

When running HBase-2.4.17, I hit an NPE in the RegionServer log.

h1. Reproduce

Cluster configuration: 1 HMaster, 2 RegionServers, Hadoop 2.10.2.

Execute the following commands on the HMaster node using the hbase shell:

{code:java}
create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 'uuid46ddc3d3557e413e915e2393ae72c082', 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 'uuid449de028da6b4d35be0f187ebec6c3be'
drop 'uuiddeb610fded9744889840ecd03dd18739'
put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 'uuidf4704cae4d1e4661bd7664d26eb6b31b', 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 'uuid552e42ade4c14099a1d8643bea1616d4', 'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'
{code}

The exception is then thrown on either RS1 or RS2:

{code:java}
2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] handler.AssignRegionHandler: Opened uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] hbase.ScheduledChore: Caught error
java.lang.NullPointerException
        at org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
        at org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
        at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
{code}

h1. Root Cause

The NPE is thrown in the following method:

{code:java}
private void updateQuotaFactors() {
  // Update machine quota factor
  ClusterMetrics clusterMetrics;
  try {
    clusterMetrics = rsServices.getConnection().getAdmin()
      .getClusterMetrics(EnumSet.of(Option.SERVERS_NAME, Option.TABLE_TO_REGIONS_COUNT));
  } catch (IOException e) {
    LOG.warn("Failed to get cluster metrics needed for updating quotas", e);
    return;
  }

  int rsSize = clusterMetrics.getServersName().size();
  if (rsSize != 0) {
    // TODO if use rs group, the cluster limit should be shared by the rs group
    machineQuotaFactor = 1.0 / rsSize;
  }

  Map<TableName, RegionStatesCount> tableRegionStatesCount =
    clusterMetrics.getTableRegionStatesCount();

  // Update table machine quota factors
  for (TableName tableName : tableQuotaCache.keySet()) {
    double factor = 1;
    try {
      long regionSize = tableRegionStatesCount.get(tableName).getOpenRegions();
      if (regionSize == 0) {
        factor = 0;
      } else {
        int localRegionSize = rsServices.getRegions(tableName).size();
        factor = 1.0 * localRegionSize / regionSize;
      }
    } catch (IOException e) {
      LOG.warn("Get table regions failed: {}", tableName, e);
    }
    tableMachineQuotaFactors.put(tableName, factor);
  }
}
{code}

This function updates the per-table machine quota factors by iterating over the keys of tableQuotaCache. At line 378, tableRegionStatesCount.get(tableName) returns null, so the chained getOpenRegions() call throws the NPE:

{code:java}
long regionSize = tableRegionStatesCount.get(tableName).getOpenRegions();
{code}

The tableName that triggers the NPE is '*uuidd9efa97f93a442b686adae6d9f7bb2e9*', which the user had already disabled and dropped.

The root cause is that tableQuotaCache can still hold an entry for a table that has been dropped, while the cluster metrics no longer report any region state for it. Maybe we can add a check to make sure the table still exists in the system before dereferencing the result.

This bug should also affect *2.5.5*, since the related code in QuotaCache is unchanged (though I only tested 2.4.17).

I attached a simple fix, but I am not sure whether it also works for other cases.
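
For illustration only, the existence check suggested above could look roughly like the sketch below. This is not the attached patch and not HBase code: the class QuotaFactorSketch, the method computeFactors, and the plain String/Integer maps are hypothetical stand-ins for TableName, RegionStatesCount, and rsServices.getRegions(...), chosen so the example compiles without HBase on the classpath. It assumes that skipping a table with no reported region state is acceptable; whether a stale factor should instead be removed or reset is a design decision left open here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the loop in updateQuotaFactors().
class QuotaFactorSketch {

  /**
   * cachedTables       stands in for tableQuotaCache.keySet()
   * openRegionsByTable stands in for clusterMetrics.getTableRegionStatesCount()
   * localRegionCounts  stands in for rsServices.getRegions(table).size()
   */
  static Map<String, Double> computeFactors(Set<String> cachedTables,
      Map<String, Integer> openRegionsByTable, Map<String, Integer> localRegionCounts) {
    Map<String, Double> factors = new HashMap<>();
    for (String table : cachedTables) {
      Integer openRegions = openRegionsByTable.get(table);
      if (openRegions == null) {
        // No region state reported: the table was probably dropped while its
        // quota cache entry was still present. Skip it instead of dereferencing null.
        continue;
      }
      double factor;
      if (openRegions == 0) {
        factor = 0;
      } else {
        int localRegions = localRegionCounts.getOrDefault(table, 0);
        factor = 1.0 * localRegions / openRegions;
      }
      factors.put(table, factor);
    }
    return factors;
  }

  public static void main(String[] args) {
    Set<String> cached = new HashSet<>();
    cached.add("liveTable");
    cached.add("droppedTable"); // still cached, but absent from the metrics

    Map<String, Integer> openRegions = new HashMap<>();
    openRegions.put("liveTable", 4);

    Map<String, Integer> localRegions = new HashMap<>();
    localRegions.put("liveTable", 1);

    // droppedTable is skipped rather than causing a NullPointerException.
    System.out.println(computeFactors(cached, openRegions, localRegions));
  }
}
```

In the real method, the same guard would sit directly before the getOpenRegions() call, with the null result of tableRegionStatesCount.get(tableName) held in a local variable first.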