[ https://issues.apache.org/jira/browse/HBASE-28963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ray Mattingly updated HBASE-28963:
----------------------------------
    Summary: Updating Quota Factors is too expensive  (was: Updating Table Machine Quota Factors is too expensive)

> Updating Quota Factors is too expensive
> ---------------------------------------
>
>                 Key: HBASE-28963
>                 URL: https://issues.apache.org/jira/browse/HBASE-28963
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.6.1
>            Reporter: Ray Mattingly
>            Assignee: Ray Mattingly
>            Priority: Major
>         Attachments: image-2024-11-06-12-06-44-317.png,
> quota-refresh-hmaster.png
>
>
> My company is running Quotas across a few hundred clusters of varied size.
> One cluster has hundreds of servers and tens of thousands of regions. We
> noticed that the HMaster was quite busy for this cluster, and after some
> investigation we realized that RegionServers were hammering the HMaster's
> ClusterMetrics endpoint to refresh their table machine quota factors.
>
> There are a few things that we could do here. In a perfect world, I think
> the RegionServers would have better P2P communication of region states and
> of whatever else is necessary to derive new quota factors. Relying solely on
> the HMaster for this coordination creates a tricky bottleneck for the
> horizontal scalability of clusters.
>
> That said, I think that a simpler and preferable initial step would be to
> make our code a bit more cost-conscious. At my company, for example, we don't
> even define any table-scoped quotas. Without any table-scoped quotas in the
> cache, our cache could be much more thoughtful about the work that it chooses
> to do on each refresh. So I'm proposing that we check [the size of the
> tableQuotaCache
> keyset|https://github.com/apache/hbase/blob/db3ba44a4c692d26e70b6030fc519e92fd79f638/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/QuotaCache.java#L418]
> earlier, and use that check to determine which ClusterMetrics we bother
> to fetch.
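
As a rough illustration of the proposal above, the sketch below chooses which metrics to request based on whether any table-scoped quotas are cached. It is not the actual QuotaCache code: the class, the method, and the MetricOption enum are hypothetical stand-ins, and the split between "always needed" and "only needed for table quotas" is an assumption made for the example.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; none of these names are HBase's real classes.
class QuotaRefreshSketch {

  // Stand-ins for the pieces of cluster state a refresh might request.
  enum MetricOption { SERVERS_NAME, TABLE_TO_REGIONS_COUNT }

  // Decide which metrics to request from the HMaster, given how many
  // table-scoped quotas are currently in the cache.
  static Set<MetricOption> optionsToFetch(int tableQuotaCount) {
    // Assumption: the server list is always needed for per-machine factors.
    EnumSet<MetricOption> options = EnumSet.of(MetricOption.SERVERS_NAME);
    if (tableQuotaCount > 0) {
      // Assumption: per-table region counts only matter when at least one
      // table-scoped quota is defined.
      options.add(MetricOption.TABLE_TO_REGIONS_COUNT);
    }
    return options;
  }

  public static void main(String[] args) {
    System.out.println(optionsToFetch(0));   // no table-scoped quotas cached
    System.out.println(optionsToFetch(12));  // some table-scoped quotas cached
  }
}
```

With a check like this made before the fetch, a cluster that defines no table-scoped quotas would request the smaller option set on every refresh, which is the cost saving the description is after.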