[ 
https://issues.apache.org/jira/browse/HBASE-28963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Mattingly updated HBASE-28963:
----------------------------------
    Summary: Updating Quota Factors is too expensive  (was: Updating Table 
Machine Quota Factors is too expensive)

> Updating Quota Factors is too expensive
> ---------------------------------------
>
>                 Key: HBASE-28963
>                 URL: https://issues.apache.org/jira/browse/HBASE-28963
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.6.1
>            Reporter: Ray Mattingly
>            Assignee: Ray Mattingly
>            Priority: Major
>         Attachments: image-2024-11-06-12-06-44-317.png, 
> quota-refresh-hmaster.png
>
>
> My company is running Quotas across a few hundred clusters of varied size. 
> One cluster has hundreds of servers and tens of thousands of regions. We 
> noticed that the HMaster was quite busy for this cluster, and after some 
> investigation we realized that RegionServers were hammering the HMaster's 
> ClusterMetrics endpoint to facilitate the refreshing of table machine quota 
> factors.
> There are a few things that we could do here. In a perfect world, I think 
> the RegionServers would have better P2P communication of region states and 
> whatever else is necessary to derive new quota factors. Relying solely on 
> the HMaster for this coordination creates a tricky bottleneck for the 
> horizontal scalability of clusters.
> That said, I think that a simpler and preferable initial step would be to 
> make our code a bit more cost-conscious. At my company, for example, we don't 
> even define any table-scoped quotas. Without any table-scoped quotas in the 
> cache, our cache could be much more thoughtful about the work that it chooses 
> to do on each refresh. So I'm proposing that we check [the size of the 
> tableQuotaCache 
> keyset|https://github.com/apache/hbase/blob/db3ba44a4c692d26e70b6030fc519e92fd79f638/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/QuotaCache.java#L418]
>  earlier, and use this inference to determine what ClusterMetrics we bother 
> to fetch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
