Jaehui-Lee commented on PR #7482:
URL: https://github.com/apache/hbase/pull/7482#issuecomment-3581654722

   @wchevreuil 
   
   Thank you for the review! I apologize for the insufficient explanation in my 
initial description. Let me provide additional clarification.
   
   > And can you explain further what you mean by "non-cached reads (such as 
those from compactions)" ? I don't think compactions bypass the cache, AFAIK, 
it use same HFileReaderImpl in it's scanner, which always looks the cache first.
   
   You're absolutely right: compactions do check the cache first. What I 
meant is that compaction reads generate many cache misses, and those misses 
are counted in the `...(Hit|Miss)Count` metrics. However, because scanners 
created by compactions have `cacheBlocks` set to false, they do not affect the 
`...(Hit|Miss)CachingCount` metrics. So to monitor the cache hit ratio for 
client read requests only (excluding internal operations such as 
compactions), we need the `...HitCachingRatio` metric, which is what we 
currently monitor.
   
   This can be easily verified with a simple test. With BucketCache enabled and 
an empty region server, follow these steps:
   
   ```
   create 'test', 'cf'
   
   put 'test', 'r1', 'cf:q', 'v1'
   flush 'test'
   put 'test', 'r2', 'cf:q', 'v2'
   flush 'test'
   
   # 1) Load one of the two blocks into the block cache
   get 'test', 'r1', 'cf:q'
   # 2) Observe metric changes after major compaction
   major_compact 'test'
   ```
   
   After step 1, all hit metrics remain at 0 (no change), while both 
`missCount` and `missCachingCount` increase to 2 (not entirely sure why it's 2, 
but that's what we observe).
   After step 2, `hitCount` becomes 1, `hitCachingCount` remains 0, `missCount` 
becomes 3, and `missCachingCount` stays at 2.
   (These metrics were observed from the L2 cache section in the region server 
web UI.)
   This demonstrates that reads from compactions are reflected in the general 
hit/miss metrics but not in the Caching metrics.
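
The counter behavior observed above can be reproduced with a small model. This is only a sketch, not HBase code: the class `CacheCounters` and its method names are hypothetical, and it assumes that every lookup updates the general counter while only lookups with `cacheBlocks=true` also update the `Caching` counter. Attributing the compaction's hit to the block loaded by the `get` and its miss to the other file's block is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class CacheCounters:
    """Hypothetical, simplified model of per-cache hit/miss counters."""
    hit_count: int = 0
    hit_caching_count: int = 0
    miss_count: int = 0
    miss_caching_count: int = 0

    def hit(self, caching: bool) -> None:
        # Every hit is counted; only caching reads touch the Caching counter.
        self.hit_count += 1
        if caching:
            self.hit_caching_count += 1

    def miss(self, caching: bool) -> None:
        # Every miss is counted; only caching reads touch the Caching counter.
        self.miss_count += 1
        if caching:
            self.miss_caching_count += 1

    def snapshot(self) -> tuple:
        return (self.hit_count, self.hit_caching_count,
                self.miss_count, self.miss_caching_count)


l2 = CacheCounters()

# Step 1: client get (cacheBlocks=true); two misses were observed.
l2.miss(caching=True)
l2.miss(caching=True)
after_get = l2.snapshot()

# Step 2: major compaction (cacheBlocks=false); assumed one hit, one miss.
l2.hit(caching=False)
l2.miss(caching=False)
after_compaction = l2.snapshot()

print(after_get)         # (0, 0, 2, 2)
print(after_compaction)  # (1, 0, 3, 2)
```

Under these assumptions the model ends with the same four values as the web UI after step 2.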
   
   > I'm not sure this is true. We do account and expose separate hit rations 
for L1 and L2.
   
   For the reasons explained above, we specifically want to monitor the 
`(Hit|Miss)Caching` metrics rather than the general `Hit|Miss` metrics.
   When `BucketCache` is enabled, `BlockCache` is used as L1 and `BucketCache` 
as L2, which is represented as `CombinedBlockCache` in the HBase 
implementation. The currently exposed `BlockCache(...Count|...Ratio)` metrics 
are aggregated across L1 and L2. We need to monitor each cache tier 
separately, but there are currently no `Caching` metrics for individual cache 
layers. The per-tier metrics you mentioned are only of the `(Hit|Miss)Count` 
form, which, as explained above, is heavily influenced by operations like 
compactions and does not give us the values we are looking for.
   The `Caching` metrics for each cache tier are not currently exposed via JMX 
(which is what this PR aims to enable), but they can be viewed in the 
BlockCache section of the region server web UI. As an example from our 
production cluster, here are the L2 metrics:
   
   - Hits: 14,709,630
   - Hits Caching: 12,340,830
   - Misses: 59,908,348
   - Misses Caching: 17,610,008
   - Hit Ratio: 19.71%
   
   When we calculate the Hit Caching Ratio from these values, we get 41.2%, 
which differs significantly from the 19.7% general hit ratio. That gap is why 
separate caching metrics are important for accurately monitoring 
client-driven cache performance.
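
For reference, the arithmetic behind both percentages is sketched below, using the L2 values listed above. The helper name `ratio` is only illustrative; it computes hits / (hits + misses).

```python
def ratio(hits: int, misses: int) -> float:
    """Return hits / (hits + misses), or 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


# L2 values taken from the region server web UI listing above.
hit_ratio = ratio(14_709_630, 59_908_348)
hit_caching_ratio = ratio(12_340_830, 17_610_008)

print(f"Hit Ratio:         {hit_ratio:.2%}")          # 19.71%
print(f"Hit Caching Ratio: {hit_caching_ratio:.2%}")  # 41.20%
```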
   
   Please let me know if anything is still unclear or if I've misunderstood 
something. Thanks.

