Re: [PR] HBASE-29585 Add row-level cache for the get operation [hbase]

via GitHub Wed, 24 Sep 2025 21:32:28 -0700


EungsopYoo commented on PR #7291:
URL: https://github.com/apache/hbase/pull/7291#issuecomment-3332099095


   > > > > 5. One problem of adding such small units (a single row) in the 
cache is that we need to keep a map index for each entry. So, the smaller the 
row in size, more rows would fit in the cache, but more key objects would be 
retained in the map. In your tests, assuming the default block cache size of 
40% of the heap, it would give a 12.8GB of block cache. Have you managed to 
measure the block cache usage by the row cache, in terms of number of rows in 
the cache, byte size of the L1 cache and the total heap usage? Maybe worth 
collecting a heap dump to analyse the map index size in the heap.
   > > > 
   > > > 
   > > > I slightly modified the LruBlockCache code to record the row cache 
size and entry count. The row cache occupies 268.67MB with 338,602 entries. The 
average size of a single row cache entry is 830 bytes. Within the overall 
BlockCache, the row cache accounts for 45% by entry count and 2% by size.
   > > > ```
   > > > 2025-09-12T09:08:44,112 INFO  [LruBlockCacheStatsExecutor {}] 
hfile.LruBlockCache: totalSize=12.80 GB, usedSize=12.48 GB, freeSize=329.41 MB, 
max=12.80 GB, blockCount=752084, accesses=35942999, hits=27403857, 
hitRatio=76.24%, , cachingAccesses=35942954, cachingHits=27403860, 
cachingHitsRatio=76.24%, evictions=170, evicted=5806436, 
evictedPerRun=34155.50588235294, rowBlockCount=338602, rowBlockSize=268.67 MB
   > > > ```
   > > 
   > > 
   > > What if more rows get cached, over time, as more gets for different rows 
are executed? It could lead to many rows in the cache, and many more objects in 
the map to index it. In the recent past, we've seen some heap issues when 
having very large file based bucket cache and small compressed blocks. I guess 
we could face similar problems here too.
   > 
   > Okay. Then I’ll take a heap dump and check the size of the map’s index.
   
   I configured the RegionServer with a 4 GB heap, setting 
hfile.block.cache.size to 0.3 and row.cache.size to 0.1, then reran the same 
workload as before. Under these settings, the maximum RowCache capacity is 
approximately 400 MB (0.1 × 4 GB). Once the RowCache was fully populated, I 
took a heap dump and analyzed it.
    - RowCache Size: 409 MB
    - RowCache Count: 697,234 entries
    - Average RowCache Entry Size: 615 B
      - This is reduced from 830 B previously, mainly due to a simplified 
RowCacheKey.
    - Retained Heap Size: 622 MB
      - Because of the overhead of Caffeine’s key/value structures, the 
retained size on heap (622 MB) is 52% larger than the actual data size 
(409 MB) for this workload.
      - I believe this is acceptable if the RowCache is configured to be small 
relative to the BlockCache, for example around 5% of the BlockCache size. The 
positive impact of the RowCache is already noticeable even at this smaller 
capacity.
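
The figures above can be re-derived with a small calculation. The following is a minimal sketch, not code from the PR: the class and method names are hypothetical, and it assumes that MB means 2^20 bytes and that the 4 GB heap is 4 × 2^30 bytes. It only reproduces the arithmetic from the reported numbers (409 MB, 697,234 entries, 622 MB retained).

```java
public class RowCacheOverhead {
    // Assumption: the reported MB values are binary megabytes (2^20 bytes).
    static final long MB = 1024L * 1024L;

    // Configured capacity: heap size multiplied by the cache fraction.
    static long capacityBytes(long heapBytes, double fraction) {
        return (long) (heapBytes * fraction);
    }

    // Average data bytes per cached row, rounded to the nearest byte.
    static long averageEntryBytes(long cacheBytes, long entryCount) {
        return Math.round((double) cacheBytes / entryCount);
    }

    // How much larger the retained heap is than the cached data, in percent.
    static long overheadPercent(long retainedBytes, long dataBytes) {
        return Math.round(100.0 * (retainedBytes - dataBytes) / dataBytes);
    }

    public static void main(String[] args) {
        long heapBytes = 4L * 1024L * MB;   // 4 GB RegionServer heap
        long dataBytes = 409L * MB;         // reported RowCache size
        long retainedBytes = 622L * MB;     // reported retained heap size
        long entries = 697_234L;            // reported RowCache entry count

        System.out.println("capacity MB:    " + capacityBytes(heapBytes, 0.1) / MB);
        System.out.println("avg entry B:    " + averageEntryBytes(dataBytes, entries));
        System.out.println("overhead %:     " + overheadPercent(retainedBytes, dataBytes));
        // Extra retained bytes per entry attributable to the cache structures.
        System.out.println("overhead/entry: " + (retainedBytes - dataBytes) / entries);
    }
}
```

With the numbers from this run, the average entry size comes out at 615 B and the overhead at 52%, matching the values reported above.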


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
