EungsopYoo commented on PR #7291:
URL: https://github.com/apache/hbase/pull/7291#issuecomment-3332099095
> > > > 5. One problem of adding such small units (a single row) to the cache is that we need to keep a map index for each entry. So, the smaller the rows, the more rows fit in the cache, but the more key objects are retained in the map. In your tests, assuming the default block cache size of 40% of the heap, that gives a 12.8GB block cache. Have you managed to measure the block cache usage by the row cache, in terms of the number of rows in the cache, the byte size of the L1 cache and the total heap usage? It may be worth collecting a heap dump to analyse the map index size in the heap.
> > >
> > >
> > > I slightly modified the LruBlockCache code to record the row cache size and entry count. The row cache occupies 268.67MB with 338,602 entries. The average size of a single row cache entry is 830 bytes. Within the overall BlockCache, the row cache accounts for 45% by entry count and 2% by size.
> > > ```
> > > 2025-09-12T09:08:44,112 INFO [LruBlockCacheStatsExecutor {}] hfile.LruBlockCache: totalSize=12.80 GB, usedSize=12.48 GB, freeSize=329.41 MB, max=12.80 GB, blockCount=752084, accesses=35942999, hits=27403857, hitRatio=76.24%, , cachingAccesses=35942954, cachingHits=27403860, cachingHitsRatio=76.24%, evictions=170, evicted=5806436, evictedPerRun=34155.50588235294, rowBlockCount=338602, rowBlockSize=268.67 MB
> > > ```
> >
> >
> > What if more rows get cached over time, as gets for more distinct rows are executed? That could lead to many rows in the cache, and many more objects in the map that indexes them. In the recent past, we've seen some heap issues when running a very large file-based bucket cache with small compressed blocks. I guess we could face similar problems here too.
>
> Okay. Then I’ll take a heap dump and check the size of the map’s index.

I configured the RegionServer with a 4 GB heap, setting `hfile.block.cache.size` to 0.3 and `row.cache.size` to 0.1, then reran the same workload as before. Under these settings, the maximum RowCache capacity is approximately 400 MB. After the RowCache was fully populated, I generated and analyzed a heap dump.
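As a quick sanity check on the ~400 MB ceiling, the sketch below derives both cache capacities from the heap size. It assumes that `row.cache.size`, like `hfile.block.cache.size`, is a fraction of the maximum heap; that is an assumption about the PR, and the class and method names are hypothetical.

```java
// Hypothetical helper: derives cache capacities from the RegionServer heap size.
// Assumption: row.cache.size, like hfile.block.cache.size, is a fraction of the max heap.
public final class RowCacheCapacity {
    private RowCacheCapacity() {}

    /** Capacity in bytes for a cache configured as a fraction of the heap. */
    public static long capacityBytes(long heapBytes, double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
        }
        return (long) (heapBytes * fraction);
    }

    public static void main(String[] args) {
        long heapBytes = 4L * 1024 * 1024 * 1024;          // 4 GB RegionServer heap
        long blockCache = capacityBytes(heapBytes, 0.3);   // hfile.block.cache.size = 0.3
        long rowCache = capacityBytes(heapBytes, 0.1);     // row.cache.size = 0.1
        double mib = 1024.0 * 1024.0;
        System.out.printf("BlockCache: %.1f MB, RowCache: %.1f MB%n",
                blockCache / mib, rowCache / mib);
    }
}
```

With a 4 GB heap this gives about 409.6 MB for the RowCache, consistent with the ~400 MB ceiling mentioned above.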
- RowCache Size: 409 MB
- RowCache Count: 697,234 entries
- Average RowCache Entry Size: 615 B
  - This is down from 830 B previously, mainly due to a simplified `RowCacheKey`.
- Retained Heap Size: 622 MB
  - Because of the overhead of Caffeine’s key/value structures, the retained heap is 52% larger than the actual data size for this workload (622 MB vs. 409 MB).
  - I believe this is acceptable if the RowCache is configured considerably smaller than the BlockCache, for example around 5% of the BlockCache size. The positive impact of the RowCache is already noticeable even at this smaller capacity.
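The heap-dump figures can be cross-checked with a little arithmetic. The sketch below is a minimal, hypothetical helper (names are illustrative, not from the PR) that treats MB as MiB, derives the average entry size and the retained-heap overhead, and then applies that overhead to a RowCache sized at 5% of a 0.3 × 4 GB BlockCache. That last step assumes entry sizes and Caffeine’s per-entry overhead stay similar at a smaller capacity, which the heap dump does not show directly.

```java
// Hypothetical helper that re-derives the heap-dump figures for the RowCache.
// Sizes are treated as MiB (1 MB = 1024 * 1024 bytes).
public final class RowCacheOverhead {
    private RowCacheOverhead() {}

    /** Average cached payload per entry, in bytes. */
    public static double averageEntryBytes(long dataBytes, long entries) {
        return (double) dataBytes / entries;
    }

    /** Retained heap relative to payload, minus one; 0.52 means "52% more than the data". */
    public static double overheadRatio(double retainedMiB, double dataMiB) {
        return retainedMiB / dataMiB - 1.0;
    }

    /** Retained heap estimate, assuming the same relative overhead at another capacity. */
    public static double retainedEstimateMiB(double dataMiB, double overheadRatio) {
        return dataMiB * (1.0 + overheadRatio);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        double avg = averageEntryBytes(409L * mib, 697_234L);
        double overhead = overheadRatio(622.0, 409.0);
        System.out.printf("avg entry: %.0f B, overhead: %.0f%%%n", avg, overhead * 100);

        // Hypothetical sizing: RowCache at 5% of a BlockCache of 0.3 x 4 GiB = 1228.8 MiB.
        double rowCacheMiB = 0.05 * 1228.8;
        System.out.printf("RowCache %.1f MiB -> ~%.1f MiB retained%n",
                rowCacheMiB, retainedEstimateMiB(rowCacheMiB, overhead));
    }
}
```

With these inputs the helper reports about 615 B per entry and 52% overhead, matching the numbers in the list above.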