Hey all, We have a use-case for HFile.Reader at my company. The use-case in question is scanning many hfiles for a small subset of cells, so it mostly is just iterating a large number of HFiles and discarding most of the cells. We recently upgraded that deployable from super old cdh 5.16 (hbase 1.2-ish) to hbase 2.5.3. In doing so, we noticed a performance regression of around 4x. I imagine this regression would also affect ClientSideRegionScanner.
We did some profiling and noticed that a large amount of time is spent in SingleByteBuff.checkRefCnt. It seems like every SingleByteBuff method calls checkRefCnt and this checks compares a volatile in AtomicIntegerFieldUpdater in the netty code. I believe ref counting is mostly necessary for off-heap buffers, but on-heap buffers are also wrapped in SingleByteBuff and so also go through checkRefCnt. We removed the checkRefCnt call, and the regression disappeared. We created a simple test method which just does HFile.createReader, reader.getScanner(), and then iterates the scanner counting the total cells. The test reads an hfile with 100M cells and takes over 1 minute with checkRefCnt. Removing checkRefCnt brings the runtime down to 20s. It's worth noting that the regression is most prominent on java17. It's slightly less obvious on java11, with runtime being 40s vs 28s. Thoughts on updating SingleByteBuff to skip the checkRefCnt call for on-heap buffers? We can handle this in the constructor, when wrapping an on-heap buffer here [1]. [1] https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBuffAllocator.java#L300
