For those following along, I’ve submitted an initial PR for this issue. https://github.com/apache/hbase/pull/5104
On Mon, Mar 13, 2023 at 11:46 AM Bryan Beaudreault <[email protected]> wrote: > Thanks everyone for chiming in. I submitted > https://issues.apache.org/jira/browse/HBASE-27710 and will work on it in > the near future. > > This does make me wonder what overhead checkRefCount causes in off-heap > cases too, but we just dont see it in profiling because the server does so > much other work. It has me thinking if there are other optimizations we > could do there. For example, currently we do checkRefCount for every single > ByteBuff operation (slice/duplicate/get/etc/etc). A lot of the buffer > operations happen before creating a Cell for return to user in > HFileBlock/DBE. I wonder if we could do all the parsing we need in > HFileBlock/DBE without checkRefCount, do a single checkRefCount call when > we create the Cell, then enable checkRefCount for all future ByteBuff calls > before returning a cell to upper levels. This would probably be a larger > project, so not investigating now. Just an idea i had. > > On Mon, Mar 13, 2023 at 2:06 AM Viraj Jasani <[email protected]> wrote: > >> +1, as the numbers suggest significant perf improvement. >> >> >> On Thu, Mar 9, 2023 at 9:36 AM Bryan Beaudreault <[email protected] >> > >> wrote: >> >> > Hey all, >> > >> > We have a use-case for HFile.Reader at my company. The use-case in >> question >> > is scanning many hfiles for a small subset of cells, so it mostly is >> > just iterating a large number of HFiles and discarding most of the >> cells. >> > We recently upgraded that deployable from super old cdh 5.16 (hbase >> > 1.2-ish) to hbase 2.5.3. In doing so, we noticed a performance >> regression >> > of around 4x. I imagine this regression would also affect >> > ClientSideRegionScanner. >> > >> > We did some profiling and noticed that a large amount of time is spent >> in >> > SingleByteBuff.checkRefCnt. It seems like every SingleByteBuff method >> calls >> > checkRefCnt and this checks compares a volatile >> > in AtomicIntegerFieldUpdater in the netty code. >> > >> > I believe ref counting is mostly necessary for off-heap buffers, but >> > on-heap buffers are also wrapped in SingleByteBuff and so also go >> through >> > checkRefCnt. We removed the checkRefCnt call, and the regression >> > disappeared. >> > >> > We created a simple test method which just does HFile.createReader, >> > reader.getScanner(), and then iterates the scanner counting the total >> > cells. The test reads an hfile with 100M cells and takes over 1 minute >> > with checkRefCnt. Removing checkRefCnt brings the runtime down to 20s. >> > >> > It's worth noting that the regression is most prominent on java17. It's >> > slightly less obvious on java11, with runtime being 40s vs 28s. >> > >> > Thoughts on updating SingleByteBuff to skip the checkRefCnt call for >> > on-heap buffers? We can handle this in the constructor, when wrapping an >> > on-heap buffer here [1]. >> > >> > [1] >> > >> > >> https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBuffAllocator.java#L300 >> > >> >
