[https://issues.apache.org/jira/browse/HBASE-29103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923912#comment-17923912]

Becker Ewing commented on HBASE-29103:
--------------------------------------

I've gotten around to benchmarking master vs. this patch and I wanted to post 
the results here.

 

I was primarily interested in how two reverse-scan-heavy paths would behave:
 # A long reverse scan on rows with moderately sized values (90 bytes)
 # Random meta scans

 

_Note: these are the same paths that were heavily tested in HBASE-28043, which 
overhauled the reverse scan path to be faster over storefiles written with a 
non-RIV1 data block encoding (DBE)._

 

*1. Reverse Scans Over Moderately Sized Values*

I hypothesized that this patch would help the most for reverse scans over rows 
with moderately sized values. To set up this test environment, I did the 
following:

1. Start a local HBase cluster: 
{code:java}
hbase master start --localRegionServers=1 {code}
I used the default settings/properties as configured on the master branch.

2. Prepare the table state for benchmarking:
{code:java}
hbase pe --nomapred=true --valueSize=90 --blockEncoding=PREFIX --compress=GZ 
randomWrite 1 {code}
3. Flush and compact the table under test to ensure that subsequent tests are 
operating over an equivalent single storefile (& the memstore doesn't influence 
results):
{code:java}
$ hbase shell
> flush 'TestTable'
> major_compact 'TestTable' {code}
4. For actual benchmarking, we'll be running the following command and 
recording the results:
{code:java}
hbase pe --nomapred=true reverseScan 1 {code}
Typically the first run is a bit slower, since blocks are decompressed in 
addition to the actual scanning logic. I ran 3 tests: 1 cold and 2 directly 
afterwards on the hot cache.

 

I got the following results:
||Benchmark||Revision||Time (s)||Throughput (rows/sec)||Throughput (MB/s)||
|Reverse Scan Run #1|master|15.435|67935|66.86|
|Reverse Scan Run #2|master|14.213|73776|72.61|
|Reverse Scan Run #3|master|13.650|76819|75.60|
|Reverse Scan Run #1|patch|14.536|72136|71.00|
|Reverse Scan Run #2|patch|13.395|78281|77.04|
|Reverse Scan Run #3|patch|13.586|77181|75.96|

I think it's clear that this patch gives a small but real improvement in this 
case, which is nice.
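
To put a rough number on it, averaging the two hot-cache runs (#2 and #3) from the table above works out to about a 3% throughput improvement. A trivial check of that arithmetic, using only the numbers from the table:
{code:java}
// Quick arithmetic on the hot-cache runs (#2 and #3) from the table above.
public class ReverseScanDelta {
  public static void main(String[] args) {
    double masterHotAvg = (73776 + 76819) / 2.0; // master rows/sec, runs #2 and #3
    double patchHotAvg = (78281 + 77181) / 2.0;  // patch rows/sec, runs #2 and #3
    double improvement = 100.0 * (patchHotAvg - masterHotAvg) / masterHotAvg;
    // Prints roughly: master=75298 patch=77731 improvement=3.2%
    System.out.printf("master=%.0f patch=%.0f improvement=%.1f%%%n",
        masterHotAvg, patchHotAvg, improvement);
  }
}
{code}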

 

*2. Random Meta Scans*

Since this patch touches the reverse scan path, I think it's only appropriate 
to benchmark the system-critical hbase:meta read path. To set up this test 
environment, I did the following:

1. Start a local HBase cluster: 
{code:java}
hbase master start --localRegionServers=1 {code}
I used the default settings/properties as configured on the master branch.

2. Prepare the meta table for benchmarking:
{code:java}
hbase pe --nomapred=true metaWrite 1 {code}
3. Flush and compact the hbase:meta table to ensure that subsequent tests are 
operating over an equivalent single storefile (& the memstore doesn't influence 
results):
{noformat}
$ hbase shell
> flush 'hbase:meta'
> major_compact 'hbase:meta' {noformat}
4. For actual benchmarking, we'll be running the following command and 
recording the results:
{code:java}
hbase pe --nomapred=true metaRandomRead 10 {code}
This test takes quite a long time, and I don't have quite as much patience for 
it. Given its long runtime, I only ran 1 test, assuming that the longer run 
would mean less variability in the results. This assumption could be wrong, but 
I encourage follow-up verification.

 

I got the following results:
||Benchmark||Revision||Time (s)||Throughput (rows/sec)||Avg Latency (us)||
|Meta Random Read #1|master|643.407|16311|613|
|Meta Random Read #1|patch|633.051|16592|601|

This does show a bit of improvement between master and the patch (about the 
projected 2-3%).

 

The above test setup writes a ton of junk to the hbase:meta table. To clean up 
the meta table afterwards, you can run:
{code:java}
hbase pe --nomapred=true cleanMeta 1 {code}
 

*Conclusion*

This isn't a groundbreaking performance improvement, but the patch does show 
the incremental 2-3% improvement over master that memory profiling suggested 
was possible. Region servers on this patch will allocate less on the heap, 
which will _almost always_ make a Java application run faster. Mileage will 
vary, and the improvement will be less noticeable in a region server whose 
memory, heap, and garbage collector settings are tuned correctly. However, in a 
region server running closer to the edge, this will give a small improvement. 
Additionally, this will help clusters running regular workloads alongside 
reverse scan workloads, since it also impacts the meta read path.
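
For anyone skimming without reading the patch, here's a rough sketch of where the allocation savings come from, per the issue description quoted below. This is illustrative only and not the actual patch code: the class and method names are invented, and the real logic lives in StoreFileScanner's seekBeforeAndSaveKeyToPreviousRow.
{code:java}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.PrivateCellUtil;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;

// Illustrative sketch only -- NOT the actual HBASE-29103 patch. It just
// contrasts the two shapes of the "remember first-on-previous-row" step
// described in the issue below.
final class SeekBeforeSketch {

  // Old shape: materialize the full Cell out of the block (on the
  // encoded-seeker path this copies the value, tags, etc.), only to copy its
  // row portion again right away.
  static Cell firstOnPreviousRowViaGetCell(HFileScanner hfs) {
    var cell = hfs.getCell();
    return PrivateCellUtil.createFirstOnRow(cell);
  }

  // New shape: only the key is needed here, so ask the scanner for just the
  // key and never copy the value or tags.
  static Cell firstOnPreviousRowViaGetKey(HFileScanner hfs) {
    var key = hfs.getKey();
    return PrivateCellUtil.createFirstOnRow(key);
  }
}
{code}
The only difference is which HFileScanner accessor feeds the row copy: getCell() materializes the value and tags that get thrown away immediately, while getKey() doesn't.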

 

One result I can't explain is why the Meta Random Read benchmarks taken on this 
patch/master are so much better than those taken just a year and a half ago in 
HBASE-28043. Obviously, the reverse scan benchmarks can't be compared 
apples-to-apples given that we're using different value sizes; however, the 
meta random read tests should be directly comparable. These tests show 25% 
better average read latency, which is amazing, but I find it hard to explain, 
especially given that I haven't upgraded hardware in that time. I think the 
biggest difference may be the underlying JDK I'm running: 21 vs. 11.

> Avoid excessive allocations during reverse scanning when seeking to next row
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-29103
>                 URL: https://issues.apache.org/jira/browse/HBASE-29103
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 3.0.0-beta-1, 2.6.1
>            Reporter: Becker Ewing
>            Assignee: Becker Ewing
>            Priority: Major
>         Attachments: high-block-cache-key-to-string-alloc-profile.html
>
>
> Currently, when we're reverse scanning in a storefile, the general path is to:
>  # Seek to before the current row to find the prior row
>  # Seek to the beginning of the prior row
> (this can get a bit more complex depending on how fast a single "seek" 
> operation is, see HBASE-28043 for additional details).
>  
> At step 1, we call HFileScanner#getCell and then we subsequently always call 
> PrivateCellUtil.createFirstOnRow() on this Cell instance 
> ([Code|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L611-L614]).
>  PrivateCellUtil.createFirstOnRow() creates a [copy of only the row portion 
> of this 
> Cell|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-common/src/main/java/org/apache/hadoop/hbase/PrivateCellUtil.java#L2768-L2775].
>  
>  
> I propose that since we're only using the key-portion of the cell returned by 
> HFileScanner#getCell, that we should instead call 
> [HFileScanner#getKey|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java#L91-L96]
>  in this scenario so we avoid deep-copying extra components of the Cell such 
> as the value, tags, etc... This should be a safe change as this Cell instance 
> never escapes StoreFileScanner and we only call HFileScanner#getCell when the 
> scanner is already seeked.
>  
> Attached is the same allocation profile taken to guide the optimizations in 
> HBASE-29099 which shows that about 3% of allocations are spent in 
> [BufferedEncodedSeeker.getCell in the body of 
> seekBeforeAndSaveKeyToPreviousRow|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java#L284-L348].
>  The region server in question here was pinned at 100% CPU utilization for a 
> while and was running a reverse-scan heavy workload.


