[jira] [Updated] (HBASE-29252) Reduce allocations in RowIndexSeekerV1

Charles Connell (Jira) Fri, 11 Apr 2025 11:28:23 -0700


     [ 
https://issues.apache.org/jira/browse/HBASE-29252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Charles Connell updated HBASE-29252:
------------------------------------
    Description: 
I've looked at a lot of allocation profiles of RegionServers doing a read-heavy 
workload. Some allocations that dominate the chart can be easily avoided.

The following code in the main decode method
{code:java}
currentBuffer.asSubByteBuffer(currentBuffer.position(), current.keyLength, tmpPair);
ByteBuffer key = tmpPair.getFirst().duplicate();
key.position(tmpPair.getSecond()).limit(tmpPair.getSecond() + current.keyLength);
current.keyBuffer = key;
{code}
results in a new ByteBuffer for every cell. The only reason to have this
duplicate ByteBuffer is to hold the result of {{tmpPair.getSecond()}} as its
{{position}} state. But this is just an integer that can be stored more cheaply
in a different way. We can introduce a {{current.keyOffset}} variable and do
this instead:
{code:java}
currentBuffer.asSubByteBuffer(currentBuffer.position(), current.keyLength, tmpPair);
current.keyBuffer = tmpPair.getFirst();
current.keyOffset = tmpPair.getSecond();
{code}
and then reference {{current.keyOffset}} where we previously referenced
{{current.keyBuffer.position()}}. Based
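
To make the difference concrete, here is a minimal, self-contained sketch of the two approaches using only {{java.nio.ByteBuffer}}. The {{State}} class and the helper method names are hypothetical stand-ins for illustration; they are not the actual {{RowIndexSeekerV1}} code, and the real patch may look different.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class KeyOffsetSketch {
  /** Hypothetical stand-in for the seeker's per-cell state. */
  static final class State {
    ByteBuffer keyBuffer;
    int keyOffset; // the proposed extra field
    int keyLength;
  }

  /** Current approach: one duplicate ByteBuffer is allocated per cell. */
  static void setKeyWithDuplicate(State s, ByteBuffer block, int offset, int length) {
    ByteBuffer key = block.duplicate(); // new object, shares the same bytes
    key.position(offset).limit(offset + length);
    s.keyBuffer = key;
    s.keyLength = length;
  }

  /** Proposed approach: keep the shared buffer and store the offset as an int. */
  static void setKeyWithOffset(State s, ByteBuffer block, int offset, int length) {
    s.keyBuffer = block; // no allocation
    s.keyOffset = offset;
    s.keyLength = length;
  }

  /** Reader for the current approach: the key start comes from position(). */
  static String readKeyViaPosition(State s) {
    byte[] out = new byte[s.keyLength];
    for (int i = 0; i < out.length; i++) {
      out[i] = s.keyBuffer.get(s.keyBuffer.position() + i);
    }
    return new String(out, StandardCharsets.UTF_8);
  }

  /** Reader for the proposed approach: the key start comes from keyOffset. */
  static String readKeyViaOffset(State s) {
    byte[] out = new byte[s.keyLength];
    for (int i = 0; i < out.length; i++) {
      out[i] = s.keyBuffer.get(s.keyOffset + i);
    }
    return new String(out, StandardCharsets.UTF_8);
  }
}
```

Both readers use absolute {{get(int)}} calls, so neither mutates the shared buffer's position. The trade-off in this sketch is that every reader of {{keyBuffer}} has to be switched to {{keyOffset}}, since the buffer's own {{position}} no longer marks the start of the key.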

 

Additionally, {{RowIndexSeekerV1.SeekerState}} contains a 
{{ByteBufferKeyOnlyKeyValue}} field that is replaced on every cell read. This 
object can be reset and re-used instead.
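
The reset-and-reuse idea can be sketched in the same way. {{KeyView}} below is a hypothetical, simplified stand-in for {{ByteBufferKeyOnlyKeyValue}}, and the two state classes are invented for illustration; the sketch only assumes that the key object can be repointed at a new (buffer, offset, length) triple.

```java
import java.nio.ByteBuffer;

final class ReusableKeySketch {
  /** Hypothetical mutable view over a key stored inside a larger buffer. */
  static final class KeyView {
    private ByteBuffer buf;
    private int offset;
    private int length;

    KeyView() {}

    KeyView(ByteBuffer buf, int offset, int length) {
      setKey(buf, offset, length);
    }

    /** Repoints this view at another key without allocating. */
    void setKey(ByteBuffer buf, int offset, int length) {
      this.buf = buf;
      this.offset = offset;
      this.length = length;
    }

    byte byteAt(int i) {
      return buf.get(offset + i);
    }

    int length() {
      return length;
    }
  }

  /** Current pattern: the field is replaced with a new object on every cell. */
  static final class ReplacingState {
    KeyView currentKey;

    void next(ByteBuffer block, int offset, int length) {
      currentKey = new KeyView(block, offset, length);
    }
  }

  /** Proposed pattern: one object is allocated once and reset on every cell. */
  static final class ReusingState {
    final KeyView currentKey = new KeyView();

    void next(ByteBuffer block, int offset, int length) {
      currentKey.setKey(block, offset, length);
    }
  }
}
```

One consequence of reuse in this sketch is that any caller holding a reference to {{currentKey}} sees it change when the seeker moves to the next cell, so a caller that needs the key to outlive the current cell would have to copy it.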

 

On the attached profile, allocations of the duplicate {{ByteBuffer}} and
{{ByteBufferKeyOnlyKeyValue}} objects collectively account for 35% of the
allocations profiled. This is probably representative of a typical
RegionServer running a scan-heavy workload while using RowIndexV1.

  was:
I've looked at a lot of allocation profiles of RegionServers doing a read-heavy 
workload. Some allocations that dominate the chart can be easily avoided.
 * {{RowIndexSeekerV1.SeekerState}} contains a {{ByteBufferKeyOnlyKeyValue}} 
field that is replaced on every cell read. This object can be reset and re-used 
instead. On the attached profile, allocations of this object account for 9% of 
the allocations done. This is probably representative of the behavior of a 
typical RegionServer doing a heavy amount of scans while using RowIndexV1.
 * The following code in the main decode method

{code:java}
currentBuffer.asSubByteBuffer(currentBuffer.position(), current.keyLength, tmpPair);
ByteBuffer key = tmpPair.getFirst().duplicate();
key.position(tmpPair.getSecond()).limit(tmpPair.getSecond() + current.keyLength);
current.keyBuffer = key;
{code}
results in a new ByteBuffer for every cell. The only reason to have this
duplicate ByteBuffer is to hold the result of {{tmpPair.getSecond()}} as its
{{position}} state. But this is just an integer that can be stored more cheaply
in a different way. We can introduce a {{current.keyOffset}} variable and do
this instead:

{code:java}
currentBuffer.asSubByteBuffer(currentBuffer.position(), current.keyLength, tmpPair);
current.keyBuffer = tmpPair.getFirst();
current.keyOffset = tmpPair.getSecond();
{code}
and then reference {{current.keyOffset}} where we previously referenced
{{current.keyBuffer.position()}}.


> Reduce allocations in RowIndexSeekerV1
> --------------------------------------
>
>                 Key: HBASE-29252
>                 URL: https://issues.apache.org/jira/browse/HBASE-29252
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Charles Connell
>            Assignee: Charles Connell
>            Priority: Minor
>         Attachments: scenario-alloc-hs26.html
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
