[
https://issues.apache.org/jira/browse/HBASE-29252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Charles Connell updated HBASE-29252:
------------------------------------
Description:
I've looked at a lot of allocation profiles of RegionServers serving a
read-heavy workload. Some of the allocations that dominate these profiles can
be easily avoided.
The following code in the main decode method
{code:java}
currentBuffer.asSubByteBuffer(currentBuffer.position(), current.keyLength,
tmpPair);
ByteBuffer key = tmpPair.getFirst().duplicate();
key.position(tmpPair.getSecond()).limit(tmpPair.getSecond() +
current.keyLength);
current.keyBuffer = key; {code}
results in a new ByteBuffer for every cell. The reason to have this duplicate
ByteBuffer is to hold the result of {{tmpPair.getSecond()}} as its {{position}}
state. But this is just an integer, which can be stored more cheaply in an
{{int}} field. We can introduce a {{current.keyOffset}} variable and do this
instead:
{code:java}
currentBuffer.asSubByteBuffer(currentBuffer.position(), current.keyLength,
tmpPair);
current.keyBuffer = tmpPair.getFirst();
current.keyOffset = tmpPair.getSecond();{code}
and then reference {{current.keyOffset}} where we previously referenced
{{current.keyBuffer.position()}}.
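A minimal, self-contained sketch of the difference, using only {{java.nio.ByteBuffer}}. {{KeyOffsetSketch}}, {{DuplicatingState}}, {{OffsetState}}, {{keyStart()}} and {{readKey()}} are hypothetical names chosen for illustration; this is not the actual {{RowIndexSeekerV1}} code or patch. Both states expose the same key bytes, but only the first creates a new {{ByteBuffer}} object per cell:
{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not HBase code: a per-cell duplicate() versus
// remembering the key's start offset in an int field.
class KeyOffsetSketch {

  // Before: each decoded cell gets its own ByteBuffer view whose position
  // and limit mark where the key starts and ends.
  static final class DuplicatingState {
    ByteBuffer keyBuffer;
    int keyLength;

    void setKey(ByteBuffer block, int offset, int length) {
      ByteBuffer key = block.duplicate(); // new ByteBuffer object, shared bytes
      key.position(offset);
      key.limit(offset + length);
      keyBuffer = key;
      keyLength = length;
    }

    int keyStart() {
      return keyBuffer.position();
    }
  }

  // After: keep a reference to the shared block buffer and store the
  // offset in a plain int, so nothing is allocated per cell.
  static final class OffsetState {
    ByteBuffer keyBuffer;
    int keyOffset;
    int keyLength;

    void setKey(ByteBuffer block, int offset, int length) {
      keyBuffer = block;  // no duplicate
      keyOffset = offset; // takes over the role of keyBuffer.position()
      keyLength = length;
    }

    int keyStart() {
      return keyOffset;
    }
  }

  // Reads a key with absolute gets, so the buffer's position is not changed.
  static String readKey(ByteBuffer buf, int start, int length) {
    byte[] out = new byte[length];
    for (int i = 0; i < length; i++) {
      out[i] = buf.get(start + i);
    }
    return new String(out, StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    ByteBuffer block = ByteBuffer.wrap("xxrow1yyrow2".getBytes(StandardCharsets.US_ASCII));
    DuplicatingState before = new DuplicatingState();
    OffsetState after = new OffsetState();
    before.setKey(block, 2, 4);
    after.setKey(block, 2, 4);
    System.out.println(readKey(before.keyBuffer, before.keyStart(), before.keyLength)); // row1
    System.out.println(readKey(after.keyBuffer, after.keyStart(), after.keyLength));    // row1
    System.out.println(after.keyBuffer == block); // true: no new buffer object
  }
}
{code}
{{duplicate()}} shares the underlying bytes, so the per-cell cost is the extra buffer object rather than a data copy. Reading through absolute {{get(int)}} calls at {{keyOffset}} leaves the shared buffer's position untouched, which is what makes a plain {{int}} sufficient in this sketch.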
Additionally, {{RowIndexSeekerV1.SeekerState}} contains a
{{ByteBufferKeyOnlyKeyValue}} field that is replaced with a new instance on
every cell read. This object can be reset and reused instead.
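A minimal sketch of the reset-and-reuse pattern, again self-contained. {{ReusableKeySketch}}, {{KeyView}}, {{reset()}} and {{SeekerStateSketch}} are hypothetical stand-ins for {{ByteBufferKeyOnlyKeyValue}} and {{RowIndexSeekerV1.SeekerState}}; the real classes' fields and method names may differ. The state owns one view object for its whole lifetime and repoints it at each cell's key:
{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not HBase code: a seeker state that owns one mutable
// key view and resets it per cell instead of constructing a new one.
class ReusableKeySketch {

  // Stand-in for a key-only cell view such as ByteBufferKeyOnlyKeyValue.
  static final class KeyView {
    static int constructed = 0; // allocation counter, for illustration only

    private ByteBuffer buf;
    private int offset;
    private int length;

    KeyView() {
      constructed++;
    }

    // Repoint this view at another key; no new object is created.
    void reset(ByteBuffer buf, int offset, int length) {
      this.buf = buf;
      this.offset = offset;
      this.length = length;
    }

    // Copies the key out for display; uses absolute gets only.
    String keyString() {
      byte[] out = new byte[length];
      for (int i = 0; i < length; i++) {
        out[i] = buf.get(offset + i);
      }
      return new String(out, StandardCharsets.US_ASCII);
    }
  }

  // Stand-in for RowIndexSeekerV1.SeekerState.
  static final class SeekerStateSketch {
    final KeyView currentKey = new KeyView(); // allocated once per state

    void decodeNext(ByteBuffer block, int keyOffset, int keyLength) {
      // The per-cell alternative would construct a new KeyView here.
      currentKey.reset(block, keyOffset, keyLength);
    }
  }

  public static void main(String[] args) {
    ByteBuffer block = ByteBuffer.wrap("row1row2row3".getBytes(StandardCharsets.US_ASCII));
    SeekerStateSketch state = new SeekerStateSketch();
    for (int off = 0; off < block.capacity(); off += 4) {
      state.decodeNext(block, off, 4);
      System.out.println(state.currentKey.keyString());
    }
    System.out.println("KeyView objects constructed: " + KeyView.constructed);
  }
}
{code}
The trade-off of this pattern is that the view is only valid until the next {{decodeNext}} call, so any code that needs to keep a key past the current cell would have to copy it first.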
On the attached profile, allocations of the duplicate {{ByteBuffer}}s and
{{ByteBufferKeyOnlyKeyValue}}s together account for 35% of all profiled
allocations. This is probably representative of a typical RegionServer serving
a scan-heavy workload with RowIndexV1.
was:
I've looked at a lot of allocation profiles of RegionServers serving a
read-heavy workload. Some of the allocations that dominate these profiles can
be easily avoided.
* {{RowIndexSeekerV1.SeekerState}} contains a {{ByteBufferKeyOnlyKeyValue}}
field that is replaced on every cell read. This object can be reset and re-used
instead. On the attached profile, allocations of this object account for 9% of
all profiled allocations. This is probably representative of a typical
RegionServer serving a scan-heavy workload with RowIndexV1.
* The following code in the main decode method
{code:java}
currentBuffer.asSubByteBuffer(currentBuffer.position(), current.keyLength,
tmpPair);
ByteBuffer key = tmpPair.getFirst().duplicate();
key.position(tmpPair.getSecond()).limit(tmpPair.getSecond() +
current.keyLength);
current.keyBuffer = key; {code}
results in a new ByteBuffer for every cell. The reason to have this duplicate
ByteBuffer is to hold the result of {{tmpPair.getSecond()}} as its {{position}}
state. But this is just an integer, which can be stored more cheaply in an
{{int}} field. We can introduce a {{current.keyOffset}} variable and do this
instead:
{code:java}
currentBuffer.asSubByteBuffer(currentBuffer.position(), current.keyLength,
tmpPair);
current.keyBuffer = tmpPair.getFirst();
current.keyOffset = tmpPair.getSecond();{code}
and then reference {{current.keyOffset}} where we previously referenced
{{current.keyBuffer.position()}}.
> Reduce allocations in RowIndexSeekerV1
> --------------------------------------
>
> Key: HBASE-29252
> URL: https://issues.apache.org/jira/browse/HBASE-29252
> Project: HBase
> Issue Type: Improvement
> Reporter: Charles Connell
> Assignee: Charles Connell
> Priority: Minor
> Attachments: scenario-alloc-hs26.html
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)