[
https://issues.apache.org/jira/browse/HBASE-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HBASE-29890:
-----------------------------------
Labels: pull-request-available (was: )
> WAL tailing reader should resume partial cell reads instead of resetting
> compression
> ------------------------------------------------------------------------------------
>
> Key: HBASE-29890
> URL: https://issues.apache.org/jira/browse/HBASE-29890
> Project: HBase
> Issue Type: Improvement
> Components: Replication, wal
> Reporter: Sid Khillon
> Assignee: Sid Khillon
> Priority: Minor
> Labels: pull-request-available
>
> When the WAL tailing reader hits EOF mid-cell with WAL compression enabled, it
> currently returns EOF_AND_RESET_COMPRESSION, which forces the reader to
> re-read the entire WAL file from the beginning to rebuild dictionary state.
> This is an O(n) operation that becomes increasingly expensive as the WAL
> grows.
> The root cause is that the CompressedKvDecoder eagerly adds entries to the
> compression dictionaries (ROW, FAMILY, QUALIFIER, and tag dictionaries) as it
> reads each field of a cell. If an IOException occurs partway through reading
> a cell, the dictionaries are left in a partially-updated state that no longer
> matches the actual stream position. The reader has no choice but to throw
> away the entire compression context and start over.
> The proposed fix is to defer dictionary additions until a cell is fully parsed:
> - Buffer ROW/FAMILY/QUALIFIER dictionary additions in CompressedKvDecoder
> and only commit them after parseCellInner() succeeds. On IOException, discard
> the pending additions.
> - Add a similar deferred-addition mode to TagCompressionContext for tag
> dictionaries.
> - Reset the ValueCompressor if an IOException occurs during the value
> decompression phase.
> With deferred additions, hitting EOF mid-cell leaves the dictionaries in the
> state they were in after the last fully-read cell. This means the reader can
> return EOF_AND_RESET (a cheap seek to the saved position) instead of
> EOF_AND_RESET_COMPRESSION, and resume reading from where it left off once the
> file grows.
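
For illustration, the deferred-addition idea described above could look
roughly like the following. This is a minimal sketch, not the actual patch:
DeferredDictionary and CellParser are hypothetical stand-ins, not HBase
classes, the dictionary is a plain append-only list rather than HBase's LRU
dictionary, and a short input array is used to simulate hitting EOF mid-cell.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a compression dictionary (not the HBase Dictionary API). */
final class DeferredDictionary {
  private final List<String> committed = new ArrayList<>();
  private final List<String> pending = new ArrayList<>();

  /** Stage an entry; it only becomes part of the dictionary on commit(). */
  void addDeferred(String entry) {
    pending.add(entry);
  }

  /** Make all staged entries permanent after a cell was fully parsed. */
  void commit() {
    committed.addAll(pending);
    pending.clear();
  }

  /** Drop staged entries after a failed (partial) cell read. */
  void rollback() {
    pending.clear();
  }

  int committedSize() {
    return committed.size();
  }
}

/** Hypothetical decoder that reads row, family and qualifier for each cell. */
final class CellParser {
  private static final int FIELDS_PER_CELL = 3;
  private final DeferredDictionary dict = new DeferredDictionary();

  /**
   * Parses one cell. Dictionary additions are committed only if the whole
   * cell was read; on IOException they are discarded, so the dictionary still
   * matches the state after the last fully-read cell.
   */
  String parseCell(String[] availableFields) throws IOException {
    try {
      String cell = parseCellInner(availableFields);
      dict.commit();
      return cell;
    } catch (IOException e) {
      dict.rollback();
      throw e;
    }
  }

  private String parseCellInner(String[] availableFields) throws IOException {
    StringBuilder cell = new StringBuilder();
    for (int i = 0; i < FIELDS_PER_CELL; i++) {
      if (i >= availableFields.length) {
        // Simulates the stream ending in the middle of a cell.
        throw new EOFException("EOF mid-cell at field " + i);
      }
      dict.addDeferred(availableFields[i]);
      if (i > 0) {
        cell.append('/');
      }
      cell.append(availableFields[i]);
    }
    return cell.toString();
  }

  int dictionarySize() {
    return dict.committedSize();
  }
}
```

In this sketch, a caller that catches the EOFException can seek back to the
start of the partial cell and retry later without rebuilding the dictionary,
which corresponds to the EOF_AND_RESET case rather than
EOF_AND_RESET_COMPRESSION.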