[ https://issues.apache.org/jira/browse/HBASE-29784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18049099#comment-18049099 ]
Bram Schuur commented on HBASE-29784:
-------------------------------------
It turned out that a coprocessor got confused because a major compaction
produced a DeleteFamilyVersion marker in its scan when the marker's sequence
ID was newer than `store.getSmallestReadPoint()` during the compaction.
This behavior is unexpected and undocumented (we expected no delete markers
in the output of a major compaction), but it is not strictly a bug.
Closing the issue.
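
For anyone hitting the same surprise, here is a minimal, self-contained sketch
of the retention rule as described above. It is a simplified model, not HBase
code: the class `CompactionMarkerSketch`, its `Cell` type, and the
`majorCompact` method are hypothetical names, and the model assumes a single
column family where a DeleteFamilyVersion masks Puts with the same row and
timestamp. The only point it illustrates is that a marker with a sequence ID
newer than the smallest read point is kept in the compaction output and does
not yet mask anything.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the behavior described in this comment (NOT HBase code).
final class CompactionMarkerSketch {
    enum Type { PUT, DELETE_FAMILY_VERSION }

    // Hypothetical stand-in for an HBase cell: one family, no qualifier.
    static final class Cell {
        final String row;
        final long timestamp;
        final Type type;
        final long seqId;

        Cell(String row, long timestamp, Type type, long seqId) {
            this.row = row;
            this.timestamp = timestamp;
            this.type = type;
            this.seqId = seqId;
        }
    }

    // Assumed rule: a marker is applied (and dropped) only if its sequence ID
    // is <= smallestReadPoint; newer markers pass through without masking.
    static List<Cell> majorCompact(List<Cell> input, long smallestReadPoint) {
        Set<String> masked = new HashSet<>();
        for (Cell c : input) {
            if (c.type == Type.DELETE_FAMILY_VERSION && c.seqId <= smallestReadPoint) {
                masked.add(c.row + "@" + c.timestamp);
            }
        }
        List<Cell> out = new ArrayList<>();
        for (Cell c : input) {
            if (c.type == Type.DELETE_FAMILY_VERSION) {
                if (c.seqId > smallestReadPoint) {
                    out.add(c); // too new to apply: the marker survives
                }
            } else if (!masked.contains(c.row + "@" + c.timestamp)) {
                out.add(c); // Put not covered by an applied marker
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long ts = 1765693071241000000L;
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("row1", ts, Type.PUT, 10628094L));
        cells.add(new Cell("row1", ts, Type.DELETE_FAMILY_VERSION, 10628100L));

        // Read point between the Put and the marker, then at the marker.
        System.out.println(majorCompact(cells, 10628095L).size());
        System.out.println(majorCompact(cells, 10628100L).size());
    }
}
```

In this model, a read point between the two sequence IDs leaves both the Put
and the marker in the output, while a read point at or beyond the marker's
sequence ID removes both. A coprocessor that wraps the compaction scanner
therefore should not assume the scan is free of delete markers.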
> DeleteFamilyVersion is not effectuated even though it is committed to WAL
> -------------------------------------------------------------------------
>
> Key: HBASE-29784
> URL: https://issues.apache.org/jira/browse/HBASE-29784
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.6.3
> Environment: JDK: 21.0.9
> HBase: 2.6.3
> Hadoop: 3.4.2
> Arch: x86
> OS: Containerized linux
> Reporter: Bram Schuur
> Priority: Critical
>
> We are running HBase 2.6.3 as a datastore, and we sometimes wipe data through
> DeleteFamilyVersion. Every now and then (intermittently and
> non-deterministically), HBase somehow forgets about a DeleteFamilyVersion
> that we emitted for a row, making the data we meant to erase appear again.
> We started capturing more extensive WAL logs for our regions, which show that
> the DeleteFamilyVersion we emit is committed to the WAL; however, the data is
> still visible through the API after flushing/compaction of the region. There
> are no errors in the logs.
> Below is a snippet of the data we traced.
> Data as queried from the HBase API:
> {code}
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:\x00/1765693071241000000/Put/vlen=1/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:description/1765693071241000000/Put/vlen=4/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:domainIdentifier/1765693071241000000/Put/vlen=60/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:domainName/1765693071241000000/Put/vlen=8/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:identifiers/1765693071241000000/Put/vlen=94/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:lastUpdateTimestamp/1765693071241000000/Put/vlen=8/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:layerIdentifier/1765693071241000000/Put/vlen=39/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:layerName/1765693071241000000/Put/vlen=12/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:name/1765693071241000000/Put/vlen=26/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:order/1765693071241000000/Put/vlen=10/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:properties/1765693071241000000/Put/vlen=208/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:tags/1765693071241000000/Put/vlen=184/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:typeIdentifier/1765693071241000000/Put/vlen=72/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:typeName/1765693071241000000/Put/vlen=10/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:~\x00/1765693071241000000/Put/vlen=11/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:~e\x06SYNCED\x00\x00\xDA\xB7\xE2\xD3\x8FW/1765693071241000000/Put/vlen=16/seqid=0
> {code}
> Data in captured WAL:
> {code}
> ...
> Sequence=10628094, table=sg__default__vertices,
> region=834ed0ff02e8d7d42b88ad5666a4b1e8, at write timestamp=Sun Dec 14
> 06:17:51 UTC 2025
> ...
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:~\x00,
> timestamp=1765693071241000000, type=Put
> value: \x03\x01Componen\xF4
> cell total size sum: 96
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:domainIdentifier,
> timestamp=1765693071241000000, type=Put
> value:
> \x02\x03\x01urn:stackpack:stackstate-k8s-agent-v2:shared:domain:agen\xF4
> cell total size sum: 160
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:identifiers,
> timestamp=1765693071241000000, type=Put
> value:
> \x02!\x01\x01\x03\x01\xD7\x01urn:process:/i-06fb48dc80ed9944b-preprod-dev.preprod.stackstate.io:68116:1765692998000
> cell total size sum: 184
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:domainName,
> timestamp=1765693071241000000, type=Put
> value: \x02\x03\x01Agen\xF4
> cell total size sum: 96
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:typeName,
> timestamp=1765693071241000000, type=Put
> value: \x02\x03\x01proces\xF3
> cell total size sum: 96
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:name,
> timestamp=1765693071241000000, type=Put
> value: \x02\x03\x01containerd-shim-runc-v\xB2
> cell total size sum: 112
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:description,
> timestamp=1765693071241000000, type=Put
> value: \x02\x03\x01\x81
> cell total size sum: 96
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:typeIdentifier,
> timestamp=1765693071241000000, type=Put
> value:
> \x02\x03\x01\xC4\x01urn:stackpack:stackstate-k8s-agent-v2:shared:component-type:process
> cell total size sum: 168
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:layerIdentifier,
> timestamp=1765693071241000000, type=Put
> value: \x02\x03\x01urn:stackpack:common:layer:processe\xF3
> cell total size sum: 136
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:layerName,
> timestamp=1765693071241000000, type=Put
> value: \x02\x03\x01Processe\xF3
> cell total size sum: 104
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:properties,
> timestamp=1765693071241000000, type=Put
> value: \x02
> \x01\x04\x03\x01hos\xF4\x03\x01i-06fb48dc80ed9944b-preprod-dev.preprod.stackstate.i\xEF\x03\x01external_i\xE4\x03\x01\xD7\x01urn:process:/i-06fb48dc80ed9944b-preprod-dev.preprod.stackstate.io:68116:1765692998000\x03\x01pi\xE4\x03\x016811\xB6\x03\x01create_tim\xE5\x03\x01176569299800\xB0
> cell total size sum: 296
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:tags,
> timestamp=1765693071241000000, type=Put
> value:
> \x02!\x01\x07\x03\x01host:i-06fb48dc80ed9944b-preprod-dev.preprod.stackstate.i\xEF\x03\x01stackpack:agen\xF4\x03\x01pid:6811\xB6\x03\x01user:roo\xF4\x03\x01os:linu\xF8\x03\x01command:/usr/bin/containerd-shim-runc-v\xB2\x03\x01process_category:executabl\xE5
> cell total size sum: 272
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:order,
> timestamp=1765693071241000000, type=Put
> value: \x02\x0A\x00\x00\x00\x00\x00\x00\x00\x00
> cell total size sum: 96
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:lastUpdateTimestamp,
> timestamp=1765693071241000000, type=Put
> value: \x02\x09\x92\xFE\x90\xB8\xE3f
> cell total size sum: 112
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA,
> column=cf:~e\x06SYNCED\x00\x00\xDA\xB7\xE2\xD3\x8FW,
> timestamp=1765693071241000000, type=Put
> value: \x01\x06Synced\x00\x00x\x87\xBA\xDE\xF7\xFE
> cell total size sum: 112
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:\x00,
> timestamp=1765693071241000000, type=Put
> value: \x01
> cell total size sum: 80
> ...
> position: 1481623
> ...
> Sequence=10628100, table=sg__default__vertices,
> region=834ed0ff02e8d7d42b88ad5666a4b1e8, at write timestamp=Sun Dec 14
> 06:17:51 UTC 2025
> ...
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:, timestamp=1765693071241000000,
> type=DeleteFamilyVersion
> value:
> cell total size sum: 80
> ...
> position: 1531651
> {code}
> What could be the cause? I checked the bug tracker but found nothing
> resembling our symptoms.