[ 
https://issues.apache.org/jira/browse/HBASE-29784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18049099#comment-18049099
 ] 

Bram Schuur commented on HBASE-29784:
-------------------------------------

It turned out that a coprocessor got confused because a major compaction 
produced a DeleteFamilyVersion marker in its scan: the marker's sequence ID 
was newer than `store.getSmallestReadPoint()` at the time of the compaction.

This behavior is unexpected and undocumented (we were expecting no delete 
markers in the output of a major compaction), but it is not strictly a bug.
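
For anyone hitting the same thing, here is a minimal sketch of the behavior as we understand it. It is a toy model and not HBase code: the class, the `majorCompact` method and the read point values are hypothetical, and the masking rule for older markers is an assumption. Only the two sequence IDs and the timestamp are taken from the WAL capture in the issue description.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one row and one column family; NOT the HBase implementation.
public class CompactionMarkerSketch {
    enum Type { PUT, DELETE_FAMILY_VERSION }

    static final class Cell {
        final long timestamp;
        final Type type;
        final long seqId;

        Cell(long timestamp, Type type, long seqId) {
            this.timestamp = timestamp;
            this.type = type;
            this.seqId = seqId;
        }
    }

    // Hypothetical model of a major compaction.
    // Assumption: a marker newer than the smallest read point is written to
    // the output and masks nothing yet; an older marker masks the puts with
    // the same timestamp and is then dropped.
    static List<Cell> majorCompact(List<Cell> cells, long smallestReadPoint) {
        List<Long> maskedTimestamps = new ArrayList<>();
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.type != Type.DELETE_FAMILY_VERSION) {
                continue;
            }
            if (c.seqId > smallestReadPoint) {
                out.add(c); // too new to apply: the marker survives the compaction
            } else {
                maskedTimestamps.add(c.timestamp); // applied, then dropped
            }
        }
        for (Cell c : cells) {
            if (c.type == Type.PUT && !maskedTimestamps.contains(c.timestamp)) {
                out.add(c);
            }
        }
        return out;
    }

    // What a coprocessor wrapping the compaction scanner should be prepared for.
    static long countDeleteMarkers(List<Cell> cells) {
        return cells.stream().filter(c -> c.type != Type.PUT).count();
    }

    public static void main(String[] args) {
        long ts = 1765693071241000000L;
        List<Cell> cells = List.of(
            new Cell(ts, Type.PUT, 10628094L),
            new Cell(ts, Type.DELETE_FAMILY_VERSION, 10628100L));

        // Read point older than the marker: the marker shows up in the scan.
        System.out.println(countDeleteMarkers(majorCompact(cells, 10628095L)));
        // Read point has caught up: put and marker are both gone.
        System.out.println(majorCompact(cells, 10628100L).size());
    }
}
```

The takeaway for coprocessor authors is that code wrapping a major compaction scanner should tolerate delete markers in the cells it sees, rather than assume that only puts remain.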

Closing the issue.

> DeleteFamilyVersion is not effectuated even though it is committed to WAL
> -------------------------------------------------------------------------
>
>                 Key: HBASE-29784
>                 URL: https://issues.apache.org/jira/browse/HBASE-29784
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.6.3
>         Environment: JDK: 21.0.9
> HBase: 2.6.3
> Hadoop: 3.4.2
> Arch: x86
> OS: Containerized linux
>            Reporter: Bram Schuur
>            Priority: Critical
>
> We are running HBase 2.6.3 as a datastore, and sometimes we wipe data through 
> DeleteFamilyVersion. Every now and then (intermittently, non-deterministically), 
> HBase somehow forgets about a 'DeleteFamilyVersion' that we emitted for a row, 
> so the data we meant to erase appears again.
> We started capturing more extensive WAL logs for our regions. They show that 
> the DeleteFamilyVersion we emit is committed to the WAL, yet the data is still 
> visible through the API after a flush/compaction of the region. There are no 
> errors in the logs.
> Below is a snippet of the data we traced.
> Data as queried from the HBase API:
> {code}
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:\x00/1765693071241000000/Put/vlen=1/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:description/1765693071241000000/Put/vlen=4/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:domainIdentifier/1765693071241000000/Put/vlen=60/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:domainName/1765693071241000000/Put/vlen=8/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:identifiers/1765693071241000000/Put/vlen=94/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:lastUpdateTimestamp/1765693071241000000/Put/vlen=8/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:layerIdentifier/1765693071241000000/Put/vlen=39/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:layerName/1765693071241000000/Put/vlen=12/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:name/1765693071241000000/Put/vlen=26/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:order/1765693071241000000/Put/vlen=10/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:properties/1765693071241000000/Put/vlen=208/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:tags/1765693071241000000/Put/vlen=184/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:typeIdentifier/1765693071241000000/Put/vlen=72/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:typeName/1765693071241000000/Put/vlen=10/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:~\x00/1765693071241000000/Put/vlen=11/seqid=0
> \x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:~e\x06SYNCED\x00\x00\xDA\xB7\xE2\xD3\x8FW/1765693071241000000/Put/vlen=16/seqid=0
> {code}
> Data in captured WAL:
> {code}
> ...
> Sequence=10628094, table=sg__default__vertices, 
> region=834ed0ff02e8d7d42b88ad5666a4b1e8, at write timestamp=Sun Dec 14 
> 06:17:51 UTC 2025
> ...
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:~\x00, 
> timestamp=1765693071241000000, type=Put
>     value: \x03\x01Componen\xF4
> cell total size sum: 96
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:domainIdentifier, 
> timestamp=1765693071241000000, type=Put
>     value: 
> \x02\x03\x01urn:stackpack:stackstate-k8s-agent-v2:shared:domain:agen\xF4
> cell total size sum: 160
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:identifiers, 
> timestamp=1765693071241000000, type=Put
>     value: 
> \x02!\x01\x01\x03\x01\xD7\x01urn:process:/i-06fb48dc80ed9944b-preprod-dev.preprod.stackstate.io:68116:1765692998000
> cell total size sum: 184
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:domainName, 
> timestamp=1765693071241000000, type=Put
>     value: \x02\x03\x01Agen\xF4
> cell total size sum: 96
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:typeName, 
> timestamp=1765693071241000000, type=Put
>     value: \x02\x03\x01proces\xF3
> cell total size sum: 96
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:name, 
> timestamp=1765693071241000000, type=Put
>     value: \x02\x03\x01containerd-shim-runc-v\xB2
> cell total size sum: 112
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:description, 
> timestamp=1765693071241000000, type=Put
>     value: \x02\x03\x01\x81
> cell total size sum: 96
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:typeIdentifier, 
> timestamp=1765693071241000000, type=Put
>     value: 
> \x02\x03\x01\xC4\x01urn:stackpack:stackstate-k8s-agent-v2:shared:component-type:process
> cell total size sum: 168
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:layerIdentifier, 
> timestamp=1765693071241000000, type=Put
>     value: \x02\x03\x01urn:stackpack:common:layer:processe\xF3
> cell total size sum: 136
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:layerName, 
> timestamp=1765693071241000000, type=Put
>     value: \x02\x03\x01Processe\xF3
> cell total size sum: 104
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:properties, 
> timestamp=1765693071241000000, type=Put
>     value: \x02 
> \x01\x04\x03\x01hos\xF4\x03\x01i-06fb48dc80ed9944b-preprod-dev.preprod.stackstate.i\xEF\x03\x01external_i\xE4\x03\x01\xD7\x01urn:process:/i-06fb48dc80ed9944b-preprod-dev.preprod.stackstate.io:68116:1765692998000\x03\x01pi\xE4\x03\x016811\xB6\x03\x01create_tim\xE5\x03\x01176569299800\xB0
> cell total size sum: 296
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:tags, 
> timestamp=1765693071241000000, type=Put
>     value: 
> \x02!\x01\x07\x03\x01host:i-06fb48dc80ed9944b-preprod-dev.preprod.stackstate.i\xEF\x03\x01stackpack:agen\xF4\x03\x01pid:6811\xB6\x03\x01user:roo\xF4\x03\x01os:linu\xF8\x03\x01command:/usr/bin/containerd-shim-runc-v\xB2\x03\x01process_category:executabl\xE5
> cell total size sum: 272
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:order, 
> timestamp=1765693071241000000, type=Put
>     value: \x02\x0A\x00\x00\x00\x00\x00\x00\x00\x00
> cell total size sum: 96
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:lastUpdateTimestamp, 
> timestamp=1765693071241000000, type=Put
>     value: \x02\x09\x92\xFE\x90\xB8\xE3f
> cell total size sum: 112
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, 
> column=cf:~e\x06SYNCED\x00\x00\xDA\xB7\xE2\xD3\x8FW, 
> timestamp=1765693071241000000, type=Put
>     value: \x01\x06Synced\x00\x00x\x87\xBA\xDE\xF7\xFE
> cell total size sum: 112
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:\x00, 
> timestamp=1765693071241000000, type=Put
>     value: \x01
> cell total size sum: 80
> ...
> position: 1481623
> ...
> Sequence=10628100, table=sg__default__vertices, 
> region=834ed0ff02e8d7d42b88ad5666a4b1e8, at write timestamp=Sun Dec 14 
> 06:17:51 UTC 2025
> ...
> row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:, timestamp=1765693071241000000, 
> type=DeleteFamilyVersion
>     value: 
> cell total size sum: 80
> ...
> position: 1531651
> {code}
> What could be the cause? I checked the bug tracker but found nothing 
> resembling or matching our symptoms. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
