[
https://issues.apache.org/jira/browse/HBASE-29604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang updated HBASE-29604:
------------------------------
Fix Version/s: (was: 4.0.0-alpha-1)
> BackupHFileCleaner uses flawed time based check
> -----------------------------------------------
>
> Key: HBASE-29604
> URL: https://issues.apache.org/jira/browse/HBASE-29604
> Project: HBase
> Issue Type: Bug
> Components: backup&restore
> Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1
> Reporter: Dieter De Paepe
> Assignee: Dieter De Paepe
> Priority: Critical
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.4
>
>
> BackupHFileCleaner is responsible for preventing the cleanup of bulkloaded
> HFiles that are still required by the backup & restore mechanism. It does
> this using 2 checks:
> * The backupsystemtable stores which HFile bulk loads are required for the
> next incremental backup. Any HFile present here cannot be deleted.
> * A time-based check prevents recently created HFiles from being
> deleted. The intention is to avoid deleting HFiles newer than the previous
> run of the cleaner. I believe this is to avoid race conditions between the
> cleaner and entries in the backupsystemtable that get created while the
> cleaner is running.
> In a single-threaded context, this works correctly.
> However, the cleaner is actually used concurrently in the
> hfile_cleaner-dir-scan-pool to scan multiple subdirectories in
> `CleanerChore#traverseAndDelete` (line 492). This means the time-based check
> is not guaranteed to protect recently created HFiles. This creates a (small)
> chance of data loss (in a backup) if an HFile is wrongly deleted.
> I also strongly suggest adding a note to FileCleanerDelegate stating that
> implementations should be thread-safe.
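
To illustrate the thread-safety point in the description, here is a minimal
sketch of a delegate that can be called from several scan threads at once. It
is not the HBase implementation and not necessarily the fix applied for this
issue. The class `SnapshotGuardedCleaner`, its nested types, and the
`startRun` hook are hypothetical names introduced only for this example. The
assumption is that the time cutoff and the set of referenced files are fixed
once per cleaner run, and that every worker thread only reads that state.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these types are HBase classes.
final class SnapshotGuardedCleaner {

  /** An HFile candidate, reduced to the two properties the checks need. */
  static final class Candidate {
    final String name;
    final long modifiedMillis;

    Candidate(String name, long modifiedMillis) {
      this.name = name;
      this.modifiedMillis = modifiedMillis;
    }
  }

  /** Immutable state fixed once per cleaner run and shared by all scan threads. */
  private static final class RunSnapshot {
    final long cutoffMillis;
    final Set<String> referencedFiles;

    RunSnapshot(long cutoffMillis, Set<String> referencedFiles) {
      this.cutoffMillis = cutoffMillis;
      this.referencedFiles = Set.copyOf(referencedFiles);
    }
  }

  // volatile so a snapshot published by startRun is visible to every worker thread.
  // Until the first run starts, the cutoff protects every file.
  private volatile RunSnapshot snapshot = new RunSnapshot(Long.MIN_VALUE, Set.of());

  /** Assumed hook: called once per cleaner run, before any directory is scanned. */
  void startRun(long previousRunStartMillis, Set<String> referencedFiles) {
    snapshot = new RunSnapshot(previousRunStartMillis, referencedFiles);
  }

  /** Safe to call concurrently: reads one snapshot and mutates no shared state. */
  List<Candidate> deletable(List<Candidate> candidates) {
    RunSnapshot s = snapshot; // single volatile read, so both checks see the same run state
    return candidates.stream()
        .filter(c -> !s.referencedFiles.contains(c.name)) // check 1: still needed by a backup
        .filter(c -> c.modifiedMillis < s.cutoffMillis)   // check 2: older than the previous run
        .collect(Collectors.toList());
  }
}
```

Because `deletable` never updates a timestamp itself, calling it once per
subdirectory from a thread pool cannot move the cutoff forward in the middle
of a run.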