[ 
https://issues.apache.org/jira/browse/HBASE-29604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-29604:
---------------------------------
    Fix Version/s: 2.7.0
                   2.6.4

> BackupHFileCleaner uses flawed time based check
> -----------------------------------------------
>
>                 Key: HBASE-29604
>                 URL: https://issues.apache.org/jira/browse/HBASE-29604
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore
>    Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1
>            Reporter: Dieter De Paepe
>            Assignee: Dieter De Paepe
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.4
>
>
> BackupHFileCleaner is responsible for preventing the cleanup of bulk-loaded 
> HFiles that are still required by the backup & restore mechanism. It does 
> this using two checks:
>  * The backupsystemtable stores which HFile bulk loads are required for the 
> next incremental backup. Any HFile present here cannot be deleted.
>  * A time-based check prevents recently created HFiles from being 
> deleted. The intention is to avoid deleting HFiles newer than the previous 
> run of the cleaner. I believe this is to avoid race conditions between the 
> cleaner and entries in the backupsystemtable that get created while the 
> cleaner is running.
> In a single-threaded context, this works correctly.
> However, the cleaner is actually used concurrently in the 
> hfile_cleaner-dir-scan-pool to scan multiple subdirectories in 
> `CleanerChore#traverseAndDelete` (line 492). This means the time-based check 
> is not guaranteed to protect recently created HFiles. As a result, there is 
> a (small) chance of data loss (in a backup) if an HFile is wrongly deleted.
> I also strongly suggest adding a note to FileCleanerDelegate stating that 
> implementations should be thread-safe.
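To illustrate the kind of failure described above, here is a minimal sketch in Java. It is not the actual BackupHFileCleaner code: the class names `PerCallGuard` and `PerRunGuard`, the `startRun` method, and the assumption that the flawed check shifts its remembered timestamps on every invocation are all hypothetical, chosen only to show how a "newer than the previous run" check can break once one cleaner instance is invoked once per subdirectory. The interleaving of pool threads is simulated with sequential calls and made-up timestamps so the result is deterministic.

```java
// Hypothetical sketch: none of these classes exist in HBase.
class CleanerTimeCheckSketch {

  /** Assumed flawed pattern: every invocation shifts the remembered timestamps. */
  static class PerCallGuard {
    private long prevCall = 0L;
    private long secondPrevCall = 0L;

    // synchronized removes the data race but not the logic error:
    // each directory scan still advances the protection cutoff.
    synchronized boolean isDeletable(long fileModTime, long now) {
      secondPrevCall = prevCall;
      prevCall = now;
      return fileModTime < secondPrevCall;
    }
  }

  /** Alternative: the cutoff is fixed once per cleaner run, before any scan starts. */
  static class PerRunGuard {
    private long lastRunStart = 0L;     // touched only by the thread that starts runs
    private volatile long cutoff = 0L;  // read by all scanning threads

    void startRun(long now) {
      cutoff = lastRunStart;            // protect everything newer than the previous run
      lastRunStart = now;
    }

    boolean isDeletable(long fileModTime) {
      return fileModTime < cutoff;
    }
  }

  /** Two directory scans per run; a file created at t=150 sits between the runs. */
  static boolean perCallGuardDeletesRecentFile() {
    PerCallGuard guard = new PerCallGuard();
    guard.isDeletable(0L, 101L);                   // run 1, directory A
    guard.isDeletable(0L, 102L);                   // run 1, directory B
    boolean dirA = guard.isDeletable(150L, 201L);  // run 2, directory A
    boolean dirB = guard.isDeletable(150L, 202L);  // run 2, directory B
    return dirA || dirB;
  }

  /** Same scenario, but the cutoff is set once per run. */
  static boolean perRunGuardDeletesRecentFile() {
    PerRunGuard guard = new PerRunGuard();
    guard.startRun(100L);                          // run 1
    guard.startRun(200L);                          // run 2
    boolean dirA = guard.isDeletable(150L);        // run 2, directory A
    boolean dirB = guard.isDeletable(150L);        // run 2, directory B
    return dirA || dirB;
  }

  public static void main(String[] args) {
    System.out.println("per-call guard deletes recent file: "
        + perCallGuardDeletesRecentFile());
    System.out.println("per-run guard deletes recent file: "
        + perRunGuardDeletesRecentFile());
  }
}
```

In this sketch the per-call variant reports the file created at t=150 as deletable during the second directory scan of run 2, because the first scan of that run has already moved the cutoff forward. The per-run variant keeps the cutoff at the start of the previous run for every scan. Whether a real fix should take this shape depends on how the cleaner chore signals the start of a run to its delegates, which this sketch does not model.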



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
