Dieter De Paepe created HBASE-29800:
---------------------------------------

             Summary: WAL logs are unprotected during first full backup
                 Key: HBASE-29800
                 URL: https://issues.apache.org/jira/browse/HBASE-29800
             Project: HBase
          Issue Type: Bug
          Components: backup&restore
            Reporter: Dieter De Paepe


There is a small window during the creation of the first full backup in the 
first (or only) backup root in which WALs may be eligible for deletion. This 
can lead to data loss in the incremental backups that follow.

Pseudo code for this scenario is as follows (see FullTableBackupClient#execute):
{code:java}
// This is our first backup. Let's put some marker to system table so that we can hold the
// logs while we do the backup.
backupManager.writeBackupStartCode(0L);

// Roll the WALs
BackupUtils.logRoll(...);

snapshotAndCopyTables();

backupManager.writeBackupStartCode(newStartCode);

// Register the backupInfo as completed
completeBackup(...);{code}
The comment on the "0" backupStartCode suggests that it prevents WAL deletion 
until the backup is completed, but this is not the case.

The component responsible for preventing WAL deletion for backups is 
BackupLogCleaner. While the log cleaner does read & use the backup start codes, 
it only does so for backups that are already completed:
{code:java}
// true means only include completed backups
List<BackupInfo> backups = sysTable.getBackupHistory(true); {code}
So while the first backup is still in progress, the log cleaner will not even 
be aware of the backup root.
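
To make the effect of that filter concrete, here is a minimal, self-contained sketch. It does not use the real HBase classes: Backup, State and history(...) are hypothetical stand-ins for BackupInfo, its state, and the getBackupHistory(true) lookup, and the sketch assumes the first full backup is recorded as still running.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompletedOnlyHistorySketch {
    enum State { RUNNING, COMPLETE }

    /** Hypothetical stand-in for a backup record; not the real BackupInfo. */
    static final class Backup {
        final String id;
        final State state;
        final long startCode;

        Backup(String id, State state, long startCode) {
            this.id = id;
            this.state = state;
            this.startCode = startCode;
        }
    }

    /** Mimics a history lookup that can be restricted to completed backups. */
    static List<Backup> history(List<Backup> all, boolean onlyCompleted) {
        List<Backup> result = new ArrayList<>();
        for (Backup b : all) {
            if (!onlyCompleted || b.state == State.COMPLETE) {
                result.add(b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The first full backup has written its "0" marker but is not yet complete.
        List<Backup> systemTable =
            Arrays.asList(new Backup("first_full", State.RUNNING, 0L));

        // What a completed-only lookup returns: nothing.
        System.out.println(history(systemTable, true).size());  // prints 0
        // What an unfiltered lookup would return: the running backup.
        System.out.println(history(systemTable, false).size()); // prints 1
    }
}
```

With the completed-only lookup the result is empty, so in this model nothing tells the cleaner that a backup root exists at all.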

I believe this means there is a risk of data loss in the following incremental 
backup if, after a table has been snapshotted but before the backup is 
completed, a log roll occurs and the log cleaner runs.

The simplest fix is probably to have the log cleaner also use in-progress 
backupInfos when calculating the startCode.
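
A rough sketch of that idea, under simplifying assumptions: this is not the actual BackupLogCleaner logic. Backup, preserveFrom(...) and canDelete(...) are hypothetical names, and the model assumes a WAL may be deleted only if its timestamp is lower than the smallest start code among the backups the cleaner considers.

```java
import java.util.Arrays;
import java.util.List;

public class InProgressAwareCleanerSketch {
    enum State { RUNNING, COMPLETE }

    /** Hypothetical stand-in for a backup record; not the real BackupInfo. */
    static final class Backup {
        final State state;
        final long startCode;

        Backup(State state, long startCode) {
            this.state = state;
            this.startCode = startCode;
        }
    }

    /**
     * Smallest start code among the considered backups, or Long.MAX_VALUE
     * when no backup is considered (i.e. nothing needs to be preserved).
     */
    static long preserveFrom(List<Backup> backups, boolean includeInProgress) {
        long min = Long.MAX_VALUE;
        for (Backup b : backups) {
            if (b.state == State.COMPLETE || includeInProgress) {
                min = Math.min(min, b.startCode);
            }
        }
        return min;
    }

    /** Simplified rule: a WAL is deletable only if it predates the boundary. */
    static boolean canDelete(long walTimestamp, long preserveFrom) {
        return walTimestamp < preserveFrom;
    }

    public static void main(String[] args) {
        // First full backup: marker start code 0 written, backup still running.
        List<Backup> backups = Arrays.asList(new Backup(State.RUNNING, 0L));
        long walTimestamp = 1_000L;

        // Completed-only view: the WAL looks deletable.
        System.out.println(canDelete(walTimestamp, preserveFrom(backups, false))); // prints true
        // Including in-progress backups: the "0" marker holds the WAL.
        System.out.println(canDelete(walTimestamp, preserveFrom(backups, true)));  // prints false
    }
}
```

In this model the "0" start code only has its intended effect once the in-progress backup is part of the calculation.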



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
