Dieter De Paepe created HBASE-29003:
---------------------------------------
             Summary: Proper bulk load tracking
                 Key: HBASE-29003
                 URL: https://issues.apache.org/jira/browse/HBASE-29003
             Project: HBase
          Issue Type: Bug
          Components: backup&restore
    Affects Versions: 2.6.0, 3.0.0, 4.0.0-alpha-1
            Reporter: Dieter De Paepe

As part of the incremental backup mechanism, HBase tracks which files were bulk-loaded since the last backup. This data is stored in the backup:system_bulk table. Entries are added when a bulk load occurs, through the BackupObserver coprocessor. Entries are deleted when an incremental backup completes.

There are 2 flaws in this implementation:

1) Performing a full backup should clear the list. Imagine the following scenario:
* Create a full backup B1 of table T.
* Perform a bulk load L1.
* Take a full backup B2 of table T.
* Take an incremental backup of table T.
** The data stored for this backup will include L1, even though that data is already present due to B2. (This is an inefficiency, not a real error.)

2) Performing a table deletion should clear the list of bulk-loaded files. Imagine the following scenario:
* Create a full backup of table T.
* Perform a bulk load L1 into T.
* Disable, delete and recreate T.
* Create an incremental backup. (Taking a full backup instead is similar to the previous case.)
** The backup will contain L1, even though that data doesn't belong there.

Note that this *can also cause backup corruption* after a backup restore (which is how we encountered this issue), making the problem less niche than the scenarios above suggest. Backup restore effectively uses bulk loads as well, so users trying to recover from data corruption could run into the following scenario:
* Assume an environment with backup B1 (taken at time t1) and backup B2 (taken at time t2 > t1).
* Users notice data corruption, and restore backup B2 after clearing the table.
* Users notice the data corruption is already present in B2, and restore backup B1 after clearing the table.
* Users find the data corruption resolved, and resume the regular backup cycle from here on.
** Any incremental backup taken from this point will contain the (possibly corrupt) data from B2, because the restore operation itself uses bulk loads. The backups will remain affected until a full backup is taken after an incremental one (so this could span a period of weeks, assuming bi-weekly or monthly full backups).
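For reference, one possible place to hook the cleanup missing in flaw 2 would be a MasterObserver that clears the tracked bulk loads when a table is deleted; flaw 1 could be addressed similarly by clearing the entries for the backed-up tables once a full backup completes. The sketch below is illustrative only, not the actual fix: deleteBulkLoadedRowsForTable is an assumed helper, and the real BackupSystemTable API may differ.

{code:java}
import java.io.IOException;
import java.util.Optional;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.backup.impl.BackupSystemTable;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.coprocessor.MasterCoprocessor;
import org.apache.hadoop.hbase.coprocessor.MasterCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.MasterObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;

/**
 * Sketch of a master-side hook that drops stale bulk-load tracking entries
 * when a table is deleted, so a later incremental backup does not pick them up.
 */
public class BulkLoadCleanupObserver implements MasterCoprocessor, MasterObserver {

  @Override
  public Optional<MasterObserver> getMasterObserver() {
    return Optional.of(this);
  }

  @Override
  public void postDeleteTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
      TableName tableName) throws IOException {
    // Shared cluster connection; must not be closed here.
    Connection conn = ctx.getEnvironment().getConnection();
    try (BackupSystemTable backupTable = new BackupSystemTable(conn)) {
      // Hypothetical helper: remove all backup:system_bulk entries recorded
      // for the deleted table. The actual API for this may look different.
      backupTable.deleteBulkLoadedRowsForTable(tableName);
    }
  }
}
{code}

A coprocessor like this would be loaded via hbase.coprocessor.master.classes; alternatively, the cleanup could live directly in the delete-table and full-backup code paths.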
A minimal reproduction example:
{code:java}
echo "create 'table', 'cf'; put 'table', 'row1', 'cf:a', 'value1', 1400523142819" | bin/hbase shell -n
bin/hbase backup create full file:/tmp/backup -t table -i
echo "disable 'table'; drop 'table'" | bin/hbase shell -n
# Empty
echo "scan 'backup:system_bulk'" | bin/hbase shell -n
bin/hbase restore file:/tmp/backup backup_1732787972748 -t "table"
# 1 entry
echo "scan 'backup:system_bulk'" | bin/hbase shell -n
echo "disable 'table'; drop 'table'" | bin/hbase shell -n
# 1 entry
echo "scan 'backup:system_bulk'" | bin/hbase shell -n
echo "create 'table', 'cf'; put 'table', 'row1', 'cf:b', 'value2', 1400523142819" | bin/hbase shell -n
bin/hbase backup create full file:/tmp/backup -t table -i
echo "scan 'backup:system_bulk'" | bin/hbase shell -n
echo "put 'table', 'row1', 'cf:b', 'value3', 1400523142819" | bin/hbase shell -n
bin/hbase backup create incremental file:/tmp/backup -t table -i
# Empty
echo "scan 'backup:system_bulk'" | bin/hbase shell -n
echo "disable 'table'; drop 'table'" | bin/hbase shell -n
bin/hbase restore file:/tmp/backup backup_1732788098586 -t "table"
# Will contain "value1" (unexpected) and "value3" (expected)
echo "scan 'table'" | bin/hbase shell -n
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)