[ https://issues.apache.org/jira/browse/HBASE-29521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016307#comment-18016307 ]
Vinayak Hegde edited comment on HBASE-29521 at 8/26/25 1:49 PM:
----------------------------------------------------------------
As part of PITR, in addition to restoring the full and incremental backups, we
also need to replay WAL edits for the remaining duration.
Example:
* st = timestamp of the last backup restored as part of PITR
* et = user-specified point-in-time for PITR
We need to replay WAL edits from st → et, and also re-bulkload any bulkloaded
files whose bulkload events fall within this range.
Our current backup directory structure:
{code:java}
-- wal_backup_directory/
    -- WALs/
        -- 23-08-2025/
            ... wal files
        -- 24-08-2025/
        -- 25-08-2025/
    -- bulk-load-files/
        -- 23-08-2025/
            ... bulkload files
        -- 24-08-2025/
        -- 25-08-2025/
{code}
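Since the backup is partitioned by day, only the day directories that overlap st → et need to be scanned. A minimal sketch of that selection follows, using plain java.time and no HBase APIs. The class and method names (PitrDayDirs, dayDirs) are hypothetical, and it assumes the directory names are dd-MM-yyyy days in UTC, which the listing above suggests but does not confirm.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class PitrDayDirs {
    // Assumption: day directories are named dd-MM-yyyy, as in the listing above.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("dd-MM-yyyy");

    /**
     * Returns the day-directory names that overlap [stMillis, etMillis].
     * Assumption: day boundaries are computed in UTC.
     */
    public static List<String> dayDirs(long stMillis, long etMillis) {
        if (etMillis < stMillis) {
            throw new IllegalArgumentException("et must not precede st");
        }
        LocalDate first = Instant.ofEpochMilli(stMillis).atZone(ZoneOffset.UTC).toLocalDate();
        LocalDate last = Instant.ofEpochMilli(etMillis).atZone(ZoneOffset.UTC).toLocalDate();
        List<String> dirs = new ArrayList<>();
        // Walk day by day, including both the first and the last day.
        for (LocalDate d = first; !d.isAfter(last); d = d.plusDays(1)) {
            dirs.add(DAY.format(d));
        }
        return dirs;
    }

    public static void main(String[] args) {
        long st = Instant.parse("2025-08-23T18:30:00Z").toEpochMilli();
        long et = Instant.parse("2025-08-25T02:00:00Z").toEpochMilli();
        System.out.println(dayDirs(st, et)); // [23-08-2025, 24-08-2025, 25-08-2025]
    }
}
```

The first and last directories still contain entries outside st → et, so a per-entry timestamp filter is needed on top of this directory pruning.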
*Requirement*
* Read WAL files from st → et, extract all bulkload file paths from the WAL
edits, and feed them into the existing RestoreJob (MR job) to bulkload them
into tables. RestoreJob logic is already implemented.
* The main challenge is to efficiently extract bulkload entries from WALs.
*Options*
New MR job
* Create a job to re-read the WALs (we already replay them with WALPlayer) and
collect all bulkload entries between st → et.
* Write them to a file, then feed that list into the existing RestoreJob.
* Downside: WALs are read twice.
System table tracking
* Store all bulkload entries (with timestamps) in the backup system table as
they occur.
* At PITR restore, simply query entries between st → et.
* Downside: introduces additional dependency on the system table, which we
would like to avoid in the long run.
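Both options end with the same step: selecting the bulkload entries whose timestamps lie between st and et. The sketch below illustrates that range query with an in-memory NavigableMap standing in for whichever store holds the entries (the list file in the first option, the system table in the second). BulkloadIndex, record and between are hypothetical names, not existing HBase classes. Treating st as exclusive and et as inclusive is also an assumption, on the reasoning that the backup taken at st already covers anything loaded up to that point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class BulkloadIndex {
    // timestamp in ms -> bulkload file paths recorded at that timestamp
    private final NavigableMap<Long, List<String>> byTime = new TreeMap<>();

    /** Records one bulkload entry as it occurs. */
    public void record(long tsMillis, String path) {
        byTime.computeIfAbsent(tsMillis, k -> new ArrayList<>()).add(path);
    }

    /** Returns paths with st < ts <= et, in timestamp order (bounds are an assumption). */
    public List<String> between(long st, long et) {
        if (et < st) {
            throw new IllegalArgumentException("et must not precede st");
        }
        List<String> out = new ArrayList<>();
        for (List<String> paths : byTime.subMap(st, false, et, true).values()) {
            out.addAll(paths);
        }
        return out;
    }

    public static void main(String[] args) {
        BulkloadIndex idx = new BulkloadIndex();
        idx.record(100L, "hfile-a"); // illustrative paths only
        idx.record(200L, "hfile-b");
        idx.record(300L, "hfile-c");
        System.out.println(idx.between(100L, 200L)); // [hfile-b]
    }
}
```

The resulting list is what would be handed to the existing RestoreJob. The trade-off between the two options is where this index gets populated: by a second pass over the WALs at restore time, or incrementally at bulkload time.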
What do you guys think? How should we handle this?
[~andor] [~swu] [~ssa] [~ankit.jhil]
> Update Restore Command to Handle Bulkloaded Files
> -------------------------------------------------
>
> Key: HBASE-29521
> URL: https://issues.apache.org/jira/browse/HBASE-29521
> Project: HBase
> Issue Type: Sub-task
> Components: backup&restore
> Reporter: Vinayak Hegde
> Assignee: Vinayak Hegde
> Priority: Major
>
> Enhance the restore command to replay WAL edits first, then bulkload HFiles
> from the backup location. Ensure PITR restore correctness and handle cases
> where bulkloaded files are referenced in WALs. Validate the presence of all
> required files before restore execution.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)