[ https://issues.apache.org/jira/browse/HBASE-29521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016307#comment-18016307 ]
Vinayak Hegde edited comment on HBASE-29521 at 8/26/25 1:48 PM:
----------------------------------------------------------------
As part of PITR, in addition to restoring the full and incremental backups, we
also need to replay WAL edits for the remaining time window, i.e. from the last
restored backup up to the requested point in time.
Example:
* st = timestamp of the last backup restored as part of PITR
* et = user-specified point-in-time for PITR
We need to replay WAL edits from st → et, and also re-bulkload any bulkloaded
files that fall within this range.
Our current backup directory structure:
{code:java}
-- wal_backup_directory/
    -- WALs/
        -- 23-08-2025/
            ... wal files
        -- 24-08-2025/
        -- 25-08-2025/
    -- bulk-load-files/
        -- 23-08-2025/
            ... bulkload files
        -- 24-08-2025/
        -- 25-08-2025/
{code}
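Because both the WAL and bulkload directories are bucketed by day, a PITR restore only needs to open the day directories that overlap st → et. A minimal sketch, assuming the directory names use the dd-MM-yyyy format shown above and that days are bucketed by UTC date (the time zone is an assumption, and {{WalDayDirs}} is a hypothetical name):

{code:java}
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public final class WalDayDirs {

  // Assumption: day directories are named dd-MM-yyyy and bucketed by UTC date.
  private static final DateTimeFormatter DAY_FORMAT = DateTimeFormatter.ofPattern("dd-MM-yyyy");

  /** Returns the day directory names that may hold data for the window st → et. */
  public static List<String> dayDirsFor(long stMillis, long etMillis) {
    if (etMillis < stMillis) {
      throw new IllegalArgumentException("et must not be before st");
    }
    LocalDate first = Instant.ofEpochMilli(stMillis).atZone(ZoneOffset.UTC).toLocalDate();
    LocalDate last = Instant.ofEpochMilli(etMillis).atZone(ZoneOffset.UTC).toLocalDate();
    List<String> dirs = new ArrayList<>();
    for (LocalDate d = first; !d.isAfter(last); d = d.plusDays(1)) {
      dirs.add(d.format(DAY_FORMAT));
    }
    return dirs;
  }

  public static void main(String[] args) {
    long st = Instant.parse("2025-08-23T22:00:00Z").toEpochMilli();
    long et = Instant.parse("2025-08-25T01:30:00Z").toEpochMilli();
    System.out.println(dayDirsFor(st, et)); // [23-08-2025, 24-08-2025, 25-08-2025]
  }
}
{code}

This only narrows which files to open: the first and last day directories can still contain edits outside the window, so per-entry timestamps still have to be checked.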
*Requirement*
* Read WAL files from st → et, extract all bulkload file paths from the WAL
edits, and feed them into the existing RestoreJob (MapReduce job), which
bulkloads them into the tables. The RestoreJob logic is already implemented.
* The main challenge is extracting the bulkload entries from the WALs efficiently.
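At its core, the extraction is a filter over WAL entries by write time. The sketch below hides WAL decoding behind a hypothetical {{WalEntry}} type. In HBase, the bulkload details are carried in a marker cell of the {{WALEdit}}, which {{WALEdit.getBulkLoadDescriptor(Cell)}} can decode. The sketch also assumes a half-open window (st, et], treating edits at exactly st as already covered by the restored backup, and groups files by target table. The file names are illustrative only.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public final class BulkLoadExtractor {

  /** Hypothetical, already-decoded view of one WAL entry (not an HBase type). */
  public record WalEntry(long writeTimeMillis, String table, List<String> bulkLoadedFiles) {}

  /**
   * Collects the distinct bulkloaded file paths whose WAL write time lies in (st, et],
   * grouped by table. Sorted collections give a stable, de-duplicated listing.
   */
  public static Map<String, Set<String>> bulkLoadFilesByTable(
      List<WalEntry> entries, long st, long et) {
    Map<String, Set<String>> byTable = new TreeMap<>();
    for (WalEntry e : entries) {
      long ts = e.writeTimeMillis();
      boolean inWindow = ts > st && ts <= et; // assumed half-open window (st, et]
      if (inWindow && !e.bulkLoadedFiles().isEmpty()) {
        byTable.computeIfAbsent(e.table(), t -> new TreeSet<>()).addAll(e.bulkLoadedFiles());
      }
    }
    return byTable;
  }

  public static void main(String[] args) {
    List<WalEntry> wal = List.of(
        new WalEntry(100, "t1", List.of("t1/cf/hfile-a")),
        new WalEntry(150, "t1", List.of()), // ordinary edit, no bulkload
        new WalEntry(200, "t2", List.of("t2/cf/hfile-b")),
        new WalEntry(300, "t1", List.of("t1/cf/hfile-c")));
    System.out.println(bulkLoadFilesByTable(wal, 100, 250)); // {t2=[t2/cf/hfile-b]}
  }
}
{code}

The resulting per-table listing is what would be written out and handed to RestoreJob. The exact input format RestoreJob expects is not covered here.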
*Options*
# New MR job
** Create a job that re-reads the WALs (we already replay them with WALPlayer)
and collects all bulkload entries between st → et.
** Write them to a file, then feed that list into the existing RestoreJob.
** Downside: WALs are read twice.
# System table tracking
** Store all bulkload entries (with timestamps) in the backup system table as
they occur.
** At PITR restore, simply query the entries between st → et.
** Downside: introduces an additional dependency on the system table, which we
would like to avoid in the long run.
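Option 2 reduces the restore-time work to a range query. The following is a minimal in-memory sketch of that lookup. A {{NavigableMap}} keyed by timestamp stands in for the backup system table, and the class {{BulkLoadIndex}} and its methods are hypothetical:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for bulkload rows in the backup system table. */
public final class BulkLoadIndex {

  // Keyed by bulkload timestamp; several loads may share the same millisecond.
  private final NavigableMap<Long, List<String>> byTime = new TreeMap<>();

  /** Option 2 records each bulkload as it happens. */
  public void record(long timestampMillis, String file) {
    byTime.computeIfAbsent(timestampMillis, t -> new ArrayList<>()).add(file);
  }

  /** At restore time, returns the files recorded in (st, et] without reading any WAL. */
  public List<String> filesBetween(long st, long et) {
    List<String> out = new ArrayList<>();
    for (List<String> files : byTime.subMap(st, false, et, true).values()) {
      out.addAll(files);
    }
    return out;
  }

  public static void main(String[] args) {
    BulkLoadIndex index = new BulkLoadIndex();
    index.record(100, "hfile-a");
    index.record(200, "hfile-b");
    index.record(200, "hfile-c");
    index.record(300, "hfile-d");
    System.out.println(index.filesBetween(100, 250)); // [hfile-b, hfile-c]
  }
}
{code}

In a real HBase table, the same behavior would come from a row key that begins with the timestamp encoded big-endian. For non-negative timestamps, byte order then matches numeric order, so a scan between start and stop rows returns exactly the window. This avoids the second WAL pass that option 1 needs, at the cost of the system table dependency noted above.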
What do you all think? How should we handle this?
[~andor] [~swu] [~ssa] [~ankit.jhil]
> Update Restore Command to Handle Bulkloaded Files
> -------------------------------------------------
>
> Key: HBASE-29521
> URL: https://issues.apache.org/jira/browse/HBASE-29521
> Project: HBase
> Issue Type: Sub-task
> Components: backup&restore
> Reporter: Vinayak Hegde
> Assignee: Vinayak Hegde
> Priority: Major
>
> Enhance the restore command to replay WAL edits first, then bulkload HFiles
> from the backup location. Ensure PITR restore correctness and handle cases
> where bulkloaded files are referenced in WALs. Validate the presence of all
> required files before restore execution.
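The issue description also asks to validate that all required files are present before the restore starts. Below is a minimal sketch of such a pre-flight check. It uses local {{java.nio.file}} paths as a stand-in. The real backup files live on the backup filesystem, where Hadoop's {{FileSystem#exists(Path)}} would play the same role. {{RestorePreflight}} is a hypothetical name:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public final class RestorePreflight {

  /** Returns every required file that is missing; an empty result means the restore may proceed. */
  public static List<Path> missingFiles(Collection<Path> required) {
    List<Path> missing = new ArrayList<>();
    for (Path p : required) {
      if (!Files.isRegularFile(p)) {
        missing.add(p);
      }
    }
    return missing;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("bulk-load-files");
    Path present = Files.createFile(dir.resolve("hfile-a"));
    Path absent = dir.resolve("hfile-b");
    List<Path> missing = missingFiles(List.of(present, absent));
    if (!missing.isEmpty()) {
      System.out.println("Aborting restore, missing files: " + missing);
    }
  }
}
{code}

The check collects every missing path instead of stopping at the first one. This lets the restore fail with a complete list before any WAL replay or bulkload has modified the target tables.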
--
This message was sent by Atlassian Jira
(v8.20.10#820010)