[
https://issues.apache.org/jira/browse/HBASE-29521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18017521#comment-18017521
]
Vinayak Hegde commented on HBASE-29521:
---------------------------------------
Thanks [~taklwu]
{quote}why do we think writing into backup table is not good in long run ?
{quote}
In that case, the actual data lives in the backup location while the
metadata lives in the backup table (in the source cluster). That means we’ll
need to keep the backup data and the metadata consistent, handle cleanup
when files are deleted from the backup location, etc.
{quote}when should we store all bulkload entries (with timestamps) in the
backup system table ? how does it link to the WAL file/WAL edits?
{quote}
That’s the tricky part. Currently, we store these bulkload entries in the
system table for incremental backup - specifically when we’re about to commit
the bulkloaded file
([https://github.com/apache/hbase/blob/master/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/BackupObserver.java]).
We could probably follow the same approach here. For the timestamp, we’ll need
to decide which one to use. In the WAL scanning approach, we used WAL edit
timestamps - if the edit falls between _st_ and _et_, we consider the
bulkload entry valid.
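A rough sketch of that window check, just to make the condition concrete. Everything here is hypothetical: {{BulkLoadEntry}}, {{entriesInWindow}} and the field names are made up for illustration and are not the actual backup system table schema or any existing HBase API. It also assumes both _st_ and _et_ are inclusive, which we would still need to agree on.

```java
import java.util.ArrayList;
import java.util.List;

class BulkLoadWindowFilter {

    /** Hypothetical bulkload record: an HFile path plus the timestamp we decide to store. */
    static final class BulkLoadEntry {
        final String hfilePath;
        final long timestamp;

        BulkLoadEntry(String hfilePath, long timestamp) {
            this.hfilePath = hfilePath;
            this.timestamp = timestamp;
        }
    }

    /**
     * Returns the entries whose timestamp falls within [st, et].
     * Inclusive bounds are an assumption, not something settled in this thread.
     */
    static List<BulkLoadEntry> entriesInWindow(List<BulkLoadEntry> entries, long st, long et) {
        if (st > et) {
            throw new IllegalArgumentException("start time must not be after end time");
        }
        List<BulkLoadEntry> valid = new ArrayList<>();
        for (BulkLoadEntry entry : entries) {
            if (entry.timestamp >= st && entry.timestamp <= et) {
                valid.add(entry);
            }
        }
        return valid;
    }
}
```

The same (_st_, _et_) pair would be passed to the WAL replay step, so both paths select data from the same time range without needing a direct link between them.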
As for the WAL/WAL edits link - there won’t really be a direct link, right?
Since WALs are replayed separately and bulkload entries are handled separately,
the only condition is that both use the same start (_st_) and end (_et_)
times.
Regarding the number of entries, I’m not certain, but since we already store
them for incremental backup, it should be manageable.
Overall, I think the MR job would still be a simpler approach. What do you
think?
> Update Restore Command to Handle Bulkloaded Files
> -------------------------------------------------
>
> Key: HBASE-29521
> URL: https://issues.apache.org/jira/browse/HBASE-29521
> Project: HBase
> Issue Type: Sub-task
> Components: backup&restore
> Reporter: Vinayak Hegde
> Assignee: Vinayak Hegde
> Priority: Major
>
> Enhance the restore command to replay WAL edits first, then bulkload HFiles
> from the backup location. Ensure PITR restore correctness and handle cases
> where bulkloaded files are referenced in WALs. Validate the presence of all
> required files before restore execution.