[jira] [Commented] (HBASE-29521) Update Restore Command to Handle Bulkloaded Files

Vinayak Hegde (Jira) Tue, 02 Sep 2025 20:48:13 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-29521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18017521#comment-18017521
 ]


Vinayak Hegde commented on HBASE-29521:
---------------------------------------

Thanks [~taklwu] 
{quote}why do we think writing into backup table is not good in long run ?
{quote}
In this case, we’ll have the actual data in the backup location and the 
metadata in the backup table (in the source cluster). That means we’ll need to 
ensure consistency between the backup data and the metadata, handle cleanup 
when files are deleted in the backup location, etc.
{quote}when should we store all bulkload entries (with timestamps) in the 
backup system table ? how does it link to the WAL file/WAL edits?
{quote}
That’s the tricky part. Currently, we store these bulkload entries in the 
system table for incremental backup - specifically when we’re about to commit 
the bulkloaded file 
([https://github.com/apache/hbase/blob/master/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/BackupObserver.java]).
 We could probably follow the same approach here. For the timestamp, we’ll need to 
decide which one to use. In the WAL scanning approach, we used WAL edit 
timestamps - if the edit falls between _st_ and _et_, we consider the 
bulkload entry valid.
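
To make the timestamp rule concrete, here is a minimal sketch of the window check. It is not the actual HBase code: {{BulkLoadEntry}}, {{selectInWindow}} and the field names are hypothetical, and treating both _st_ and _et_ as inclusive is an assumption we would still need to agree on.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkLoadWindowFilter {

  /** Hypothetical stand-in for one bulkload row kept in the backup system table. */
  public static final class BulkLoadEntry {
    public final String hfilePath;
    public final long timestamp; // whichever timestamp we decide to record

    public BulkLoadEntry(String hfilePath, long timestamp) {
      this.hfilePath = hfilePath;
      this.timestamp = timestamp;
    }
  }

  /**
   * Returns the entries whose timestamp lies in [st, et].
   * Assumption: both bounds are inclusive.
   */
  public static List<BulkLoadEntry> selectInWindow(List<BulkLoadEntry> entries,
      long st, long et) {
    if (st > et) {
      throw new IllegalArgumentException("st must not be greater than et");
    }
    List<BulkLoadEntry> selected = new ArrayList<>();
    for (BulkLoadEntry entry : entries) {
      if (entry.timestamp >= st && entry.timestamp <= et) {
        selected.add(entry);
      }
    }
    return selected;
  }
}
```

The same _st_/_et_ pair would be passed to the WAL replay, so the two paths stay aligned without needing a direct link between a WAL edit and a bulkload entry.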

As for the WAL/WAL edits link - there won’t really be a direct link, right? 
Since WALs are replayed separately and bulkload entries are handled separately, 
the only condition is that both follow the same start (_st_) and end 
(_et_) times.

 

Regarding the number of entries, I’m not certain, but since we already store 
them for incremental backup, it should be manageable.

Overall, I think the MR job would still be a simpler approach. What do you 
think?

 

> Update Restore Command to Handle Bulkloaded Files
> -------------------------------------------------
>
>                 Key: HBASE-29521
>                 URL: https://issues.apache.org/jira/browse/HBASE-29521
>             Project: HBase
>          Issue Type: Sub-task
>          Components: backup&restore
>            Reporter: Vinayak Hegde
>            Assignee: Vinayak Hegde
>            Priority: Major
>
> Enhance the restore command to replay WAL edits first, then bulkload HFiles 
> from the backup location. Ensure PITR restore correctness and handle cases 
> where bulkloaded files are referenced in WALs. Validate the presence of all 
> required files before restore execution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
