[ 
https://issues.apache.org/jira/browse/HBASE-29521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016590#comment-18016590
 ] 

Tak-Lon (Stephen) Wu commented on HBASE-29521:
----------------------------------------------

IMO the system table approach, or writing/appending a metadata file that lists 
the bulkload file paths, should be better than rescanning. For example, running 
a query instead of scanning again would speed things up a lot, especially when 
you back up to cloud storage.
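
To illustrate the query-versus-rescan point, here is a minimal sketch. It is
not HBase code and does not reflect the actual backup system table schema:
`BulkloadIndex`, its methods, the field layout, and the file paths are all
hypothetical. It assumes each bulkload is recorded once, with an integer
timestamp, at load time, so that a restore can ask for the paths in a time
window instead of scanning the backup location again.

```python
from bisect import bisect_left, insort


class BulkloadIndex:
    """Toy stand-in for a system table or metadata file of bulkload paths."""

    def __init__(self):
        # Rows are (timestamp_ms, table, hfile_path), kept sorted by timestamp.
        self._rows = []

    def record(self, timestamp_ms, table, hfile_path):
        # Called once per bulkload; insort keeps the list ordered.
        insort(self._rows, (timestamp_ms, table, hfile_path))

    def paths_between(self, table, start_ms, end_ms):
        # Binary search for the window [start_ms, end_ms], so only rows
        # inside the window are inspected rather than every recorded entry.
        lo = bisect_left(self._rows, (start_ms,))
        hi = bisect_left(self._rows, (end_ms + 1,))
        return [path for _, t, path in self._rows[lo:hi] if t == table]


index = BulkloadIndex()
index.record(1_000, "t1", "/backup/bulkload/a.hfile")
index.record(2_000, "t1", "/backup/bulkload/b.hfile")
index.record(2_500, "t2", "/backup/bulkload/c.hfile")
index.record(3_000, "t1", "/backup/bulkload/d.hfile")

# Files a restore of table "t1" would need for the window 1500..3000.
needed = index.paths_between("t1", 1_500, 3_000)
```

The lookup cost depends on the size of the window rather than on how many
files sit in the backup location, which is where the saving on cloud storage
would come from under these assumptions.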


I have a few questions:

1. Why do we think writing into the backup table is not good in the long run?
2. When should we store all bulkload entries (with timestamps) in the backup 
system table? How does it link to the WAL file/WAL edits?
3. Do you expect a large number of entries?
4. What does this query look like? Can you share a relational example?


Personally, I feel that writing an MR job is also simple, but having this backup 
table/metadata file may be a bigger task.

> Update Restore Command to Handle Bulkloaded Files
> -------------------------------------------------
>
>                 Key: HBASE-29521
>                 URL: https://issues.apache.org/jira/browse/HBASE-29521
>             Project: HBase
>          Issue Type: Sub-task
>          Components: backup&restore
>            Reporter: Vinayak Hegde
>            Assignee: Vinayak Hegde
>            Priority: Major
>
> Enhance the restore command to replay WAL edits first, then bulkload HFiles 
> from the backup location. Ensure PITR restore correctness and handle cases 
> where bulkloaded files are referenced in WALs. Validate the presence of all 
> required files before restore execution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
