[ https://issues.apache.org/jira/browse/HBASE-29521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016590#comment-18016590 ]
Tak-Lon (Stephen) Wu commented on HBASE-29521:
----------------------------------------------
IMO, the system table approach, or writing/appending a metadata file that
records the bulkload file paths, should be better than rescanning. For example,
running a query instead of scanning again would speed things up a lot,
especially when you back up to cloud storage.
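To make the query-versus-rescan point concrete, here is a minimal Java sketch, assuming each bulkload is recorded as (table, HFile path, timestamp) at load time. The `BulkloadIndexSketch` class, the `BulkloadEntry` record, and the `bulkloads` table in the SQL comment are hypothetical illustrations, not the HBase backup table schema or API. With entries kept sorted by timestamp, finding the files for a restore window becomes a range lookup rather than a rescan of the backup location.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for bulkload entries recorded in a backup
// system table (or metadata file). This is NOT the HBase backup table schema.
public class BulkloadIndexSketch {
    // Hypothetical row: one bulkloaded HFile and the time it was loaded.
    record BulkloadEntry(String table, String hfilePath, long timestamp) {}

    // table name -> (bulkload timestamp -> entries loaded at that time)
    private final Map<String, NavigableMap<Long, List<BulkloadEntry>>> index = new HashMap<>();

    // Called once per bulkload, at load time, instead of rediscovering files later.
    void add(BulkloadEntry e) {
        index.computeIfAbsent(e.table(), t -> new TreeMap<>())
             .computeIfAbsent(e.timestamp(), ts -> new ArrayList<>())
             .add(e);
    }

    // Range lookup: HFile paths for `table` bulkloaded in (fromTs, toTs].
    // Relational equivalent (hypothetical table):
    //   SELECT hfile_path FROM bulkloads WHERE tbl = ? AND ts > ? AND ts <= ?
    List<String> filesBetween(String table, long fromTs, long toTs) {
        List<String> paths = new ArrayList<>();
        NavigableMap<Long, List<BulkloadEntry>> byTime = index.get(table);
        if (byTime == null || fromTs >= toTs) {
            return paths; // unknown table or empty window
        }
        for (List<BulkloadEntry> entries : byTime.subMap(fromTs, false, toTs, true).values()) {
            for (BulkloadEntry e : entries) {
                paths.add(e.hfilePath());
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        BulkloadIndexSketch idx = new BulkloadIndexSketch();
        // Hypothetical paths, for illustration only.
        idx.add(new BulkloadEntry("t1", "/backup/t1/cf/hfile-a", 100L));
        idx.add(new BulkloadEntry("t1", "/backup/t1/cf/hfile-b", 200L));
        idx.add(new BulkloadEntry("t2", "/backup/t2/cf/hfile-c", 150L));
        // Only the t1 file loaded after ts=100 and at or before ts=250 is returned.
        System.out.println(idx.filesBetween("t1", 100L, 250L)); // prints [/backup/t1/cf/hfile-b]
    }
}
```

If the entries lived in an actual system table, the same lookup would presumably be a scan over a row range keyed by table and timestamp; the `TreeMap` here only stands in for that sort order.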
I have a few questions:
1. Why do we think writing into the backup table is not good in the long run?
2. When should we store all bulkload entries (with timestamps) in the backup
system table? How do they link to the WAL files/WAL edits?
3. Do you expect a large number of entries?
4. What would this query look like? Can you share a relational example?
Personally, I feel that writing an MR job is also simple, but adding this backup
table/metadata file may be a bigger task.
> Update Restore Command to Handle Bulkloaded Files
> -------------------------------------------------
>
> Key: HBASE-29521
> URL: https://issues.apache.org/jira/browse/HBASE-29521
> Project: HBase
> Issue Type: Sub-task
> Components: backup&restore
> Reporter: Vinayak Hegde
> Assignee: Vinayak Hegde
> Priority: Major
>
> Enhance the restore command to replay WAL edits first, then bulkload HFiles
> from the backup location. Ensure PITR restore correctness and handle cases
> where bulkloaded files are referenced in WALs. Validate the presence of all
> required files before restore execution.
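A minimal Java sketch of the ordering described above, assuming the restore has already resolved the WAL files and bulkloaded HFiles it needs: it first validates that every required file exists, and only then replays WAL edits, followed by bulkloading the HFiles. `RestoreOrderSketch`, `RestoreSteps`, and its methods are hypothetical placeholders, not HBase's restore API, and local `java.nio.file` paths stand in for the backup location (a real implementation would more likely go through Hadoop's `FileSystem` API).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RestoreOrderSketch {
    // Hypothetical hooks for the two restore phases; real logic is not shown.
    interface RestoreSteps {
        void replayWalEdits(List<Path> wals) throws Exception;
        void bulkloadHFiles(List<Path> hfiles) throws Exception;
    }

    // Returns every required file that is missing; empty means it is safe to start.
    static List<Path> missingFiles(List<Path> required) {
        List<Path> missing = new ArrayList<>();
        for (Path p : required) {
            if (!Files.exists(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    static void restore(List<Path> wals, List<Path> hfiles, RestoreSteps steps) throws Exception {
        // Validate everything up front so a missing file cannot leave a half-restored table.
        List<Path> required = new ArrayList<>(wals);
        required.addAll(hfiles);
        List<Path> missing = missingFiles(required);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing files, aborting restore: " + missing);
        }
        steps.replayWalEdits(wals);   // 1. replay WAL edits first
        steps.bulkloadHFiles(hfiles); // 2. then bulkload the HFiles
    }

    public static void main(String[] args) throws Exception {
        Path wal = Files.createTempFile("wal-", ".log");
        Path hfile = Files.createTempFile("hfile-", ".hfile");
        List<String> order = new ArrayList<>();
        RestoreSteps steps = new RestoreSteps() {
            public void replayWalEdits(List<Path> wals) { order.add("wal"); }
            public void bulkloadHFiles(List<Path> hfiles) { order.add("bulkload"); }
        };
        restore(List.of(wal), List.of(hfile), steps);
        System.out.println(order); // prints [wal, bulkload]

        Files.delete(hfile); // simulate a bulkloaded file missing from the backup
        try {
            restore(List.of(wal), List.of(hfile), steps);
        } catch (IllegalStateException e) {
            System.out.println("aborted: " + e.getMessage());
        }
        Files.delete(wal);
    }
}
```

Checking all files before either phase runs is what lets the restore fail fast instead of discovering a missing HFile after the WAL edits have already been applied.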
--
This message was sent by Atlassian Jira
(v8.20.10#820010)