2005hithlj commented on PR #6347: URL: https://github.com/apache/hbase/pull/6347#issuecomment-2415757968
@NihalJain Thanks for your review.

The bulkload process consists of two steps:

1. Generate HFiles using MR/Spark and write them to an HDFS cluster.
2. Execute `hbase completebulkload [OPTIONS] </PATH/TO/HFILEOUTPUTFORMAT-OUTPUT> <TABLENAME>` or invoke the BulkLoadHFilesTool API.

[HBASE-15172](https://issues.apache.org/jira/browse/HBASE-15172) implements tiered storage for bulkload, but it only applies when the HFiles generated by MR/Spark are written directly to the HDFS cluster used by HBase (where tiered storage is configured).

However, in most bulkload scenarios, the HFiles generated by MR/Spark are first written to an offline HDFS cluster (a non-HBase HDFS cluster where tiered storage is not configured). The `hbase completebulkload` command is then used to copy these HFiles from the offline HDFS cluster to the HDFS cluster used by HBase and rename them into the appropriate table/partition/columnfamily directory.

This scenario is not supported by [HBASE-15172](https://issues.apache.org/jira/browse/HBASE-15172); this issue will support tiered storage for this more general bulkload scenario.
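
For reference, step 2 of the process described above can be sketched as a small wrapper that assembles the `hbase completebulkload` command line. This is only an illustration: the helper name `build_completebulkload_cmd`, the example path, and the table name are hypothetical, and any options are passed through unchanged rather than validated against the tool.

```python
import shlex
from typing import List, Sequence


def build_completebulkload_cmd(
    hfile_dir: str, table: str, options: Sequence[str] = ()
) -> List[str]:
    """Assemble the argv for step 2 of a bulkload (hypothetical helper).

    hfile_dir: directory holding the HFiles produced in step 1; it may
               live on an offline (non-HBase) HDFS cluster.
    table:     name of the target HBase table.
    options:   extra arguments placed in the [OPTIONS] position; they
               are passed through as-is, not checked.
    """
    if not hfile_dir or not table:
        raise ValueError("both hfile_dir and table are required")
    # Mirrors: hbase completebulkload [OPTIONS] <HFILE-DIR> <TABLENAME>
    return ["hbase", "completebulkload", *options, hfile_dir, table]


if __name__ == "__main__":
    # Example values are made up for illustration.
    cmd = build_completebulkload_cmd("hdfs://offline-cluster/tmp/hfiles", "my_table")
    print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a host where the `hbase` launcher is on the `PATH`.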