Re: [PR] HBASE-28904 Supports enabling storage policy in the data copying scenario of bulkload [hbase]

via GitHub Tue, 15 Oct 2024 22:21:37 -0700


2005hithlj commented on PR #6347:
URL: https://github.com/apache/hbase/pull/6347#issuecomment-2415757968


   @NihalJain Thanks for your review.
   The Bulkload process consists of two steps:
   
   1. Generate HFiles using MR/Spark and write them to an HDFS cluster.
   2. Execute 'hbase completebulkload [OPTIONS] </PATH/TO/HFILEOUTPUTFORMAT-OUTPUT> <TABLENAME>' 
or invoke the BulkLoadHFilesTool API.
   
   [HBASE-15172](https://issues.apache.org/jira/browse/HBASE-15172) implements 
tiered storage for bulkload, but it applies only when the HFiles generated by 
MR/Spark are written directly to the HDFS cluster used by HBase (where tiered 
storage is configured). In most bulkload scenarios, however, the HFiles are 
first written to an offline HDFS cluster (a non-HBase HDFS cluster where tiered 
storage is not configured). The 'hbase completebulkload' command then copies 
these HFiles from the offline HDFS cluster to the HDFS cluster used by HBase 
and renames them into the appropriate table/partition/columnfamily directory. 
[HBASE-15172](https://issues.apache.org/jira/browse/HBASE-15172) does not 
support this scenario; this issue adds tiered storage support for this more 
general bulkload scenario.
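
   The difference between the two scenarios can be sketched roughly as follows. 
This is only an illustration, not the code in this PR: the class 
`BulkLoadPathCheck`, the helpers `sameFileSystem` and `plan`, and the cluster 
URIs are all hypothetical, and the sketch assumes two locations are on the same 
cluster when their URI scheme and authority match.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

public class BulkLoadPathCheck {
    // Hypothetical helper: treats two URIs as the same filesystem when
    // scheme and authority (host:port) match, ignoring case.
    static boolean sameFileSystem(URI src, URI dst) {
        return Objects.equals(lower(src.getScheme()), lower(dst.getScheme()))
            && Objects.equals(lower(src.getAuthority()), lower(dst.getAuthority()));
    }

    private static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    // Hypothetical helper: names which bulkload path applies.
    // RENAME           -> HFiles already live on the HBase HDFS cluster.
    // COPY_THEN_RENAME -> HFiles live on another cluster and must be copied
    //                     first; this is the step where a storage policy
    //                     would have to be applied on the HBase side.
    static String plan(URI hfileDir, URI hbaseRootDir) {
        return sameFileSystem(hfileDir, hbaseRootDir) ? "RENAME" : "COPY_THEN_RENAME";
    }

    public static void main(String[] args) {
        // Assumed example URIs; real cluster addresses will differ.
        URI hbaseRoot = URI.create("hdfs://hbase-cluster:8020/hbase");
        System.out.println(plan(URI.create("hdfs://hbase-cluster:8020/tmp/hfiles"), hbaseRoot));
        System.out.println(plan(URI.create("hdfs://offline-cluster:8020/tmp/hfiles"), hbaseRoot));
    }
}
```

   With the assumed URIs, the first call falls into the direct-write case and 
the second into the copy case that this issue targets.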


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
