[DISCUSS] Integration of Baidu Cloud BOS (Baidu Object Storage) in Hadoop

Yang,Dongdong(ACG CCN) Wed, 25 Mar 2026 23:57:59 -0700

Hi all,
I would like to propose integrating Baidu Cloud BOS (Baidu Object Storage) as a 
native Hadoop-compatible filesystem module, similar to the existing hadoop-aws 
(S3A), hadoop-aliyun, and hadoop-cos connectors.


JIRA: https://issues.apache.org/jira/browse/HDFS-11161
PR: https://github.com/apache/hadoop/pull/8347
CI Status: +1 overall (all checks passed)

1. Background
Baidu Cloud is one of the major cloud service providers in China. BOS (Baidu 
Object Storage) is Baidu's core object storage service, widely used for big 
data analytics, machine learning, and data lake workloads. A native Hadoop 
connector lets Hadoop ecosystem tools (MapReduce, Spark, Hive, Flink, etc.) 
access BOS directly via the bos:// scheme.
This connector has been running in production at Baidu for 8 years, serving 
both BOS users and Baidu MapReduce (BMR).
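
To make the bos:// addressing concrete, here is a minimal sketch of how such a 
URI could be split into a bucket and an object key. It assumes the URI 
authority names the bucket and the path is the key, as with s3a:// paths; the 
class BosUriSketch and its methods are hypothetical and are not taken from the 
PR.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of the proposed module.
final class BosUriSketch {
    // Assumption: the URI authority names the bucket, as in s3a://bucket/key.
    static String bucketOf(URI uri) {
        if (!"bos".equals(uri.getScheme())) {
            throw new IllegalArgumentException("not a bos:// URI: " + uri);
        }
        String bucket = uri.getHost();
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("missing bucket in: " + uri);
        }
        return bucket;
    }

    // Assumption: the object key is the URI path without its leading slash.
    static String keyOf(URI uri) {
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            return "";
        }
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        URI uri = URI.create("bos://example-bucket/warehouse/events/part-0.parquet");
        System.out.println(bucketOf(uri));
        System.out.println(keyOf(uri));
    }
}
```

The bucket name and object path in main are placeholders.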

2. Implementation
The module is placed under hadoop-cloud-storage-project/hadoop-bos, consistent 
with existing cloud connectors. Key features:

  * Full FileSystem implementation with bos:// URI scheme
  * Pluggable credentials provider
  * Contract tests covering all standard filesystem operations
  * All dependencies are either excluded or shaded under 
    org.apache.hadoop.fs.bos.shaded.* to avoid classpath conflicts
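
As a rough illustration of what "pluggable credentials provider" can mean in 
practice, the sketch below selects a provider class by name from configuration 
and instantiates it reflectively. Every identifier and property key in it 
(CredentialsSketch, CredentialsProvider, ConfProvider, example.*) is 
hypothetical; the actual interface and keys in the PR may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Every name here is hypothetical; none is taken from the hadoop-bos module.
final class CredentialsSketch {
    // Minimal access-key / secret-key pair.
    static final class Credentials {
        final String accessKey;
        final String secretKey;

        Credentials(String accessKey, String secretKey) {
            this.accessKey = accessKey;
            this.secretKey = secretKey;
        }
    }

    // The plug-in point: implementations decide where credentials come from.
    interface CredentialsProvider {
        Optional<Credentials> resolve(Map<String, String> conf);
    }

    // Reads both keys from configuration (placeholder property names).
    static final class ConfProvider implements CredentialsProvider {
        @Override
        public Optional<Credentials> resolve(Map<String, String> conf) {
            String ak = conf.get("example.access.key");
            String sk = conf.get("example.secret.key");
            return (ak == null || sk == null)
                ? Optional.empty()
                : Optional.of(new Credentials(ak, sk));
        }
    }

    // Instantiates the provider class named in configuration via reflection,
    // falling back to ConfProvider when no class is configured.
    static CredentialsProvider load(Map<String, String> conf) {
        String className = conf.getOrDefault(
            "example.credentials.provider", ConfProvider.class.getName());
        try {
            return Class.forName(className)
                .asSubclass(CredentialsProvider.class)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("example.access.key", "AK");
        conf.put("example.secret.key", "SK");
        System.out.println(load(conf).resolve(conf).isPresent());
    }
}
```

Selecting the provider by class name lets a deployment swap in another 
credential source without changing the filesystem code itself.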


3. Long-term Maintenance
The following contributors are committed to maintaining this module:

  * yangdong2398 (BOS R&D)
  * LuciferYang (PMC of Spark)
  * jackylee-ch (PMC of Gluten)
  * houzhizhen (committer of hugegraph)
  * summaryzb (committer of Uniffle)

We commit to:

  * Responding to issues and PRs within 1 week
  * Keeping dependencies up to date
  * Adapting to Hadoop API changes in future releases


4. Why Integrate into Hadoop
Following the same rationale as hadoop-aws (S3A), hadoop-aliyun, and hadoop-cos:

  * Users benefit from a single, consistent distribution, with no need to 
    manage separate connector JARs and version compatibility
  * A connector maintained within the Hadoop community is more reliable and 
    trustworthy
  * Shared CI infrastructure ensures ongoing compatibility with Hadoop trunk


Looking forward to your feedback.
Best regards.
