Re: [I] Consider Using object_store as IO Abstraction [iceberg-rust]

via GitHub Wed, 07 May 2025 01:25:02 -0700


linhr commented on issue #172:
URL: https://github.com/apache/iceberg-rust/issues/172#issuecomment-2857585201


   It seems the discussion has been quiet for a while, but I love the ideas here!
   
   I'm a maintainer of [Sail](https://github.com/lakehq/sail), a Rust library 
that offers a drop-in replacement for Apache Spark SQL and DataFrame APIs. One 
of the top features requested by our users is Iceberg support, and we would 
like to integrate with the official Iceberg Rust library.
   
   We use DataFusion and object_store at the core of Sail. This turns out to be 
a blocker for the Iceberg integration, because the Iceberg Rust library does 
its I/O through the `FileIO` struct, which wraps OpenDAL for data storage. We'd 
like to continue using the object_store abstraction for a few reasons:
   
   1. We'd like Iceberg to work out of the box with the few custom 
`ObjectStore` implementations we have, and with any custom storage features we 
implement in the future.
   2. We'd like to centralize storage configuration, rather than configuring 
e.g. S3 twice, once in object_store and once in OpenDAL.
   3. We'd like to avoid the increase in binary size that comes from depending 
on both object_store and OpenDAL.
   4. OpenDAL does not seem to support automatic credential rotation for S3. 
(Correct me if I'm wrong here.)
   5. I'm under the impression that OpenDAL does not use the official AWS SDK, 
so I'm not sure whether some less commonly used AWS credential providers (e.g. 
web identity tokens in containerized environments) work out of the box.
   
   I saw that many people have shared good ideas about why object_store could 
be a valuable addition to the Iceberg project. I hope my points above can serve 
as concrete data points for the discussion. OpenDAL is a feature-rich project, 
and I assume the goal is not to replace OpenDAL with object_store, but to 
figure out a way for both to co-exist. I feel this would drive the adoption of 
the Iceberg Rust library, attract more contributions to it, and make it 
battle-tested by more downstream projects.
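   
   To make the co-existence idea a bit more concrete, here is a rough sketch of 
what I have in mind. It is not the actual `FileIO` API: the `Storage` trait, 
`MemoryStorage`, and `read_metadata` are hypothetical names, and the in-memory 
map only stands in for a real backend such as an OpenDAL operator or an 
`ObjectStore`. The point is just that library code depends on a small trait 
while the application picks the backend.

```rust
use std::collections::HashMap;
use std::io;

/// Hypothetical storage trait (an assumption for illustration,
/// not the actual iceberg-rust `FileIO` API).
trait Storage {
    fn read(&self, path: &str) -> io::Result<Vec<u8>>;
    fn write(&mut self, path: &str, data: Vec<u8>) -> io::Result<()>;
}

/// In-memory stand-in for a real backend. A real implementation
/// would delegate to OpenDAL or to an `ObjectStore` instead.
#[derive(Default)]
struct MemoryStorage {
    files: HashMap<String, Vec<u8>>,
}

impl Storage for MemoryStorage {
    fn read(&self, path: &str) -> io::Result<Vec<u8>> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))
    }

    fn write(&mut self, path: &str, data: Vec<u8>) -> io::Result<()> {
        self.files.insert(path.to_string(), data);
        Ok(())
    }
}

/// Library-side code sees only the trait, so it does not care
/// which backend the application has plugged in.
fn read_metadata(storage: &dyn Storage, path: &str) -> io::Result<String> {
    let bytes = storage.read(path)?;
    String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

   With something along these lines, a downstream project would configure S3 
once in the backend it already uses and hand that backend to the library.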
   
   I'm new to this topic so I'd like to understand the situation here. Are 
there any technical difficulties or concerns around the proposed solutions, or 
do we simply need more bandwidth to make it happen? I'd be happy to be part of 
the technical discussion, or contribute code if possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


