liurenjie1024 commented on code in PR #1885: URL: https://github.com/apache/iceberg-rust/pull/1885#discussion_r2588452417
########## docs/rfcs/0002_storage_trait.md: ########## @@ -0,0 +1,353 @@ +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. +--> + +# Making Storage a Trait in Iceberg-rust + +## Background + +### Existing Implementation +The existing code implements storage functionality through a concrete Storage enum that handles different storage backends (S3, local filesystem, GCS, etc.). This implementation is tightly coupled with OpenDAL as the underlying storage layer. The FileIO struct wraps this Storage enum and provides a high-level API for file operations. + +Original structure: + +- **FileIO:** Main interface for file operations +- **Storage:** Enum with variants for different storage backends +- **InputFile / OutputFile:** Concrete structs for reading and writing files + +All storage operations are implemented directly in these concrete types, making it hard to extend or customize the storage layer without modifying the core codebase. + +### Problem Statement +The original design has several limitations: + +- **Tight Coupling** – All storage logic depends on OpenDAL, limiting flexibility. Users cannot easily opt in for other storage implementations like `object_store` +- **Customization Barriers** – Users cannot easily add custom behaviors or optimizations + +As discussed in Issue #1314, making Storage a trait would allow pluggable storage and better integration with existing systems. + +## Design + +### New Architecture + +The new architecture uses trait-based abstractions with a registry pattern and separate storage crates: + +``` +┌─────────────────────────────────────────────────────┐ +│ crates/iceberg/src/io/ │ +│ ┌───────────────────────────────────────────────┐ │ +│ │ Storage Trait & Registry │ │ +│ │ - pub trait Storage │ │ +│ │ - pub trait StorageBuilder │ │ +│ │ - pub struct StorageBuilderRegistry │ │ +│ │ - pub struct InputFile │ │ +│ │ - pub struct OutputFile │ │ +│ └───────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────┘ + ▲ + │ + ┌───────────────┼───────────────┬───────────────┐ + │ │ │ │ +┌───────┴────────────┐ │ ┌────────────┴──────┐ ┌────┴──────────┐ +│ crates/storage/ │ │ │ crates/storage/ │ │ Third-Party │ +│ opendal/ │ │ │ object_store/ │ │ Crates │ +│ ┌────────────────┐ │ │ │ ┌───────────────┐ │ │ ┌───────────┐ │ +│ │ opendal-s3 │ │ │ │ │ objstore-s3 │ │ │ │ custom │ │ +│ │ impl Storage │ │ │ │ │ impl Storage │ │ │ │ storage │ │ +│ │ impl Builder │ │ │ │ │ impl Builder │ │ │ │impl traits│ │ +│ └────────────────┘ │ │ │ └───────────────┘ │ │ └───────────┘ │ +│ ┌────────────────┐ │ │ │ ┌───────────────┐ │ └───────────────┘ +│ │ opendal-fs │ │ │ │ │ objstore-gcs │ │ +│ │ impl Storage │ │ │ │ │ impl Storage │ │ +│ │ impl Builder │ │ │ │ │ impl Builder │ │ +│ └────────────────┘ │ │ │ └───────────────┘ │ +│ ┌────────────────┐ │ │ │ ┌───────────────┐ │ +│ │ opendal-gcs │ │ │ │ │ objstore-azure│ │ +│ │ impl Storage │ │ │ │ │ impl Storage │ │ +│ │ impl Builder │ │ │ │ │ impl Builder │ │ +│ └────────────────┘ │ │ │ └───────────────┘ │ +│ ... (oss, azure, │ │ └───────────────────┘ +│ memory) │ │ +└────────────────────┘ │ +``` + +### Storage Trait +The Storage trait defines the interface for all storage operations. This implementation uses Option 2 from the initial design: Storage as a trait with concrete `InputFile` and `OutputFile` structs. + +```rust +#[async_trait] +pub trait Storage: Debug + Send + Sync { + // File existence and metadata + async fn exists(&self, path: &str) -> Result<bool>; Review Comment: After second think, I think we should make `Storage` trait serializable rather `StorageRegistry`. Think about the use case: in a distributed compute engine, the master node uses catalog to load table, and do planning. After that, it will send the scan split to worker node. In this case the scan split should contain `Storage`, since it will contains things like s3 credential, encryption key. ########## docs/rfcs/0002_storage_trait.md: ########## @@ -0,0 +1,353 @@ +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. +--> + +# Making Storage a Trait in Iceberg-rust + +## Background + +### Existing Implementation +The existing code implements storage functionality through a concrete Storage enum that handles different storage backends (S3, local filesystem, GCS, etc.). This implementation is tightly coupled with OpenDAL as the underlying storage layer. The FileIO struct wraps this Storage enum and provides a high-level API for file operations. + +Original structure: + +- **FileIO:** Main interface for file operations +- **Storage:** Enum with variants for different storage backends +- **InputFile / OutputFile:** Concrete structs for reading and writing files + +All storage operations are implemented directly in these concrete types, making it hard to extend or customize the storage layer without modifying the core codebase. + +### Problem Statement +The original design has several limitations: + +- **Tight Coupling** – All storage logic depends on OpenDAL, limiting flexibility. Users cannot easily opt in for other storage implementations like `object_store` +- **Customization Barriers** – Users cannot easily add custom behaviors or optimizations + +As discussed in Issue #1314, making Storage a trait would allow pluggable storage and better integration with existing systems. + +## Design + +### New Architecture + +The new architecture uses trait-based abstractions with a registry pattern and separate storage crates: + +``` +┌─────────────────────────────────────────────────────┐ +│ crates/iceberg/src/io/ │ +│ ┌───────────────────────────────────────────────┐ │ +│ │ Storage Trait & Registry │ │ +│ │ - pub trait Storage │ │ +│ │ - pub trait StorageBuilder │ │ +│ │ - pub struct StorageBuilderRegistry │ │ +│ │ - pub struct InputFile │ │ +│ │ - pub struct OutputFile │ │ +│ └───────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────┘ + ▲ + │ + ┌───────────────┼───────────────┬───────────────┐ + │ │ │ │ +┌───────┴────────────┐ │ ┌────────────┴──────┐ ┌────┴──────────┐ +│ crates/storage/ │ │ │ crates/storage/ │ │ Third-Party │ +│ opendal/ │ │ │ object_store/ │ │ Crates │ +│ ┌────────────────┐ │ │ │ ┌───────────────┐ │ │ ┌───────────┐ │ +│ │ opendal-s3 │ │ │ │ │ objstore-s3 │ │ │ │ custom │ │ +│ │ impl Storage │ │ │ │ │ impl Storage │ │ │ │ storage │ │ +│ │ impl Builder │ │ │ │ │ impl Builder │ │ │ │impl traits│ │ +│ └────────────────┘ │ │ │ └───────────────┘ │ │ └───────────┘ │ +│ ┌────────────────┐ │ │ │ ┌───────────────┐ │ └───────────────┘ +│ │ opendal-fs │ │ │ │ │ objstore-gcs │ │ +│ │ impl Storage │ │ │ │ │ impl Storage │ │ +│ │ impl Builder │ │ │ │ │ impl Builder │ │ +│ └────────────────┘ │ │ │ └───────────────┘ │ +│ ┌────────────────┐ │ │ │ ┌───────────────┐ │ +│ │ opendal-gcs │ │ │ │ │ objstore-azure│ │ +│ │ impl Storage │ │ │ │ │ impl Storage │ │ +│ │ impl Builder │ │ │ │ │ impl Builder │ │ +│ └────────────────┘ │ │ │ └───────────────┘ │ +│ ... (oss, azure, │ │ └───────────────────┘ +│ memory) │ │ +└────────────────────┘ │ +``` Review Comment: > My current feeling is that we don't need to add impls for every service, instead we add impls for every provider. One provider can provide multiple impls, and that can simplify a lot of our work. We had a discussion before, but I don't think this is a good abstraction. The `Stroage` trait is similar to `FileIO` interface in java, which as its name says, provides an abstraction layer on one kind of storage. I think as a library, the simplicity of implemantation should not be our top priority(although I don't think it would be much more harder), but the correct abstraction. There are several benefits with this approach: 1. Following same pattern as java. This will make it easier for whole iceberg community to maintain it. Our currently `Stroage` trait's functionality is a little limited compared with java, but I think it's fine for the first version. We will need to deal with more complex things like credential, encryption, etc. 2. Remember that the goal of this refactoring is to make things extensible, e.g. we should even allow user to provide their own stroage implementation. Mixing abstraction layer would make things difficult. ########## docs/rfcs/0002_storage_trait.md: ########## @@ -0,0 +1,353 @@ +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. +--> + +# Making Storage a Trait in Iceberg-rust + +## Background + +### Existing Implementation +The existing code implements storage functionality through a concrete Storage enum that handles different storage backends (S3, local filesystem, GCS, etc.). This implementation is tightly coupled with OpenDAL as the underlying storage layer. The FileIO struct wraps this Storage enum and provides a high-level API for file operations. + +Original structure: + +- **FileIO:** Main interface for file operations +- **Storage:** Enum with variants for different storage backends +- **InputFile / OutputFile:** Concrete structs for reading and writing files + +All storage operations are implemented directly in these concrete types, making it hard to extend or customize the storage layer without modifying the core codebase. + +### Problem Statement +The original design has several limitations: + +- **Tight Coupling** – All storage logic depends on OpenDAL, limiting flexibility. Users cannot easily opt in for other storage implementations like `object_store` +- **Customization Barriers** – Users cannot easily add custom behaviors or optimizations + +As discussed in Issue #1314, making Storage a trait would allow pluggable storage and better integration with existing systems. + +## Design + +### New Architecture + +The new architecture uses trait-based abstractions with a registry pattern and separate storage crates: + +``` +┌─────────────────────────────────────────────────────┐ +│ crates/iceberg/src/io/ │ +│ ┌───────────────────────────────────────────────┐ │ +│ │ Storage Trait & Registry │ │ +│ │ - pub trait Storage │ │ +│ │ - pub trait StorageBuilder │ │ +│ │ - pub struct StorageBuilderRegistry │ │ +│ │ - pub struct InputFile │ │ +│ │ - pub struct OutputFile │ │ +│ └───────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────┘ + ▲ + │ + ┌───────────────┼───────────────┬───────────────┐ + │ │ │ │ +┌───────┴────────────┐ │ ┌────────────┴──────┐ ┌────┴──────────┐ +│ crates/storage/ │ │ │ crates/storage/ │ │ Third-Party │ +│ opendal/ │ │ │ object_store/ │ │ Crates │ +│ ┌────────────────┐ │ │ │ ┌───────────────┐ │ │ ┌───────────┐ │ +│ │ opendal-s3 │ │ │ │ │ objstore-s3 │ │ │ │ custom │ │ +│ │ impl Storage │ │ │ │ │ impl Storage │ │ │ │ storage │ │ +│ │ impl Builder │ │ │ │ │ impl Builder │ │ │ │impl traits│ │ +│ └────────────────┘ │ │ │ └───────────────┘ │ │ └───────────┘ │ +│ ┌────────────────┐ │ │ │ ┌───────────────┐ │ └───────────────┘ +│ │ opendal-fs │ │ │ │ │ objstore-gcs │ │ +│ │ impl Storage │ │ │ │ │ impl Storage │ │ +│ │ impl Builder │ │ │ │ │ impl Builder │ │ +│ └────────────────┘ │ │ │ └───────────────┘ │ +│ ┌────────────────┐ │ │ │ ┌───────────────┐ │ +│ │ opendal-gcs │ │ │ │ │ objstore-azure│ │ +│ │ impl Storage │ │ │ │ │ impl Storage │ │ +│ │ impl Builder │ │ │ │ │ impl Builder │ │ +│ └────────────────┘ │ │ │ └───────────────┘ │ +│ ... (oss, azure, │ │ └───────────────────┘ +│ memory) │ │ +└────────────────────┘ │ +``` Review Comment: > I'm feeling that having a "default" Storage will be a must, just like we are having MemoryCatalog in the iceberg crate, Do you mean to include sth like `MemoryStorage`/`FsStroage`? If so, I think it would be useful for ut. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
