CTTY commented on code in PR #1885:
URL: https://github.com/apache/iceberg-rust/pull/1885#discussion_r2558087949


##########
docs/rfcs/0002_storage_trait.md:
##########
@@ -0,0 +1,353 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+-->
+
+# Making Storage a Trait in Iceberg-rust
+
+## Background
+
+### Existing Implementation
+The existing code implements storage functionality through a concrete Storage 
enum that handles different storage backends (S3, local filesystem, GCS, etc.). 
This implementation is tightly coupled with OpenDAL as the underlying storage 
layer. The FileIO struct wraps this Storage enum and provides a high-level API 
for file operations.
+
+Original structure:
+
+- **FileIO:** Main interface for file operations
+- **Storage:** Enum with variants for different storage backends
+- **InputFile / OutputFile:** Concrete structs for reading and writing files
+
+All storage operations are implemented directly in these concrete types, 
making it hard to extend or customize the storage layer without modifying the 
core codebase.
+
+### Problem Statement
+The original design has several limitations:
+
+- **Tight Coupling** – All storage logic depends on OpenDAL, limiting 
flexibility. Users cannot easily opt in for other storage implementations like 
`object_store`
+- **Customization Barriers** – Users cannot easily add custom behaviors or 
optimizations
+
+As discussed in Issue #1314, making Storage a trait would allow pluggable 
storage and better integration with existing systems.
+
+## Design
+
+### New Architecture
+
+The new architecture uses trait-based abstractions with a registry pattern and 
separate storage crates:
+
+```
+┌─────────────────────────────────────────────────────┐
+│              crates/iceberg/src/io/                 │
+│  ┌───────────────────────────────────────────────┐  │
+│  │         Storage Trait & Registry              │  │
+│  │  - pub trait Storage                          │  │
+│  │  - pub trait StorageBuilder                   │  │
+│  │  - pub struct StorageBuilderRegistry          │  │
+│  │  - pub struct InputFile                       │  │
+│  │  - pub struct OutputFile                      │  │
+│  └───────────────────────────────────────────────┘  │
+└─────────────────────────────────────────────────────┘
+                        ▲
+                        │
+        ┌───────────────┼───────────────┬───────────────┐
+        │               │               │               │
+┌───────┴────────────┐  │  ┌────────────┴──────┐  ┌────┴──────────┐
+│ crates/storage/    │  │  │ crates/storage/   │  │  Third-Party  │
+│    opendal/        │  │  │  object_store/    │  │    Crates     │
+│ ┌────────────────┐ │  │  │ ┌───────────────┐ │  │ ┌───────────┐ │
+│ │ opendal-s3     │ │  │  │ │ objstore-s3   │ │  │ │  custom   │ │
+│ │ impl Storage   │ │  │  │ │ impl Storage  │ │  │ │  storage  │ │
+│ │ impl Builder   │ │  │  │ │ impl Builder  │ │  │ │impl traits│ │
+│ └────────────────┘ │  │  │ └───────────────┘ │  │ └───────────┘ │
+│ ┌────────────────┐ │  │  │ ┌───────────────┐ │  └───────────────┘
+│ │ opendal-fs     │ │  │  │ │ objstore-gcs  │ │
+│ │ impl Storage   │ │  │  │ │ impl Storage  │ │
+│ │ impl Builder   │ │  │  │ │ impl Builder  │ │
+│ └────────────────┘ │  │  │ └───────────────┘ │
+│ ┌────────────────┐ │  │  │ ┌───────────────┐ │
+│ │ opendal-gcs    │ │  │  │ │ objstore-azure│ │
+│ │ impl Storage   │ │  │  │ │ impl Storage  │ │
+│ │ impl Builder   │ │  │  │ │ impl Builder  │ │
+│ └────────────────┘ │  │  │ └───────────────┘ │
+│ ... (oss, azure,   │  │  └───────────────────┘
+│      memory)       │  │
+└────────────────────┘  │
+```
+
+### Storage Trait
+The Storage trait defines the interface for all storage operations. This 
implementation uses Option 2 from the initial design: Storage as a trait with 
concrete `InputFile` and `OutputFile` structs.
+
+```rust
+#[async_trait]
+pub trait Storage: Debug + Send + Sync {
+    // File existence and metadata
+    async fn exists(&self, path: &str) -> Result<bool>;
+    async fn metadata(&self, path: &str) -> Result<FileMetadata>;
+
+    // Reading operations
+    async fn read(&self, path: &str) -> Result<Bytes>;
+    async fn reader(&self, path: &str) -> Result<Box<dyn FileRead>>;
+
+    // Writing operations
+    async fn write(&self, path: &str, bs: Bytes) -> Result<()>;
+    async fn writer(&self, path: &str) -> Result<Box<dyn FileWrite>>;
+
+    // Deletion operations
+    async fn delete(&self, path: &str) -> Result<()>;
+    async fn remove_dir_all(&self, path: &str) -> Result<()>;

Review Comment:
   Migrating comments from @Fokko 
   > This name feels very much file-system like, while Iceberg is designed to 
work against object stores. How about delete_prefix?



##########
docs/rfcs/0002_storage_trait.md:
##########
@@ -0,0 +1,353 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+-->
+
+# Making Storage a Trait in Iceberg-rust
+
+## Background
+
+### Existing Implementation
+The existing code implements storage functionality through a concrete Storage 
enum that handles different storage backends (S3, local filesystem, GCS, etc.). 
This implementation is tightly coupled with OpenDAL as the underlying storage 
layer. The FileIO struct wraps this Storage enum and provides a high-level API 
for file operations.
+
+Original structure:
+
+- **FileIO:** Main interface for file operations
+- **Storage:** Enum with variants for different storage backends
+- **InputFile / OutputFile:** Concrete structs for reading and writing files
+
+All storage operations are implemented directly in these concrete types, 
making it hard to extend or customize the storage layer without modifying the 
core codebase.
+
+### Problem Statement
+The original design has several limitations:
+
+- **Tight Coupling** – All storage logic depends on OpenDAL, limiting 
flexibility. Users cannot easily opt in for other storage implementations like 
`object_store`
+- **Customization Barriers** – Users cannot easily add custom behaviors or 
optimizations
+
+As discussed in Issue #1314, making Storage a trait would allow pluggable 
storage and better integration with existing systems.
+
+## Design
+
+### New Architecture
+
+The new architecture uses trait-based abstractions with a registry pattern and 
separate storage crates:
+
+```
+┌─────────────────────────────────────────────────────┐
+│              crates/iceberg/src/io/                 │
+│  ┌───────────────────────────────────────────────┐  │
+│  │         Storage Trait & Registry              │  │
+│  │  - pub trait Storage                          │  │
+│  │  - pub trait StorageBuilder                   │  │
+│  │  - pub struct StorageBuilderRegistry          │  │
+│  │  - pub struct InputFile                       │  │
+│  │  - pub struct OutputFile                      │  │
+│  └───────────────────────────────────────────────┘  │
+└─────────────────────────────────────────────────────┘
+                        ▲
+                        │
+        ┌───────────────┼───────────────┬───────────────┐
+        │               │               │               │
+┌───────┴────────────┐  │  ┌────────────┴──────┐  ┌────┴──────────┐
+│ crates/storage/    │  │  │ crates/storage/   │  │  Third-Party  │
+│    opendal/        │  │  │  object_store/    │  │    Crates     │
+│ ┌────────────────┐ │  │  │ ┌───────────────┐ │  │ ┌───────────┐ │
+│ │ opendal-s3     │ │  │  │ │ objstore-s3   │ │  │ │  custom   │ │
+│ │ impl Storage   │ │  │  │ │ impl Storage  │ │  │ │  storage  │ │
+│ │ impl Builder   │ │  │  │ │ impl Builder  │ │  │ │impl traits│ │
+│ └────────────────┘ │  │  │ └───────────────┘ │  │ └───────────┘ │
+│ ┌────────────────┐ │  │  │ ┌───────────────┐ │  └───────────────┘
+│ │ opendal-fs     │ │  │  │ │ objstore-gcs  │ │
+│ │ impl Storage   │ │  │  │ │ impl Storage  │ │
+│ │ impl Builder   │ │  │  │ │ impl Builder  │ │
+│ └────────────────┘ │  │  │ └───────────────┘ │
+│ ┌────────────────┐ │  │  │ ┌───────────────┐ │
+│ │ opendal-gcs    │ │  │  │ │ objstore-azure│ │
+│ │ impl Storage   │ │  │  │ │ impl Storage  │ │
+│ │ impl Builder   │ │  │  │ │ impl Builder  │ │
+│ └────────────────┘ │  │  │ └───────────────┘ │
+│ ... (oss, azure,   │  │  └───────────────────┘
+│      memory)       │  │
+└────────────────────┘  │
+```
+
+### Storage Trait
+The Storage trait defines the interface for all storage operations. This 
implementation uses Option 2 from the initial design: Storage as a trait with 
concrete `InputFile` and `OutputFile` structs.
+
+```rust
+#[async_trait]
+pub trait Storage: Debug + Send + Sync {
+    // File existence and metadata
+    async fn exists(&self, path: &str) -> Result<bool>;
+    async fn metadata(&self, path: &str) -> Result<FileMetadata>;
+
+    // Reading operations
+    async fn read(&self, path: &str) -> Result<Bytes>;
+    async fn reader(&self, path: &str) -> Result<Box<dyn FileRead>>;
+
+    // Writing operations
+    async fn write(&self, path: &str, bs: Bytes) -> Result<()>;
+    async fn writer(&self, path: &str) -> Result<Box<dyn FileWrite>>;
+
+    // Deletion operations
+    async fn delete(&self, path: &str) -> Result<()>;
+    async fn remove_dir_all(&self, path: &str) -> Result<()>;
+
+    // File object creation
+    fn new_input(&self, path: &str) -> Result<InputFile>;
+    fn new_output(&self, path: &str) -> Result<OutputFile>;
+}

Review Comment:
   Migrating comments from @c-thiel 
   > Many object stores have a good way of running batch deletions, for example 
the DeleteObjects API in AWS S3. How would you feel about including a 
delete_batch method too?



##########
docs/rfcs/0002_storage_trait.md:
##########
@@ -0,0 +1,353 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+-->
+
+# Making Storage a Trait in Iceberg-rust
+
+## Background
+
+### Existing Implementation
+The existing code implements storage functionality through a concrete Storage 
enum that handles different storage backends (S3, local filesystem, GCS, etc.). 
This implementation is tightly coupled with OpenDAL as the underlying storage 
layer. The FileIO struct wraps this Storage enum and provides a high-level API 
for file operations.
+
+Original structure:
+
+- **FileIO:** Main interface for file operations
+- **Storage:** Enum with variants for different storage backends
+- **InputFile / OutputFile:** Concrete structs for reading and writing files
+
+All storage operations are implemented directly in these concrete types, 
making it hard to extend or customize the storage layer without modifying the 
core codebase.
+
+### Problem Statement
+The original design has several limitations:
+
+- **Tight Coupling** – All storage logic depends on OpenDAL, limiting 
flexibility. Users cannot easily opt in for other storage implementations like 
`object_store`
+- **Customization Barriers** – Users cannot easily add custom behaviors or 
optimizations
+
+As discussed in Issue #1314, making Storage a trait would allow pluggable 
storage and better integration with existing systems.
+
+## Design
+
+### New Architecture
+
+The new architecture uses trait-based abstractions with a registry pattern and 
separate storage crates:
+
+```
+┌─────────────────────────────────────────────────────┐
+│              crates/iceberg/src/io/                 │
+│  ┌───────────────────────────────────────────────┐  │
+│  │         Storage Trait & Registry              │  │
+│  │  - pub trait Storage                          │  │
+│  │  - pub trait StorageBuilder                   │  │
+│  │  - pub struct StorageBuilderRegistry          │  │
+│  │  - pub struct InputFile                       │  │
+│  │  - pub struct OutputFile                      │  │
+│  └───────────────────────────────────────────────┘  │
+└─────────────────────────────────────────────────────┘
+                        ▲
+                        │
+        ┌───────────────┼───────────────┬───────────────┐
+        │               │               │               │
+┌───────┴────────────┐  │  ┌────────────┴──────┐  ┌────┴──────────┐
+│ crates/storage/    │  │  │ crates/storage/   │  │  Third-Party  │
+│    opendal/        │  │  │  object_store/    │  │    Crates     │
+│ ┌────────────────┐ │  │  │ ┌───────────────┐ │  │ ┌───────────┐ │
+│ │ opendal-s3     │ │  │  │ │ objstore-s3   │ │  │ │  custom   │ │
+│ │ impl Storage   │ │  │  │ │ impl Storage  │ │  │ │  storage  │ │
+│ │ impl Builder   │ │  │  │ │ impl Builder  │ │  │ │impl traits│ │
+│ └────────────────┘ │  │  │ └───────────────┘ │  │ └───────────┘ │
+│ ┌────────────────┐ │  │  │ ┌───────────────┐ │  └───────────────┘
+│ │ opendal-fs     │ │  │  │ │ objstore-gcs  │ │
+│ │ impl Storage   │ │  │  │ │ impl Storage  │ │
+│ │ impl Builder   │ │  │  │ │ impl Builder  │ │
+│ └────────────────┘ │  │  │ └───────────────┘ │
+│ ┌────────────────┐ │  │  │ ┌───────────────┐ │
+│ │ opendal-gcs    │ │  │  │ │ objstore-azure│ │
+│ │ impl Storage   │ │  │  │ │ impl Storage  │ │
+│ │ impl Builder   │ │  │  │ │ impl Builder  │ │
+│ └────────────────┘ │  │  │ └───────────────┘ │
+│ ... (oss, azure,   │  │  └───────────────────┘
+│      memory)       │  │
+└────────────────────┘  │
+```
+
+### Storage Trait
+The Storage trait defines the interface for all storage operations. This 
implementation uses Option 2 from the initial design: Storage as a trait with 
concrete `InputFile` and `OutputFile` structs.
+
+```rust
+#[async_trait]
+pub trait Storage: Debug + Send + Sync {
+    // File existence and metadata
+    async fn exists(&self, path: &str) -> Result<bool>;

Review Comment:
   Migrating comments from @c-thiel 
   >I know that we use these results everywhere, but I think introducing more 
specific error types that we can match on for storage operations makes sense. 
They can implement Into<Error> of course.
   > For example, a RateLimited error that we got from the storage service 
should be treated differently from NotFound or CredentialsExpired.
   > With Lakekeeper we are currently using our own trait based IO due to many 
limitations in iceberg-rust, mainly due to unsupported signing mechanisms, 
missing refresh mechanisms, intransparent errors and missing extendability.
   > I would gladly switch to iceberg-rust if we get these solved.
   > Maybe this can serve as some inspiration: 
[https://github.com/lakekeeper/lakekeeper/blob/b8fcf54c627d48a547ef0baf6863949b68579388/crates/io/src/error.rs#L291](https://www.google.com/url?q=https://github.com/lakekeeper/lakekeeper/blob/b8fcf54c627d48a547ef0baf6863949b68579388/crates/io/src/error.rs%23L291&sa=D&source=docs&ust=1764028563749361&usg=AOvVaw3MW1dXJcdsIf2SiCoYo3yz)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to