liurenjie1024 commented on issue #172:
URL: https://github.com/apache/iceberg-rust/issues/172#issuecomment-2483125592

   > > point about the existing Datafusion machinery
   > 
   > DataFusion provides an [ObjectStoreRegistry](https://docs.rs/datafusion/latest/datafusion/datasource/object_store/trait.ObjectStoreRegistry.html) as part of the [SessionContext](https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html). This is then what various abstractions like [ParquetExec](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/parquet/struct.ParquetExec.html) hook into.
   > 
   > By integrating with this, iceberg-rs would better interoperate with the 
rest of the DataFusion ecosystem, whether other catalogs like listing table, 
deltalake, Hive, etc., or unusual deployment scenarios with custom caching 
object stores. It seems unfortunate for users to need to configure 
iceberg-rs separately from the rest of DataFusion. iceberg-rs would also 
benefit from the ongoing work to improve those components and systems.
   > 
   > I don't know to what extent the desire is to make iceberg-rs a standalone 
library that mirrors the Java APIs and configuration, but I thought it 
worthwhile to at least make the case for closer integration with DataFusion. It 
seems like quite a lot of undifferentiated toil to rebuild the quite subtle 
logic around predicate pushdown, concurrent decode, etc...
   > 
   > Edit: to ground this a bit more, the advantage of a trait-based approach 
is that the DF bindings could provide a component wrapping SessionContext or 
similar, without forcing iceberg to take a dependency on DF or maintain this 
mapping.
   
   If we want to allow integrating with `ObjectStoreRegistry`, we would need 
one more trait like `StorageProvider`:
   ```rust
   #[async_trait]
   pub trait StorageProvider {
     async fn build(&self, configs: &HashMap<String, String>, url: &str) -> Arc<dyn Storage>;
   }
   ```
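
To make the idea a bit more concrete, here is a minimal sketch of how a registry-backed provider could resolve a storage by URL scheme, roughly the way a wrapper around `ObjectStoreRegistry` might. Everything in it is an assumption for illustration: `Storage` is only a stand-in for the trait discussed in this issue, `NamedStorage` and `RegistryStorageProvider` are hypothetical names, and `build` is written synchronously and returns an `Option` (unlike the async signature above) so the example has no external dependencies.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the `Storage` trait discussed in this issue.
pub trait Storage: Send + Sync {
    /// Name used only to make the example observable.
    fn name(&self) -> &str;
}

/// Simplified, synchronous variant of the proposed `StorageProvider`.
/// Returning `Option` is an assumption of this sketch, not part of the proposal.
pub trait StorageProvider {
    fn build(&self, configs: &HashMap<String, String>, url: &str) -> Option<Arc<dyn Storage>>;
}

/// Toy storage that only carries a label.
pub struct NamedStorage(pub String);

impl Storage for NamedStorage {
    fn name(&self) -> &str {
        &self.0
    }
}

/// Hypothetical provider that hands out pre-registered storages keyed by
/// URL scheme instead of constructing a new one from `configs`.
#[derive(Default)]
pub struct RegistryStorageProvider {
    by_scheme: HashMap<String, Arc<dyn Storage>>,
}

impl RegistryStorageProvider {
    /// Register a shared storage for a scheme such as "s3".
    pub fn register(&mut self, scheme: &str, storage: Arc<dyn Storage>) {
        self.by_scheme.insert(scheme.to_ascii_lowercase(), storage);
    }
}

impl StorageProvider for RegistryStorageProvider {
    fn build(&self, _configs: &HashMap<String, String>, url: &str) -> Option<Arc<dyn Storage>> {
        // Take the part before "://" as the scheme; URLs without one resolve to None.
        let (scheme, _rest) = url.split_once("://")?;
        self.by_scheme.get(&scheme.to_ascii_lowercase()).cloned()
    }
}
```

With something along these lines, the DataFusion bindings could register the storages already known to the session and pass the provider to iceberg, so iceberg itself would only see the trait.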
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

