Re: [I] Make FileIO a Trait [iceberg-rust]

via GitHub Fri, 23 May 2025 03:19:06 -0700


roeap commented on issue #1314:
URL: https://github.com/apache/iceberg-rust/issues/1314#issuecomment-2903969557


   Just sharing some experiences from the delta world which may not be immediately 
applicable to the question of which trait to use, but may be food for 
thought as to where things could be heading?
   
   One thing that repeatedly comes up when talking about table formats is 
"Metadata is Data". To me, the logical consequence is to treat it as 
such, meaning to process it with the same tools you would use for processing 
data. To that end, delta-rs currently keeps all metadata around as arrow 
record batches, and delta-kernel goes even further, abstracting away the 
specific data representation.
   
   As such, the higher-level abstractions we chose are at the level of file 
formats, i.e. read this {parquet,json,avro,..} file into arrow with this 
schema. The internal logic processing the metadata either visits individual 
fields or applies expressions to the data to generate the plans for scans etc. 
I think to a certain degree this thinking is actually baked into the Iceberg 
spec via the metadata tables.
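
   To make the "read this file into columnar data with this schema" idea a bit 
more concrete, here is a minimal sketch using only the standard library. All 
names (`Schema`, `Batch`, `FormatHandler`, `InMemoryHandler`, `collect_column`) 
are hypothetical stand-ins for illustration and are not the actual arrow, 
delta-rs, or delta-kernel APIs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an Arrow schema: just the requested column names.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub columns: Vec<String>,
}

/// Hypothetical stand-in for a record batch: one Vec per column, nulls as None.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub schema: Schema,
    pub columns: Vec<Vec<Option<String>>>,
}

/// Hypothetical engine-provided abstraction at the file-format level:
/// "read this file into columnar data with this schema".
pub trait FormatHandler {
    fn read(&self, path: &str, schema: &Schema) -> Result<Batch, String>;
}

/// Toy handler for the sketch: each "file" is a list of key/value rows in memory.
pub struct InMemoryHandler {
    pub files: HashMap<String, Vec<HashMap<String, String>>>,
}

impl FormatHandler for InMemoryHandler {
    fn read(&self, path: &str, schema: &Schema) -> Result<Batch, String> {
        let rows = self
            .files
            .get(path)
            .ok_or_else(|| format!("no such file: {path}"))?;
        // Project the rows onto the requested columns; missing keys become nulls.
        let columns = schema
            .columns
            .iter()
            .map(|name| rows.iter().map(|row| row.get(name).cloned()).collect())
            .collect();
        Ok(Batch { schema: schema.clone(), columns })
    }
}

/// Metadata logic written only against the trait: it "visits" a single field
/// and returns its non-null values, without knowing how the file was read.
pub fn collect_column(
    handler: &dyn FormatHandler,
    path: &str,
    column: &str,
) -> Result<Vec<String>, String> {
    let schema = Schema { columns: vec![column.to_string()] };
    let batch = handler.read(path, &schema)?;
    Ok(batch.columns[0].iter().flatten().cloned().collect())
}
```

   The point of the sketch is only that the metadata logic depends on the 
handler trait, so an engine can swap in its own reader behind it.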
   
   By default we provide an arrow (arrays and kernels) and object_store based 
implementation using many of the same tools used here to read data. Currently I 
am working on a datafusion engine for kernel, where datafusion's execution plans 
are used to read data and datafusion's native expressions are used for evaluation.
   
   As a consequence, virtually all resource management is under full control of 
the query engine, which is also free to apply any more advanced optimisations 
(caching, etc.) as it sees fit.
   
   All that said, I am about to start a PoC to find out how much of the query 
planning, and eventually also maintenance, that is implemented in the aforementioned 
datafusion engine can be applied to both delta and iceberg.
   
   One thing I am fairly certain of is that the work discussed here will be 
making my life much easier, and if we end up in a place where we can at least 
do something like ...
   
   ```rust
   impl<T: ObjectStore> FileIo for T {
       ...
   }
   ```
   
   that would be awesome!
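
   As a rough illustration of that blanket impl, here is a self-contained sketch. 
Both traits below are simplified, synchronous stand-ins that I made up for the 
example: the `ObjectStore` here is not the real `object_store` trait, and the 
shape of `FileIo` is an assumption, since the actual trait is what this issue is 
still deciding. `InMemoryStore` and `read_metadata` are likewise hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for an object-store trait.
pub trait ObjectStore {
    fn get(&self, path: &str) -> Option<Vec<u8>>;
}

/// Hypothetical shape of the FileIo trait under discussion.
pub trait FileIo {
    fn read(&self, path: &str) -> Result<Vec<u8>, String>;
}

/// Blanket impl: anything that is an ObjectStore is automatically a FileIo.
impl<T: ObjectStore> FileIo for T {
    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.get(path).ok_or_else(|| format!("not found: {path}"))
    }
}

/// Toy store used only to exercise the blanket impl.
pub struct InMemoryStore {
    pub objects: HashMap<String, Vec<u8>>,
}

impl ObjectStore for InMemoryStore {
    fn get(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.get(path).cloned()
    }
}

/// Table-format code only needs to know about FileIo.
pub fn read_metadata(io: &impl FileIo, path: &str) -> Result<String, String> {
    let bytes = io.read(path)?;
    String::from_utf8(bytes).map_err(|e| e.to_string())
}
```

   With this, an `InMemoryStore` can be passed directly to `read_metadata` 
without writing any adapter code for it.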
   
   Once we have consensus here, I am happy to offer my support in driving this 
forward!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@iceberg.apache.org
