ZENOTME commented on PR #960: URL: https://github.com/apache/iceberg-rust/pull/960#issuecomment-2653614240
> > PyIceberg and Iceberg-Java are a bit different. Where PyIceberg is used by end-users, Iceberg-Java is often embedded in a query engine.
>
> I think iceberg-rust's position is more like iceberg-java, since there are already SQL engines written in Rust (DataFusion, Databend, Polars, etc.), and integrating them with iceberg-rust makes things easier.

I think it's a good idea to have methods to convert a Parquet file to a data file, including the work mentioned by @Fokko. But it's better to leave the other concerns to query engines, since they involve IO and parallelism management.

So in summary, what we would like to provide for this feature is a `parquet_files_to_data_files` in the arrow module (or a new parquet module). `parquet_files_to_data_files` actually does two things:

1. schema compatibility check
2. metrics collection (we can derive a function `data_file_statistics_from_parquet_metadata` for this, which can be reused in the Parquet file writer)

What do you think @liurenjie1024 @jonathanc-n @Fokko?
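To make the two-step split concrete, here is a minimal sketch of the shape such an API could take. All type and field names below (`Schema`, `ParquetMetadata`, `DataFile`, the field layouts) are illustrative stand-ins, not the actual iceberg-rust or parquet crate types, and the schema check is reduced to a trivial equality test:

```rust
// Hypothetical sketch of the proposed API shape; all names are
// illustrative assumptions, not the real iceberg-rust API.

/// Simplified table schema: (field name, field type) pairs.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<(String, String)>,
}

/// Simplified stand-in for Iceberg's DataFile manifest entry.
#[derive(Debug)]
pub struct DataFile {
    pub path: String,
    pub record_count: u64,
    pub file_size_in_bytes: u64,
}

/// Simplified stand-in for parsed Parquet footer metadata.
#[derive(Debug)]
pub struct ParquetMetadata {
    pub path: String,
    pub num_rows: u64,
    pub size_bytes: u64,
    pub schema: Schema,
}

/// Step 1: schema compatibility check (reduced to equality here; the
/// real check would handle field-id mapping, promotion, etc.).
fn check_schema_compat(table_schema: &Schema, file_schema: &Schema) -> Result<(), String> {
    if table_schema == file_schema {
        Ok(())
    } else {
        Err("parquet file schema is not compatible with table schema".to_string())
    }
}

/// Step 2: metrics collection, factored out so the Parquet file writer
/// could reuse the same statistics derivation.
pub fn data_file_statistics_from_parquet_metadata(meta: &ParquetMetadata) -> DataFile {
    DataFile {
        path: meta.path.clone(),
        record_count: meta.num_rows,
        file_size_in_bytes: meta.size_bytes,
    }
}

/// The proposed entry point: check each file's schema, then derive a
/// DataFile from its metadata. IO and parallelism stay with the caller.
pub fn parquet_files_to_data_files(
    table_schema: &Schema,
    files: Vec<ParquetMetadata>,
) -> Result<Vec<DataFile>, String> {
    files
        .iter()
        .map(|meta| {
            check_schema_compat(table_schema, &meta.schema)?;
            Ok(data_file_statistics_from_parquet_metadata(meta))
        })
        .collect()
}
```

The key design point this sketch illustrates: the function consumes already-read metadata rather than file paths, so reading footers (IO) and fanning out across files (parallelism) remain the query engine's responsibility.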