ZENOTME commented on PR #960: URL: https://github.com/apache/iceberg-rust/pull/960#issuecomment-2653614240
> > PyIceberg and Iceberg-Java are a bit different. Where PyIceberg is used by end-users, Iceberg-Java is often embedded in a query engine.
>
> I think iceberg-rust's position is more like iceberg-java, since there are already SQL engines written in Rust (DataFusion, Databend, Polars, etc.), and integrating them with iceberg-rust makes things easier.

I think it's a good idea to have methods to convert a Parquet file to a data file, including the work mentioned by @Fokko. But it's better to leave the other concerns to query engines, since they involve IO and parallelism management.

So in summary, what we would like to provide for this feature is a `parquet_files_to_data_files` in the arrow module (or a new parquet module). `parquet_files_to_data_files` actually does two things:

1. schema compatibility check
2. metrics collection (we can derive a function `data_file_statistics_from_parquet_metadata` for this, which can be reused in the Parquet file writer)

What do you think @liurenjie1024 @jonathanc-n @Fokko?
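To make the two-step split concrete, here is a minimal sketch of the shape such an API could take. All type and field names below (`Schema`, `ParquetMetadata`, `DataFile`, the field layouts) are illustrative stand-ins, not the actual iceberg-rust or parquet crate types, and the schema check is reduced to a trivial equality test:

```rust
// Hypothetical sketch of the proposed API shape; all names are
// illustrative assumptions, not the real iceberg-rust API.

/// Simplified table schema: (field name, field type) pairs.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<(String, String)>,
}

/// Simplified stand-in for Iceberg's DataFile manifest entry.
#[derive(Debug)]
pub struct DataFile {
    pub path: String,
    pub record_count: u64,
    pub file_size_in_bytes: u64,
}

/// Simplified stand-in for parsed Parquet footer metadata.
#[derive(Debug)]
pub struct ParquetMetadata {
    pub path: String,
    pub num_rows: u64,
    pub size_bytes: u64,
    pub schema: Schema,
}

/// Step 1: schema compatibility check (reduced to equality here; the
/// real check would handle field-id mapping, promotion, etc.).
fn check_schema_compat(table_schema: &Schema, file_schema: &Schema) -> Result<(), String> {
    if table_schema == file_schema {
        Ok(())
    } else {
        Err("parquet file schema is not compatible with table schema".to_string())
    }
}

/// Step 2: metrics collection, factored out so the Parquet file writer
/// could reuse the same statistics derivation.
pub fn data_file_statistics_from_parquet_metadata(meta: &ParquetMetadata) -> DataFile {
    DataFile {
        path: meta.path.clone(),
        record_count: meta.num_rows,
        file_size_in_bytes: meta.size_bytes,
    }
}

/// The proposed entry point: check each file's schema, then derive a
/// DataFile from its metadata. IO and parallelism stay with the caller.
pub fn parquet_files_to_data_files(
    table_schema: &Schema,
    files: Vec<ParquetMetadata>,
) -> Result<Vec<DataFile>, String> {
    files
        .iter()
        .map(|meta| {
            check_schema_compat(table_schema, &meta.schema)?;
            Ok(data_file_statistics_from_parquet_metadata(meta))
        })
        .collect()
}
```

The key design point this sketch illustrates: the function consumes already-read metadata rather than file paths, so reading footers (IO) and fanning out across files (parallelism) remain the query engine's responsibility.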