ZENOTME commented on PR #960: URL: https://github.com/apache/iceberg-rust/pull/960#issuecomment-2649902294
> while the ArrowFileReader provides the parsed metadata which doesn't contain enough information

Thanks for your investigation, @jonathanc-n! But it seems the parsed metadata does contain enough information (such as [statistics](https://docs.rs/parquet/latest/parquet/file/metadata/struct.ColumnChunkMetaData.html#method.statistics)).

> For metadata retrieval, it seems i can use [ParquetWriter::to_data_file_builder](https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/writer/file_writer/parquet_writer.rs#L310C8) to avoid duplicating more code.
> I was planning on submitting a pr in arrow-rs to allow ParquetMetadataReader to return the tfilemetadata format.

I think the parsed format (the in-memory representation) may be friendlier for us to extract information from. (Actually, I will [convert the thrift format to the parsed format](https://github.com/ZENOTME/iceberg-rust/blob/cde35ab0eefffae88c521d4e897ba86ee754861c/crates/iceberg/src/writer/file_writer/parquet_writer.rs#L355).) So I think we should refine to_data_file_builder to take the parsed format. 🤔
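
To illustrate why the parsed format should be sufficient, here is a minimal sketch of folding per-column-chunk metadata into the per-column totals a data file entry needs. Everything in it is an assumption for illustration: `ColumnStats`, `ColumnChunk`, `RowGroup`, `FileSummary`, and `summarize` are hypothetical stand-ins, not the parquet crate's types and not the actual to_data_file_builder signature, and the statistics are simplified to `i64` values.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for parsed Parquet metadata.
// These are NOT the parquet crate's types; they only mirror the shape
// "row groups -> column chunks -> optional statistics".
#[derive(Debug, Clone)]
struct ColumnStats {
    null_count: u64,
    min: i64,
    max: i64,
}

#[derive(Debug, Clone)]
struct ColumnChunk {
    field_id: i32,
    compressed_size: u64,
    num_values: u64,
    // Statistics are optional: a writer may omit them for a chunk.
    stats: Option<ColumnStats>,
}

#[derive(Debug, Clone)]
struct RowGroup {
    columns: Vec<ColumnChunk>,
}

// Hypothetical per-file summary, keyed by field id.
#[derive(Debug, Default, PartialEq)]
struct FileSummary {
    column_sizes: HashMap<i32, u64>,
    value_counts: HashMap<i32, u64>,
    null_value_counts: HashMap<i32, u64>,
    lower_bounds: HashMap<i32, i64>,
    upper_bounds: HashMap<i32, i64>,
}

// Fold every column chunk of every row group into per-column totals.
fn summarize(row_groups: &[RowGroup]) -> FileSummary {
    let mut summary = FileSummary::default();
    for row_group in row_groups {
        for col in &row_group.columns {
            // Sizes and value counts are always present, so just sum them.
            *summary.column_sizes.entry(col.field_id).or_insert(0) += col.compressed_size;
            *summary.value_counts.entry(col.field_id).or_insert(0) += col.num_values;

            // This sketch simply skips chunks without statistics; a real
            // implementation would have to decide whether bounds are still
            // trustworthy when some chunks lack them.
            if let Some(stats) = &col.stats {
                *summary.null_value_counts.entry(col.field_id).or_insert(0) += stats.null_count;
                summary
                    .lower_bounds
                    .entry(col.field_id)
                    .and_modify(|m| *m = (*m).min(stats.min))
                    .or_insert(stats.min);
                summary
                    .upper_bounds
                    .entry(col.field_id)
                    .and_modify(|m| *m = (*m).max(stats.max))
                    .or_insert(stats.max);
            }
        }
    }
    summary
}
```

Calling `summarize` on the row groups of a file yields summed sizes and counts plus the overall minimum and maximum per field id. If the parsed metadata exposes the same pieces per column chunk, a builder that takes the parsed format would not need the thrift representation.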