Re: [PR] feat: Add existing parquet files [iceberg-rust]

via GitHub Mon, 10 Feb 2025 22:25:28 -0800


ZENOTME commented on PR #960:
URL: https://github.com/apache/iceberg-rust/pull/960#issuecomment-2649902294


   >  while the ArrowFileReader provides the parsed metadata which doesn't 
contain enough information
   
   Thanks for your investigation @jonathanc-n! But it seems the parsed metadata 
does contain enough information (such as 
[statistics](https://docs.rs/parquet/latest/parquet/file/metadata/struct.ColumnChunkMetaData.html#method.statistics)).
   
   > For metadata retrieval, it seems i can use 
[ParquetWriter::to_data_file_builder](https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/writer/file_writer/parquet_writer.rs#L310C8)
 to avoid duplicating more code.
   > I was planning on submitting a pr in arrow-rs to allow 
ParquetMetadataReader to return the tfilemetadata format. 
   
   I think the parsed format (the in-memory representation) may be friendlier 
for us to extract information from. (Actually, I will [convert the thrift format 
to the parsed 
format](https://github.com/ZENOTME/iceberg-rust/blob/cde35ab0eefffae88c521d4e897ba86ee754861c/crates/iceberg/src/writer/file_writer/parquet_writer.rs#L355).)
 So I think we should refine `to_data_file_builder` to take the parsed 
format. 🤔
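
To illustrate why the parsed form seems sufficient, here is a minimal, self-contained sketch of folding per-row-group column statistics into file-level counts and bounds. The types `ColumnChunkStats`, `RowGroupStats`, `FileColumnSummary` and the function `summarize` are hypothetical stand-ins for illustration only; they are neither the `parquet` crate's types nor the actual `to_data_file_builder` code, and bounds are simplified to `i64`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the statistics of one column chunk as exposed
/// by parsed Parquet metadata (NOT the real `parquet` crate type).
#[derive(Debug, Clone)]
struct ColumnChunkStats {
    field_id: i32,
    num_values: u64,
    null_count: u64,
    min: Option<i64>,
    max: Option<i64>,
}

/// Hypothetical stand-in for one row group of parsed metadata.
#[derive(Debug, Clone)]
struct RowGroupStats {
    columns: Vec<ColumnChunkStats>,
}

/// File-level summary, keyed by field id (assumed shape, for illustration).
#[derive(Debug, Default, PartialEq)]
struct FileColumnSummary {
    value_counts: HashMap<i32, u64>,
    null_value_counts: HashMap<i32, u64>,
    lower_bounds: HashMap<i32, i64>,
    upper_bounds: HashMap<i32, i64>,
}

/// Fold the statistics of every row group into one summary per column.
fn summarize(row_groups: &[RowGroupStats]) -> FileColumnSummary {
    let mut summary = FileColumnSummary::default();
    for row_group in row_groups {
        for col in &row_group.columns {
            // Counts simply add up across row groups.
            *summary.value_counts.entry(col.field_id).or_insert(0) += col.num_values;
            *summary.null_value_counts.entry(col.field_id).or_insert(0) += col.null_count;
            // Bounds keep the smallest min and the largest max seen so far.
            // Chunks without min/max are skipped here; real code would have
            // to decide whether a missing statistic invalidates the bound.
            if let Some(lo) = col.min {
                summary
                    .lower_bounds
                    .entry(col.field_id)
                    .and_modify(|cur| *cur = (*cur).min(lo))
                    .or_insert(lo);
            }
            if let Some(hi) = col.max {
                summary
                    .upper_bounds
                    .entry(col.field_id)
                    .and_modify(|cur| *cur = (*cur).max(hi))
                    .or_insert(hi);
            }
        }
    }
    summary
}
```

Nothing in this fold needs the thrift representation. It only reads values that the parsed metadata already carries per column chunk.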


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


