maxdebayser opened a new pull request, #7831: URL: https://github.com/apache/iceberg/pull/7831
@Fokko This commit partly addresses issue https://github.com/apache/iceberg/issues/7256. Unfortunately, the pyarrow library is not as flexible as we would like. When `write_statistics=True` is passed to `pyarrow.parquet.write_table`, statistics are written for each row group in the file rather than computed once for the whole file. The issue mentions a "metadata_collector", which I assume is the parameter of the `pyarrow.parquet.write_metadata` function; `pyarrow.parquet.write_table` has no such parameter. The function in this PR intentionally works at the level of individual Parquet files rather than the whole dataset, to support scenarios such as writing from Ray, where each file of the dataset is written by a different task.
