syun64 opened a new pull request, #531: URL: https://github.com/apache/iceberg-python/pull/531
As a follow-up to https://github.com/apache/iceberg-python/pull/506, this PR adds support for adding files as DataFiles to partitioned tables. Instead of parsing the file path and inferring partition values from a Hive-style partitioning scheme, which is less accurate, this approach requires the partition column values to be present in the parquet files and infers each partition value from the lower and upper bound statistics in the parquet metadata footer. Using these aggregated statistics means the client does not have to read the entire parquet file. Because a single partition value can only be inferred from the two bounds when the transform preserves order, this implementation of add_files does not support tables with partition transforms that are non-linear (not `preserves_order`).
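The inference described above can be illustrated with a minimal, library-free sketch. It is not the code in this PR: `infer_partition_value` and `months_since_epoch` are hypothetical names, and the lower and upper bounds are assumed to have already been read from the parquet footer statistics. The idea is that for an order-preserving transform, equal transformed bounds imply that every value in the file maps to the same partition.

```python
from datetime import date
from typing import Any, Callable


def months_since_epoch(d: date) -> int:
    # Hypothetical stand-in for a month partition transform:
    # number of months between 1970-01 and the given date.
    return (d.year - 1970) * 12 + (d.month - 1)


def infer_partition_value(
    lower: Any,
    upper: Any,
    transform: Callable[[Any], Any],
    preserves_order: bool,
) -> Any:
    """Infer one partition value from a column's min/max statistics.

    Assumes `lower` and `upper` are the column's lower and upper bounds
    taken from the parquet metadata footer.
    """
    if not preserves_order:
        # Without monotonicity, values between the bounds could land in
        # other partitions even when the transformed bounds are equal.
        raise ValueError("Cannot infer partition value: transform does not preserve order")

    lower_partition = transform(lower)
    upper_partition = transform(upper)
    if lower_partition != upper_partition:
        # The file spans more than one partition value.
        raise ValueError(
            f"File contains more than one partition value: {lower_partition!r} != {upper_partition!r}"
        )
    return lower_partition


# A file whose dates all fall within March 2024 maps to a single month partition.
partition = infer_partition_value(date(2024, 3, 1), date(2024, 3, 31), months_since_epoch, True)
```

A file whose bounds span two months, or a transform flagged as not order-preserving (a hash bucket, for example), raises `ValueError` in this sketch instead of guessing a partition.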