syun64 opened a new pull request, #531:
URL: https://github.com/apache/iceberg-python/pull/531

   As a follow-up to https://github.com/apache/iceberg-python/pull/506, this PR 
introduces support for adding files as DataFiles to partitioned tables.
   
   Instead of inferring partition values by parsing the file path, which is 
less accurate and assumes a Hive partitioning scheme, this approach requires 
that the partition values be present in the parquet files, and infers them 
from the parquet metadata footer using the lower and upper bound values.
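
   The inference step described above can be sketched roughly as follows. This 
is a minimal illustration and not the code in this PR: `ColumnStats` and 
`infer_partition_value` are hypothetical names, and the statistics are modeled 
as plain Python values rather than read from a real parquet footer.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical stand-in for one row group's footer statistics for a column."""
    lower: Any
    upper: Any


def infer_partition_value(stats: Iterable[ColumnStats]) -> Any:
    """Return the single value of a partition column, using only min/max bounds.

    If the smallest lower bound equals the largest upper bound, every row in
    the file holds that one value, so no data pages need to be read.
    """
    stats = list(stats)
    if not stats:
        raise ValueError("no statistics available for the partition column")
    lower = min(s.lower for s in stats)
    upper = max(s.upper for s in stats)
    if lower != upper:
        raise ValueError(
            f"file spans more than one partition value: {lower!r}..{upper!r}"
        )
    return lower
```

   A file whose row groups all report the same bounds yields that value; a file 
whose bounds differ is rejected because it cannot belong to a single partition.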
   
   Using the lower and upper bound values means the client does not have to 
read the entire parquet file, because it can rely on the aggregated statistics 
in the parquet metadata footer. As a result, this implementation of add_files 
does not support tables with partition transforms that are not 
order-preserving (not `preserves_order`).
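
   To illustrate why the bounds are only sufficient for order-preserving 
transforms, here is a small sketch. The two transforms are toy stand-ins 
(they are not Iceberg's actual truncate or bucket implementations), and 
`single_partition_from_bounds` is a hypothetical helper.

```python
from typing import Callable, Optional


def truncate_to_tens(value: int) -> int:
    """Order-preserving toy transform: a <= b implies f(a) <= f(b)."""
    return value - (value % 10)


def toy_bucket(value: int) -> int:
    """Toy stand-in for a hash-based bucket transform; it does not preserve order."""
    return (value * 7) % 4


def single_partition_from_bounds(
    transform: Callable[[int], int], lower: int, upper: int
) -> Optional[int]:
    """Return the partition value if both transformed bounds agree, else None."""
    lo, hi = transform(lower), transform(upper)
    return lo if lo == hi else None


# Hypothetical column values in one file; only min (20) and max (28) are in the footer.
values = [20, 23, 28]

# Order-preserving: equal transformed bounds pin down every value in between.
truncate_from_bounds = single_partition_from_bounds(truncate_to_tens, 20, 28)
truncate_actual = {truncate_to_tens(v) for v in values}

# Not order-preserving: the bounds agree, yet a value in between lands elsewhere.
bucket_from_bounds = single_partition_from_bounds(toy_bucket, 20, 28)
bucket_actual = {toy_bucket(v) for v in values}
```

   With the order-preserving transform, the bounds and the actual values agree 
on a single partition. With the toy bucket transform, the bounds suggest a 
single partition while the values actually fall into two, which is the case 
the footer statistics alone cannot detect.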


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

