sungwy commented on issue #1100: URL: https://github.com/apache/iceberg-python/issues/1100#issuecomment-2312725234
Hi @cfrancois7 - thank you very much for raising this issue! And thank you @ndrluis for jumping in to dig into the root cause as well.

We've made some enhancements to PyIceberg to support defining a PartitionSpec on table creation (this wasn't even possible before), but there are still two problems here that you helped outline:

1. The input arguments accepted by the `create_table` API still give the impression that it supports what you were trying to do
2. Our documentation isn't up to date with the best practices

The root cause of the problem is that the IDs of the Iceberg Table schema are reassigned when a table is created. So the API's requirement that the PartitionSpec match schema fields by ID doesn't really work on table creation.

Instead, the newly introduced practice is to do the following:

```
# Create the table inside a transaction so the partition spec
# is applied as part of table creation.
with catalog.create_table_transaction(
    identifier='my_namespace.time_series',
    schema=ts_schema,
) as txn:
    with txn.update_spec() as update_spec:
        # The partition field is resolved by column name, not by field ID.
        update_spec.add_identity("campaign_id")

table = catalog.load_table('my_namespace.time_series')
```

This approach relies on just matching the partition field by its field name, similar to how the Spark and Flink APIs handle partition updates.

Please let me know if this works for you! I think it'll also be worthwhile for us to leave this issue open until we can clarify our API and our documentation to prevent other users from running into the same issues.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
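
To make the root cause described in the comment a bit more concrete, here is a rough toy model in plain Python. It is not PyIceberg code: the `Field` class and the `reassign_ids`, `resolve_by_id`, and `resolve_by_name` helpers are hypothetical names made up for illustration, and the renumbering rule (fresh IDs 1..N in column order) is a simplifying assumption. It only shows why a partition source ID chosen against the user-supplied schema can point at the wrong column once IDs are reassigned, while a lookup by column name still lands on the intended column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field (not a PyIceberg class)."""
    field_id: int
    name: str


def reassign_ids(fields):
    """Toy assumption: table creation renumbers fields 1..N in column order."""
    return [Field(i, f.name) for i, f in enumerate(fields, start=1)]


def resolve_by_id(fields, source_id):
    """Return the column name that a partition source ID points at, if any."""
    return next((f.name for f in fields if f.field_id == source_id), None)


def resolve_by_name(fields, name):
    """Return the field ID of the column with the given name, if any."""
    return next((f.field_id for f in fields if f.name == name), None)


# Schema as written by the user, with IDs that do not start at 1.
user_schema = [Field(2, "ts"), Field(3, "campaign_id"), Field(4, "value")]
partition_source_id = 3  # the user means "campaign_id"

# Schema after the (assumed) ID reassignment at table creation.
created_schema = reassign_ids(user_schema)

before = resolve_by_id(user_schema, partition_source_id)
after = resolve_by_id(created_schema, partition_source_id)
by_name = resolve_by_name(created_schema, "campaign_id")

print("ID 3 before creation:", before)
print("ID 3 after creation: ", after)
print("campaign_id resolved by name to ID:", by_name)
```

In this sketch the ID-based lookup silently shifts to a different column after creation, which is the kind of mismatch that name-based matching avoids.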