Re: [I] I do not understand the partition error: ValueError: Could not find in old schema: 2: {field}: identity(2) [iceberg-python]

via GitHub Tue, 27 Aug 2024 07:26:48 -0700


sungwy commented on issue #1100:
URL: https://github.com/apache/iceberg-python/issues/1100#issuecomment-2312725234


   Hi @cfrancois7 - thank you very much for raising this issue! And thank you 
@ndrluis for jumping in to dig into the root cause as well.
   
   We've made some enhancements to PyIceberg to support defining a 
PartitionSpec on table creation (this wasn't even possible before), but there 
are still two problems here that you helped outline:
   1. The input arguments accepted by the `create_table` API still give the 
impression that it supports what you were trying to do
   2. Our documentation isn't up to date with the best practices
   
   The root cause of the problem is that the field IDs of the Iceberg table 
schema are reassigned when a table is created. As a result, the API's 
requirement that the PartitionSpec match the schema by field ID doesn't work on 
table creation: the IDs you supplied may no longer match the reassigned ones.
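
A toy model of that failure mode, in plain Python rather than PyIceberg itself, may make it easier to see. The `Field` class and the `assign_fresh_ids` and `find_by_id` helpers are hypothetical names made up for this illustration, and the sequential renumbering is an assumption about how reassignment behaves, not a copy of the library's code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    field_id: int
    name: str


def assign_fresh_ids(fields):
    """Toy stand-in for ID reassignment: renumber fields 1..n in order."""
    return [Field(i, f.name) for i, f in enumerate(fields, start=1)]


def find_by_id(fields, field_id):
    """Look a field up by ID, raising ValueError when it is missing."""
    for f in fields:
        if f.field_id == field_id:
            return f
    raise ValueError(f"Could not find field with id {field_id}")


# Schema as the user wrote it, with IDs of their own choosing.
user_schema = [Field(10, "ts"), Field(20, "campaign_id"), Field(30, "value")]
# The partition spec points at "campaign_id" through its original ID.
partition_source_id = 20

# After reassignment the original ID no longer exists in the schema.
table_schema = assign_fresh_ids(user_schema)
try:
    find_by_id(table_schema, partition_source_id)
    id_lookup_ok = True
except ValueError:
    id_lookup_ok = False
```

The lookup succeeds against `user_schema` but fails against `table_schema`, which is the shape of mismatch described above.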
   
   Instead, the newly introduced practice is to do the following:
   
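   ```python
   # `catalog` and `ts_schema` are assumed to be defined already.
   with catalog.create_table_transaction(
       identifier='my_namespace.time_series',
       schema=ts_schema,
   ) as txn:
       # Add the partition field by column name instead of by field ID.
       with txn.update_spec() as update_spec:
           update_spec.add_identity("campaign_id")

   # Load the newly created table once the transaction has completed.
   table = catalog.load_table('my_namespace.time_series')
   ```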
   
   This approach matches the partition field by its field name only, so it is 
unaffected by the ID reassignment. This is similar to how the Spark and Flink 
APIs handle partition updates.
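
As a rough sketch of why a name survives where an ID does not, the plain-Python snippet below resolves a partition column against a schema by name. It does not use PyIceberg; `resolve_source_id` and the name-to-ID dictionaries are hypothetical and only illustrate the idea:

```python
def resolve_source_id(schema, column_name):
    """Return the current field ID for a column name (toy illustration)."""
    try:
        return schema[column_name]
    except KeyError:
        raise ValueError(f"Could not find column {column_name!r}") from None


# Name -> field ID, as the user originally declared the schema.
original = {"ts": 10, "campaign_id": 20, "value": 30}

# Assumed reassignment: the same names, renumbered 1..n in order.
reassigned = {name: i for i, name in enumerate(original, start=1)}

# The name resolves in both schemas, although the ID behind it changed.
id_before = resolve_source_id(original, "campaign_id")
id_after = resolve_source_id(reassigned, "campaign_id")
```

Because the lookup key is the column name, the partition field still resolves after the IDs change, whichever ID the column ends up with.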
   
   Please let me know if this works for you! I think it'll also be worthwhile 
for us to leave this issue open until we can clarify our API and our 
documentation to prevent other users from running into the same issues.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

