kevinjqliu commented on PR #1665: URL: https://github.com/apache/iceberg-python/pull/1665#issuecomment-2661523189
> Yes, that is an issue, but we don't respect this for any of the operations (append, etc). Doing this would make the operations expensive so we could leave this up to the user.

You're right, this is an issue for all the write operations: we don't take `identifier_field_ids` into account when writing. I'll raise a separate issue to track this.

For now, I'm OK with leaving this up to the user or external engine. When done correctly, the write operations will respect the uniqueness. To quote the spec,

```
uniqueness of rows by this identifier is not guaranteed or required by Iceberg and it is the responsibility of processing engines or data providers to enforce.
```

As a follow-up, we can add a uniqueness check to `upsert` when `identifier_field_ids` is set, similar to checking for duplicates. I see this issue as a potential footgun, so it's better to verify the uniqueness and prevent data correctness problems.
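To make the follow-up idea concrete, here is a minimal sketch of what such a uniqueness check could look like. It is not PyIceberg code: `find_duplicate_identifiers` and `check_unique_identifiers` are hypothetical helpers, the rows are plain dictionaries rather than an Arrow table, and the identifier column names are assumed to have already been resolved from `identifier_field_ids` by the caller.

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Sequence


def find_duplicate_identifiers(
    rows: Iterable[Mapping[str, Any]],
    identifier_columns: Sequence[str],
) -> list[tuple[Any, ...]]:
    """Return identifier tuples that occur more than once in `rows`.

    Hypothetical helper for illustration only; not part of PyIceberg.
    """
    # Build one tuple per row from the identifier columns and count them.
    counts = Counter(
        tuple(row[column] for column in identifier_columns) for row in rows
    )
    # Counter keeps first-seen order, so duplicates come back in that order.
    return [key for key, count in counts.items() if count > 1]


def check_unique_identifiers(
    rows: Iterable[Mapping[str, Any]],
    identifier_columns: Sequence[str],
) -> None:
    """Raise ValueError if any identifier tuple is repeated (hypothetical)."""
    duplicates = find_duplicate_identifiers(rows, identifier_columns)
    if duplicates:
        raise ValueError(
            f"Duplicate rows for identifier columns {list(identifier_columns)}: "
            f"{duplicates}"
        )


if __name__ == "__main__":
    batch = [
        {"id": 1, "region": "eu", "value": 10},
        {"id": 2, "region": "eu", "value": 20},
        {"id": 1, "region": "eu", "value": 30},
    ]
    try:
        check_unique_identifiers(batch, ["id", "region"])
    except ValueError as exc:
        print(exc)
```

This only validates the incoming batch. Guaranteeing uniqueness across the whole table after the write would also need a comparison against existing rows, which is the expensive part mentioned in the quoted comment.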