Re: [PR] Add upsert docs [iceberg-python]

via GitHub Sun, 16 Feb 2025 08:56:30 -0800


kevinjqliu commented on PR #1665:
URL: https://github.com/apache/iceberg-python/pull/1665#issuecomment-2661523189


   > Yes, that is an issue, but we don't respect this for any of the operations 
(append, etc). Doing this would make the operations expensive so we could leave 
this up to the user.
   
   You're right, this is an issue for all the write operations: we don't take 
`identifier_field_ids` into account when writing. I'll raise a separate issue 
to track this. For now, I'm OK with leaving this up to the user/external engine. 
When the user or engine enforces uniqueness correctly, the write operations 
will respect it.
   To quote the spec:
   ```
   uniqueness of rows by this identifier is not guaranteed or required by 
Iceberg and it is the responsibility of processing engines or data providers to 
enforce.
   ```
   
   As a follow-up, we can add a uniqueness check to `upsert` when 
`identifier_field_ids` is set, similar to the existing check for duplicates. I 
see this issue as a potential footgun, so it's better to verify the uniqueness 
and prevent data correctness problems.
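
A minimal sketch of what such a check could look like, to make the idea concrete. It works on plain Python dicts rather than on an Arrow table, and the names `find_duplicate_keys`, `assert_unique_identifiers`, and `identifier_columns` are hypothetical, not part of the PyIceberg API. It assumes the identifier field IDs have already been resolved to column names.

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Sequence


def find_duplicate_keys(
    rows: Iterable[Mapping[str, Any]],
    identifier_columns: Sequence[str],
) -> list[tuple[Any, ...]]:
    """Return every identifier tuple that appears more than once in `rows`."""
    # Build one tuple per row from the identifier columns and count occurrences.
    counts = Counter(
        tuple(row[column] for column in identifier_columns) for row in rows
    )
    return [key for key, occurrences in counts.items() if occurrences > 1]


def assert_unique_identifiers(
    rows: Iterable[Mapping[str, Any]],
    identifier_columns: Sequence[str],
) -> None:
    """Raise ValueError if any identifier tuple is duplicated (hypothetical helper)."""
    duplicates = find_duplicate_keys(rows, identifier_columns)
    if duplicates:
        raise ValueError(
            f"Duplicate values for identifier columns {list(identifier_columns)}: "
            f"{duplicates}"
        )


# Example: the second and third rows share the same identifier value.
sample_rows = [
    {"id": 1, "city": "Amsterdam"},
    {"id": 2, "city": "Paris"},
    {"id": 2, "city": "Berlin"},
]
duplicate_keys = find_duplicate_keys(sample_rows, ["id"])
```

Running a check like this before the write is the extra cost mentioned above, which is why it would only make sense when `identifier_field_ids` is actually set.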
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


