Fokko opened a new issue, #1678:
URL: https://github.com/apache/iceberg-python/issues/1678

   ### Feature Request / Improvement
   
   Java and Python take different approaches here. I don't have all the historical context, but prior to Iceberg V2 tables there was no such thing as [operations](https://iceberg.apache.org/spec/#snapshots):
   
   
![Image](https://github.com/user-attachments/assets/d0b94d2d-d472-4bb2-9fac-9634bbc859d7)
   
   I think this is a good thing to validate against.
   
   This should happen in the [`_commit`](https://github.com/apache/iceberg-python/blob/e927aee874cdd3c60fc11425770767cececa4606/pyiceberg/table/update/snapshot.py#L242) method of the `_SnapshotProducer`, similar to Java:
   
   - We should track which snapshot was current when the table was initially loaded ([startingSnapshotId](https://github.com/apache/iceberg/blob/bcbbd0344623ffea5b092e2de5debb0bc12892a1/core/src/main/java/org/apache/iceberg/BaseReplacePartitions.java#L30) in Java).
   - We refresh the table so we have the latest snapshots, then check whether any snapshots were added between the `startingSnapshotId` and the `current-snapshot-id`. If so, we call `_validate()` to check for conflicts.
   - Then we write out the manifest list.
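A minimal sketch of the "were any snapshots added?" step, under a simplified in-memory model: `Operation`, `Snapshot`, and `snapshots_added_since` are hypothetical names for illustration, not PyIceberg's API. It walks the parent-snapshot chain from the refreshed current snapshot back to the starting snapshot and returns whatever was committed in between.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Operation(Enum):
    """Hypothetical stand-in for the snapshot operations named in the spec."""
    APPEND = "append"
    REPLACE = "replace"
    OVERWRITE = "overwrite"
    DELETE = "delete"


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical minimal snapshot: only the fields this check needs."""
    snapshot_id: int
    parent_snapshot_id: Optional[int]
    operation: Operation


def snapshots_added_since(
    snapshots: Dict[int, Snapshot],
    current_snapshot_id: Optional[int],
    starting_snapshot_id: Optional[int],
) -> List[Snapshot]:
    """Return the snapshots committed after `starting_snapshot_id`, newest first.

    `starting_snapshot_id` is the snapshot that was current when the table was
    loaded (None for an empty table); `current_snapshot_id` comes from the
    refreshed table metadata.
    """
    added: List[Snapshot] = []
    snapshot_id = current_snapshot_id
    # Follow parent pointers until we reach the snapshot we started from.
    while snapshot_id is not None and snapshot_id != starting_snapshot_id:
        snapshot = snapshots[snapshot_id]
        added.append(snapshot)
        snapshot_id = snapshot.parent_snapshot_id
    if snapshot_id != starting_snapshot_id:
        # We hit the root without meeting the starting snapshot, so it is no
        # longer an ancestor of the current snapshot (e.g. history was rewritten).
        raise ValueError(
            f"Snapshot {starting_snapshot_id} is not an ancestor of the current snapshot"
        )
    return added
```

If the returned list is empty, nothing was committed concurrently and the producer can write its manifest list directly; otherwise each returned snapshot's operation is what `_validate()` would inspect.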
   
   There's also a small section on [conflict resolution](https://iceberg.apache.org/spec/#commit-conflict-resolution-and-retry).
   
   ```
   - When doing an `Append`: adding new data
     - All okay: concurrent `{Append,Replace,Overwrite,Delete}` snapshots don't affect the operation, so we can just append
   - When doing a `Replace`: replacing existing data (e.g. compaction)
     - Ok: Append
     - Not ok: Replace, Overwrite, Delete. We should fail; later we can check whether there is any overlap (e.g. whether they touch the same partitions).
   - When doing an `Overwrite`: adding and deleting data
     - Not ok: Append, Replace, Overwrite, Delete. We should fail; later we can check whether there is any overlap (e.g. whether they touch the same partitions).
   - When doing a `Delete`
     - Not ok: Append, Replace, Overwrite, Delete. We should fail; later we can check whether there is any overlap (e.g. whether they touch the same partitions/predicate). We should also take into account the difference between merge-on-read (MoR) and copy-on-write (CoW).
   ``` 
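To illustrate the rules above, here is a minimal sketch that encodes them as a lookup table and fails on any concurrent operation not listed as okay. `ALLOWED_CONCURRENT`, `CommitConflictError`, and `validate_no_conflicts` are hypothetical names, and the sketch deliberately does none of the finer-grained overlap checks (partitions, predicates, MoR vs CoW).

```python
from enum import Enum
from typing import Dict, FrozenSet, Iterable


class Operation(Enum):
    """Hypothetical stand-in for the snapshot operations named in the spec."""
    APPEND = "append"
    REPLACE = "replace"
    OVERWRITE = "overwrite"
    DELETE = "delete"


# For the operation being committed: which concurrently added operations are safe.
# Anything not listed fails for now; overlap checks can relax this later.
ALLOWED_CONCURRENT: Dict[Operation, FrozenSet[Operation]] = {
    Operation.APPEND: frozenset(Operation),  # appends are unaffected by anything
    Operation.REPLACE: frozenset({Operation.APPEND}),
    Operation.OVERWRITE: frozenset(),
    Operation.DELETE: frozenset(),
}


class CommitConflictError(Exception):
    """Hypothetical error raised when a concurrent snapshot conflicts."""


def validate_no_conflicts(committing: Operation, concurrent: Iterable[Operation]) -> None:
    """Raise if any concurrently committed operation conflicts with `committing`."""
    allowed = ALLOWED_CONCURRENT[committing]
    for op in concurrent:
        if op not in allowed:
            raise CommitConflictError(
                f"Cannot commit {committing.value}: concurrent {op.value} snapshot"
            )
```

Keeping the rules in one table makes each later refinement (e.g. allowing a concurrent `Delete` during a `Replace` when partitions don't overlap) a local change rather than a rewrite.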
   
   Let's start with only the simplest cases and add the others one by one, to keep each PR a reasonable size.
   
   Once we have this in place, we can also do automatic retries: 
https://github.com/apache/iceberg-python/issues/269
   

