mxm commented on PR #14092:
URL: https://github.com/apache/iceberg/pull/14092#issuecomment-3326991871

   @aiborodin Regular data files and delete files can both be part of the same 
snapshot (in fact, they have to be for upsert semantics). However, we have to 
create a table snapshot before we process WriteResults with delete files. The 
reason is that data files and delete files within a snapshot are not ordered, 
but deletes often need to be applied in a specific order to be correct. 
   
   For example: WriteResult `w1` (append-only) and WriteResult `w2` (delete + 
append).
   
   If we combined `w1` and `w2` into a single snapshot, Iceberg would first 
apply the delete files and delete the relevant rows, then apply the appends 
from both WriteResults. As a result, any deletes in `w2` that match rows 
appended in `w1` would not take effect, because deletes are always applied 
before appends. There is no order between data files and delete files within 
a snapshot, which honestly feels like a limitation of Iceberg.
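
   To make the ordering problem concrete, here is a toy Python model (not 
Iceberg code). It assumes only what is described above: within one snapshot, 
all deletes are applied to the existing table first, then all appends are 
added. The `commit` function and the dict layout are hypothetical.

```python
def commit(table, write_results):
    """Toy model of one snapshot: apply all deletes first, then all appends."""
    deleted_keys = set()
    for wr in write_results:
        deleted_keys.update(wr["deletes"])
    # Deletes only see rows that were in the table before this commit.
    surviving = [row for row in table if row[0] not in deleted_keys]
    for wr in write_results:
        surviving.extend(wr["appends"])
    return surviving


# w1 appends key 1; w2 upserts key 1 (delete the old row, append the new one).
w1 = {"appends": [(1, "a")], "deletes": []}
w2 = {"appends": [(1, "b")], "deletes": [1]}

single = commit([], [w1, w2])               # one snapshot for both
separate = commit(commit([], [w1]), [w2])   # snapshot for w1, then for w2

print(single)    # [(1, 'a'), (1, 'b')]: the stale row from w1 survives
print(separate)  # [(1, 'b')]: the delete in w2 sees the row from w1
```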
   
   The semantics are totally different when we first create a table snapshot 
for `w1`, because any deletes by `w2` would be applied on top of this snapshot 
and before appending data via `w2`. That's the reason why I think we can 
combine WriteResults into a single snapshot, as long as we don't see delete 
files. As soon as we discover a delete file, we need to create a table snapshot 
with the so-far aggregated WriteResults.
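
   A minimal sketch of that batching rule, in Python for brevity. The 
`WriteResult` class here is a simplified stand-in with two list fields, and 
`plan_snapshots` is a hypothetical helper, not the committer code from this 
PR. Append-only results accumulate; a result with delete files first closes 
the batch collected so far and then starts a new one.

```python
from dataclasses import dataclass, field


@dataclass
class WriteResult:
    # Simplified stand-in: file names only.
    data_files: list = field(default_factory=list)
    delete_files: list = field(default_factory=list)


def plan_snapshots(write_results):
    """Group WriteResults into batches; each batch becomes one table snapshot."""
    snapshots = []
    pending = []
    for wr in write_results:
        # A delete file must see everything aggregated so far, so commit
        # the pending batch before this result is added.
        if wr.delete_files and pending:
            snapshots.append(pending)
            pending = []
        pending.append(wr)
    if pending:
        snapshots.append(pending)
    return snapshots


a1 = WriteResult(data_files=["d1"])
a2 = WriteResult(data_files=["d2"])
u1 = WriteResult(data_files=["d3"], delete_files=["x1"])
a3 = WriteResult(data_files=["d4"])

batches = plan_snapshots([a1, a2, u1, a3])
print([len(b) for b in batches])  # [2, 2]: (a1, a2), then (u1, a3)
```

   In this sketch an append-only result that follows a delete joins the same 
batch as the delete, since its rows are not affected by deletes that were 
applied earlier.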
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

