mxm commented on PR #14092:
URL: https://github.com/apache/iceberg/pull/14092#issuecomment-3337424532

   > @mxm Just combining the append-only WriteResults in each Flink checkpoint 
would not resolve the issue for WriteResults with delete files (see my comment 
in [#14182 
(comment)](https://github.com/apache/iceberg/pull/14182#issuecomment-3336891582)).
 We should combine both appends and deletes within the same checkpoint 
(implemented in this change), which is valid for equality delete semantics 
because records with the same equality fields would always be routed to the 
same writer, storing all unique equality delete keys in the same delete file.
   
   I've responded in 
https://github.com/apache/iceberg/pull/14182#issuecomment-3337290281. In a 
nutshell, my code was doing exactly what you suggest here, committing all 
appends / deletes from a checkpoint, but it would additionally add WriteResults 
from other checkpoint if those did not contain delete files. I noticed that the 
code was too hard to understand, so I removed this optimization.
   
   > My suggestion is to combine both appends and deletes for the same 
checkpoint, which I implemented in this change.
   
   IMHO the solution you implemented goes beyond the scope of fixing the issue. 
I'm open to refactoring the aggregator / committer code, but I think we should 
add a set of minimal changes to fix the correctness issue, along side with 
increased test coverage. This is what I think #14182 achieves.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to