mxm commented on PR #14092: URL: https://github.com/apache/iceberg/pull/14092#issuecomment-3337424532
> @mxm Just combining the append-only WriteResults in each Flink checkpoint would not resolve the issue for WriteResults with delete files (see my comment in [#14182 (comment)](https://github.com/apache/iceberg/pull/14182#issuecomment-3336891582)). We should combine both appends and deletes within the same checkpoint (implemented in this change), which is valid for equality delete semantics because records with the same equality fields would always be routed to the same writer, storing all unique equality delete keys in the same delete file. I've responded in https://github.com/apache/iceberg/pull/14182#issuecomment-3337290281. In a nutshell, my code was doing exactly what you suggest here, committing all appends / deletes from a checkpoint, but it would additionally add WriteResults from other checkpoint if those did not contain delete files. I noticed that the code was too hard to understand, so I removed this optimization. > My suggestion is to combine both appends and deletes for the same checkpoint, which I implemented in this change. IMHO the solution you implemented goes beyond the scope of fixing the issue. I'm open to refactoring the aggregator / committer code, but I think we should add a set of minimal changes to fix the correctness issue, along side with increased test coverage. This is what I think #14182 achieves. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
