pvary commented on PR #10935: URL: https://github.com/apache/iceberg/pull/10935#issuecomment-2302344676
> @pvary I have added `testFlinkScenario1` and `testFlinkScenario2` to `TestChangelogReader` based on your two scenarios. Please check the expected results. (I will rename the tests later with more descriptive names.)

These are really great tests. Thanks for implementing them!

> For scenario 1, I agree with you on what the changelog should emit. If DF1 is in snapshot 3, then we should emit the row with PK1 being deleted by ED2. (The row is deleted by ED1 too, but we should only emit the row once, not twice, for snapshot 3).

Do I understand correctly that the current code does not behave this way yet?

> For scenario 2, I think that when a row in a data file is deleted by a positional delete in the same commit, that row should neither be shown as inserted nor as deleted. This is where I think we disagree. (IIUC, you expect to see it as deleted but not as inserted. To me, that would be inconsistent.) This part of scenario 2 is actually already tested by `testAddingAndDeletingInSameCommit`.

I think we agree here. I'm perfectly fine as long as we can make sure that records added and immediately removed in the same commit are not emitted during the incremental scan read.

> If you agree with my analysis, then my implementation does handle at least your two scenarios correctly.

I think the output for snapshot 3 in scenario 1 is not correct. I have left a comment there.
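
To make the two rules discussed above concrete, here is a minimal sketch of the per-snapshot semantics: a row matched by several delete files is emitted as deleted only once, and a row added and deleted in the same commit is not emitted at all. This is only an illustration, not the code in this PR and not an Iceberg API: the class `ChangelogSketch`, the method `netChanges`, and the `file:position` row ids are all hypothetical names made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Iceberg.
class ChangelogSketch {

  /**
   * Computes the net changelog records for a single snapshot.
   *
   * @param existingRows ids of rows that were live before this snapshot
   * @param insertedRows ids of rows in data files added by this snapshot
   * @param deleteHits   ids of rows matched by any delete file of this snapshot;
   *                     a row matched by two delete files appears twice
   */
  static List<String> netChanges(
      Set<String> existingRows, List<String> insertedRows, List<String> deleteHits) {
    // Collapse repeated hits so each deleted row is considered once.
    Set<String> deleted = new LinkedHashSet<>(deleteHits);

    List<String> records = new ArrayList<>();
    for (String row : deleted) {
      // Only rows that were visible before this snapshot produce a DELETE.
      if (existingRows.contains(row)) {
        records.add("DELETE " + row);
      }
    }
    for (String row : insertedRows) {
      // Rows added and removed in the same commit are emitted neither as
      // INSERT nor as DELETE.
      if (!deleted.contains(row)) {
        records.add("INSERT " + row);
      }
    }
    return records;
  }
}
```

For example, with `DF1:0` already live, `DF2:0` and `DF2:1` added in the snapshot, and delete hits on `DF1:0` (twice) and `DF2:1`, the sketch yields one `DELETE DF1:0` and one `INSERT DF2:0`.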