pvary commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2157551451
Here is how the different deletes work:
- EQ-DELETE - removes all occurrences of the record with the given id BEFORE the snapshot
- POS-DELETE - removes a given row from the given data file BEFORE OR IN the given snapshot

Based on the data you have shown, I would guess that there were 2 updates for the given record in the snapshot.

Flink does the following:
- In upsert mode, it always writes the EQ-DELETE for the id, and writes the new record. It also stores the filename and the id for all written records in memory.
- If a new update arrives for an id that was already written in the same snapshot, it also writes the positional delete file.

The commit message you have shown tells us that 3 delete files were written. I expect that one of them is the positional delete file, and that it contains the delete record for the given row.

Could you please check if this is the case?

Thanks,
Peter

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
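The delete semantics described in the comment above can be illustrated with a small toy model. This is only a sketch, not Iceberg or Flink code: `Snapshot`, `write_upserts`, and `read_table` are hypothetical names, each snapshot is assumed to write a single data file, and the snapshot sequence number stands in for "before" and "before or in".

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Toy snapshot: one data file plus the deletes committed with it."""
    seq: int
    data: list = field(default_factory=list)       # (file, pos, id, value)
    eq_deletes: set = field(default_factory=set)   # ids
    pos_deletes: set = field(default_factory=set)  # (file, pos)


def write_upserts(seq, upserts):
    """Mimic the upsert behaviour described above for one snapshot."""
    snap = Snapshot(seq)
    file_name = f"data-{seq}.parquet"  # hypothetical file name
    written = {}  # id -> (file, pos) of the row written in this snapshot
    for row_id, value in upserts:
        # Always write an equality delete for the id.
        snap.eq_deletes.add(row_id)
        # The id was already written in this snapshot: the equality delete
        # cannot reach that row, so add a positional delete for it.
        if row_id in written:
            snap.pos_deletes.add(written[row_id])
        pos = len(snap.data)
        snap.data.append((file_name, pos, row_id, value))
        written[row_id] = (file_name, pos)
    return snap


def read_table(snapshots):
    """Return the (id, value) rows that survive all deletes."""
    live = []
    for snap in snapshots:
        for file_name, pos, row_id, value in snap.data:
            # EQ-DELETE applies only to rows written BEFORE its snapshot.
            eq_deleted = any(
                row_id in other.eq_deletes
                for other in snapshots if other.seq > snap.seq
            )
            # POS-DELETE applies to rows written BEFORE OR IN its snapshot.
            pos_deleted = any(
                (file_name, pos) in other.pos_deletes
                for other in snapshots if other.seq >= snap.seq
            )
            if not (eq_deleted or pos_deleted):
                live.append((row_id, value))
    return live


s1 = write_upserts(1, [(1, "a")])
s2 = write_upserts(2, [(1, "b"), (1, "c")])  # two updates in one snapshot
print(read_table([s1, s2]))  # [(1, 'c')]
```

In this model, the row from the first snapshot is removed by the second snapshot's equality delete, while the first of the two same-snapshot updates can only be removed by the positional delete. Without it, both `"b"` and `"c"` would remain visible for id 1.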