RussellSpitzer commented on issue #12467: URL: https://github.com/apache/iceberg/issues/12467#issuecomment-2707097097
@sfc-gh-ygu It has to apply in Copy on Write as well. Imagine I have a data file *data.parquet*:

| x |
|---|
| 1 |
| 2 |
| 3 |
| 4 |

and an equality delete file *eq.parquet*:

| x |
|---|
| 3 |

When I scan this file without a filter, I make a scan task

```
{
  file = data.parquet
  deleteList = eq.parquet
}
```

The problem comes when I apply a filter. Say I run `DELETE WHERE x = 2`. This produces a scan with a pushed-down filter of `x = 2`, which is passed to `table.scan.filter`. The filter condition is then checked against `eq.parquet`, whose min and max for `x` are both 3. Since we know `x = 2`, we get a "CAN NOT MATCH" and ignore `eq.parquet`. So I produce a scan task that looks like

```
{
  file = data.parquet
  deleteList = nil
}
```

This scan task goes through the COW execution path, which evaluates `DELETE WHERE x = 2` against every row. Because we are in COW, the rows that are not deleted are copied into a new file. Here we have a problem: since we aren't applying the equality deletes, we write a new file *data_2.parquet*:

| x |
|---|
| 1 |
| 3 |
| 4 |

Row `3`, which `eq.parquet` had deleted, is live again.
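To make the failure concrete, here is a toy Python model of the scenario. Everything in it (`EqDeleteFile`, `ScanTask`, `might_match_eq`, `plan_task`, `cow_delete`, and the `prune_deletes_by_filter` flag) is hypothetical and is not Iceberg's API. Rows are reduced to their `x` values, and the delete file's min/max metrics are derived from its contents. It is a sketch under those assumptions, not how Iceberg actually plans scans.

```python
from dataclasses import dataclass
from typing import List, Set

# Toy model only: every name here is hypothetical, not Iceberg's API.
# Rows are reduced to their value of column x.


@dataclass
class EqDeleteFile:
    """Hypothetical equality delete file keyed on column x."""
    path: str
    values: Set[int]  # x values this file deletes

    @property
    def lower(self) -> int:
        return min(self.values)  # stands in for the file's min metric on x

    @property
    def upper(self) -> int:
        return max(self.values)  # stands in for the file's max metric on x


@dataclass
class ScanTask:
    """Hypothetical scan task: one data file plus the deletes to apply to it."""
    file: str
    delete_list: List[EqDeleteFile]


def might_match_eq(delete: EqDeleteFile, literal: int) -> bool:
    """Metrics check for `x = literal`; False means CANNOT MATCH."""
    return delete.lower <= literal <= delete.upper


def plan_task(data_path: str, deletes: List[EqDeleteFile], literal: int,
              prune_deletes_by_filter: bool) -> ScanTask:
    """Build a scan task for one data file under the filter `x = literal`."""
    if prune_deletes_by_filter:
        # The problematic step: drop delete files whose metrics can't match the filter.
        deletes = [d for d in deletes if might_match_eq(d, literal)]
    return ScanTask(file=data_path, delete_list=list(deletes))


def cow_delete(rows: List[int], task: ScanTask, literal: int) -> List[int]:
    """COW `DELETE WHERE x = literal`: returns the rows written to the new data file."""
    # Apply whatever deletes the task carries to get the currently live rows.
    live = [r for r in rows if not any(r in d.values for d in task.delete_list)]
    # Every live row that doesn't match the filter is copied into the new file.
    return [r for r in live if r != literal]


if __name__ == "__main__":
    rows = [1, 2, 3, 4]                     # data.parquet
    eq = EqDeleteFile("eq.parquet", {3})    # eq.parquet
    pruned = plan_task("data.parquet", [eq], 2, prune_deletes_by_filter=True)
    print(cow_delete(rows, pruned, 2))      # [1, 3, 4]  (row 3 comes back)
    kept = plan_task("data.parquet", [eq], 2, prune_deletes_by_filter=False)
    print(cow_delete(rows, kept, 2))        # [1, 4]
```

Pruning `eq.parquet` is harmless for a plain read with `x = 2`: only rows with `x = 2` are returned, and a delete on `x = 3` can't touch them. The COW rewrite, however, also writes out the rows that *don't* match the filter, so any delete that applies to those rows has to stay on the task. In this model, keeping every delete file (`prune_deletes_by_filter=False`) gives the correct output; that is one way to avoid the problem, not necessarily the fix Iceberg adopts.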