RussellSpitzer commented on issue #12467:
URL: https://github.com/apache/iceberg/issues/12467#issuecomment-2707097097

   @sfc-gh-ygu
   
   It has to apply both in Copy on Write.
   
   So imagine I have a Data File
   
   *data.parquet*
   |x|
   |--|
   |1|
   |2|
   |3|
   |4|
   
   
   And an equality delete file
   *eq.parquet*
   |x|
   |-|
   |3|
   
   When I scan this file without a filter, I produce a scan task:
   
   ```
   { 
     file = data.parquet
     deleteList = eq.parquet
   }
   ```
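
To make the unfiltered case concrete, here is a minimal Python sketch of how a reader would consume such a scan task. `ScanTask`, `FILES`, and `read_task` are hypothetical names used only for illustration, not Iceberg APIs, and each file is modeled as a plain list of `x` values.

```python
from dataclasses import dataclass, field

# Toy model (assumption): a "file" is just a list of x values.
FILES = {
    "data.parquet": [1, 2, 3, 4],
    "eq.parquet": [3],
}

@dataclass
class ScanTask:
    file: str
    delete_list: list = field(default_factory=list)

def read_task(task):
    """Return the live rows: data rows minus every value in the attached equality deletes."""
    deleted = set()
    for name in task.delete_list:
        deleted.update(FILES[name])
    return [x for x in FILES[task.file] if x not in deleted]

task = ScanTask(file="data.parquet", delete_list=["eq.parquet"])
print(read_task(task))  # [1, 2, 4]
```

With the delete file attached, the row `x = 3` never reaches the reader.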
   
   The problem comes when I apply a filter.
   
   Say I run `DELETE WHERE x = 2`.
   
   This produces a scan with a filter pushdown of `x = 2` which is used in 
`table.scan.filter`
   
   The filter condition is then checked against `eq.parquet`, which has a min 
and max for `x` of 3. Since the filter requires `x = 2`, we get a "CAN NOT MATCH" and ignore 
`eq.parquet`. 
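
The min/max check can be sketched roughly as follows. This is an illustrative Python model, not Iceberg's actual evaluator; `ColumnStats` and `may_contain_eq` are hypothetical names, and the stats for `data.parquet` are inferred from the example rows.

```python
from dataclasses import dataclass

@dataclass
class ColumnStats:
    lower: int  # minimum value of x recorded for the file
    upper: int  # maximum value of x recorded for the file

def may_contain_eq(stats, value):
    """An equality filter can only match a file whose [lower, upper] range contains the value."""
    return stats.lower <= value <= stats.upper

data_parquet_stats = ColumnStats(lower=1, upper=4)  # rows 1, 2, 3, 4
eq_parquet_stats = ColumnStats(lower=3, upper=3)    # single row 3

print(may_contain_eq(data_parquet_stats, 2))  # True: the data file is kept
print(may_contain_eq(eq_parquet_stats, 2))    # False: the delete file is pruned
```

The pruning is correct for a read that only wants rows with `x = 2`, since the delete file cannot remove any of them. It becomes a problem only when the task is later used to rewrite the whole data file.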
   
   So I produce a scan task that looks like
   
   
   ```
   { 
     file = data.parquet
     deleteList = nil
   }
   ```
   
   This scan task goes through the COW execution path, which applies `DELETE 
WHERE x = 2` to every row. Since we are in COW, rows that are not deleted are 
shunted into a new file. Here we have a problem: because we aren't applying the 
equality deletes, the row `x = 3` survives and we write a new file
   
   *data_2.parquet*
   |x|
   |--|
   |1|
   |3|
   |4|
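
The whole scenario can be reproduced with a small Python sketch. It is a toy model under stated assumptions: `cow_delete` is a hypothetical helper, not an Iceberg API, and files are plain lists of `x` values.

```python
def cow_delete(data_rows, predicate, eq_deleted=()):
    """Copy-on-write delete: copy every row that is neither equality-deleted
    nor matched by the DELETE predicate into the new file."""
    dropped = set(eq_deleted)
    return [x for x in data_rows if x not in dropped and not predicate(x)]

data_parquet = [1, 2, 3, 4]
eq_parquet = [3]

def is_x_2(x):
    return x == 2

# Delete file pruned by the `x = 2` filter: deleteList = nil
without_deletes = cow_delete(data_parquet, is_x_2)
# Delete file kept on the scan task
with_deletes = cow_delete(data_parquet, is_x_2, eq_deleted=eq_parquet)

print(without_deletes)  # [1, 3, 4]
print(with_deletes)     # [1, 4]
```

The first result matches the rewritten file shown above, where the row `x = 3` is copied forward even though the equality delete should have removed it.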


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

