Fokko commented on PR #582:
URL: https://github.com/apache/iceberg-python/pull/582#issuecomment-2043480905

   > If we wanted to handle the validation only in the delete function by checking whether we would end up rewriting files, the above pattern would succeed by deleting level = 'INFO' and dt = '2024-02-01', because this deletion is a pure metadata operation.
   
   This is what I tried to explain in my earlier comment above: sometimes you have to rewrite files because the table used a different partitioning strategy before, and some rows in files written under that strategy still match. I'm currently adding that in https://github.com/apache/iceberg-python/pull/569. A table can still have manifests that were written with an older partition spec.
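
To make the partition-evolution case concrete, here is a minimal sketch in plain Python. It is not the PyIceberg implementation: `DataFile` and `plan_delete` are hypothetical names, and it assumes identity partitioning and a predicate made only of equality checks. A file written under the current spec (partitioned by `dt` and `level`) can be dropped from the metadata, but a file written under an older spec partitioned only by `dt` may mix `INFO` rows with other rows, so it has to be rewritten:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DataFile:
    # Hypothetical stand-in for a manifest entry.
    path: str
    # Partition values recorded under the spec this file was written with.
    partition: Dict[str, str]


def plan_delete(files: List[DataFile], predicate: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split files into metadata-only drops and files that need a rewrite.

    Assumes identity partitioning and a conjunction of equality checks.
    """
    drop: List[str] = []
    rewrite: List[str] = []
    for f in files:
        # A contradicting partition value means no row in this file can match.
        if any(col in f.partition and f.partition[col] != val for col, val in predicate.items()):
            continue
        if all(col in f.partition for col in predicate):
            # Every predicate column is a partition column and matches: the whole file matches.
            drop.append(f.path)
        else:
            # Some predicate columns are not partition columns in this file's spec,
            # so only part of the file may match and it must be rewritten.
            rewrite.append(f.path)
    return drop, rewrite


files = [
    DataFile("new.parquet", {"dt": "2024-02-01", "level": "INFO"}),  # current spec
    DataFile("old.parquet", {"dt": "2024-02-01"}),                   # older spec, dt only
    DataFile("other.parquet", {"dt": "2024-02-02", "level": "INFO"}),
]
print(plan_delete(files, {"level": "INFO", "dt": "2024-02-01"}))
# (['new.parquet'], ['old.parquet'])
```

So even when the newest spec makes the delete look metadata-only, the older file forces a rewrite.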
   
   > Static overwrite, on the other hand, would eagerly validate the predicate expression against the table schema and the values in the Arrow table, and throw instead.
   
   I missed this part. We can add it, but I would say that it is up to the user. In short, it means doing this additional check (pseudocode):
   
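   ```python
   from typing import Union

   import pyarrow as pa
   from pyiceberg.expressions import BooleanExpression
   from pyiceberg.expressions.visitors import bind
   from pyiceberg.io.pyarrow import expression_to_pyarrow
   from pyiceberg.schema import Schema
   from pyiceberg.table import _parse_row_filter

   def overwrite(df: pa.Table, overwrite_filter: Union[str, BooleanExpression], schema: Schema) -> None:
       # Turn a str into a BooleanExpression, then bind it to the table schema (passed in here)
       row_filter = bind(schema, _parse_row_filter(overwrite_filter), case_sensitive=True)
       pa_row_filter = expression_to_pyarrow(row_filter)
       # Rows removed by the filter fall outside the overwrite predicate
       num_invalid_rows = len(df) - len(df.filter(pa_row_filter))
       if num_invalid_rows > 0:
           raise ValueError(f"Found {num_invalid_rows} rows that don't match the overwrite predicate")
   ```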


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

