Fokko commented on PR #582: URL: https://github.com/apache/iceberg-python/pull/582#issuecomment-2043480905
> If we wanted to handle the validation only in the delete function by checking if we would end up rewriting files, above pattern would succeed by deleting level = 'INFO' and dt = '2024-02-01' because this deletion is a pure metadata operation.

This is what I tried to explain in my earlier comment: sometimes you have to rewrite files because the table previously used a different partitioning strategy, and files written under it can still contain matching rows. A table can have older manifests that were written using an older partition spec. I'm currently adding that in https://github.com/apache/iceberg-python/pull/569.

> Static overwrite on the other hand, would eagerly validate the predicate expression against the table schema, and the values in the arrow table and throw instead.

I missed this part. We can add it, but I would say that it is up to the user. To simplify, this means doing the following additional check (pseudocode):

```python
from typing import Union

import pyarrow as pa
from pyiceberg.expressions import BooleanExpression
from pyiceberg.io.pyarrow import expression_to_pyarrow

def overwrite(df: pa.Table, overwrite_filter: Union[str, BooleanExpression]) -> None:
    row_filter = _parse_row_filter(overwrite_filter)  # Turns the str into a boolean expression
    pa_row_filter = expression_to_pyarrow(row_filter)
    num_invalid_rows = len(df) - len(df.filter(pa_row_filter))  # Rows that fall outside the predicate
    if num_invalid_rows > 0:
        raise ValueError(f"Found {num_invalid_rows} rows that don't match the overwrite predicate")
```
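To make the point about older partition specs concrete, here is a minimal sketch of how a delete on `level = 'INFO'` and `dt = '2024-02-01'` might treat each data file. It is not PyIceberg's implementation: `DataFile`, `classify`, and the file paths are hypothetical names for illustration, and it assumes identity partitioning and equality-only predicates.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file entry in a manifest."""

    path: str
    # Identity-partition values under the spec the file was written with.
    partition: Dict[str, str]


def classify(file: DataFile, predicate: Dict[str, str]) -> str:
    """Return "skip", "drop", or "rewrite" for an equality-only delete predicate."""
    # A partition value that contradicts the predicate means no row can match.
    for column, wanted in predicate.items():
        if column in file.partition and file.partition[column] != wanted:
            return "skip"
    # Every predicate column is a matching partition column, so every row
    # matches and the file can be dropped as a pure metadata operation.
    if all(column in file.partition for column in predicate):
        return "drop"
    # The file's spec does not cover all predicate columns, so only some
    # rows may match and the file has to be rewritten.
    return "rewrite"


predicate = {"level": "INFO", "dt": "2024-02-01"}
files = [
    DataFile("new-spec/a.parquet", {"level": "INFO", "dt": "2024-02-01"}),
    DataFile("new-spec/b.parquet", {"level": "WARN", "dt": "2024-02-01"}),
    # Written under an older spec that partitioned on dt only.
    DataFile("old-spec/c.parquet", {"dt": "2024-02-01"}),
]
plan = {f.path: classify(f, predicate) for f in files}
```

Under these assumptions, only the file written with the older spec needs a rewrite, which is why checking "would we rewrite files?" inside the delete cannot be decided from the current partition spec alone.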