kevinjqliu opened a new issue, #3493: URL: https://github.com/apache/iceberg-python/issues/3493
`TruncateTransform.project` appears to incorrectly project `NOT STARTS WITH` predicates for truncated string/binary partition fields.

For `truncate[2]`, PyIceberg currently projects:

```text
NOT STARTS WITH "aaa" -> NOT STARTS WITH "aa"
```

That is unsafe: the truncated partition value does not contain enough information to prove all rows fail the original predicate, so files with matching rows can be pruned.

Expected behavior should match apache/iceberg-go#1193 / Java truncate projection behavior:

- prefix length < truncate width: keep `NOT STARTS WITH` with the original literal
- prefix length == truncate width: project to `!=`
- prefix length > truncate width: no inclusive projection

Relevant code: `pyiceberg/transforms.py` `_truncate_array`, plus the existing `test_projection_truncate_string_not_starts_with` expectation.
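To make the three expected cases concrete, here is a minimal standalone sketch in plain Python. It does not use PyIceberg's API: `truncate` and `project_not_starts_with` are hypothetical helpers that model string truncation as "keep the first `width` characters" and return the projected operator as a plain tuple. It also reproduces the counterexample for `truncate[2]`, where the row `"aab"` satisfies the original predicate but its partition value `"aa"` is rejected by the current projection.

```python
from typing import Optional, Tuple


def truncate(value: str, width: int) -> str:
    """Simplified model of truncate[width] for strings (assumption: first `width` characters)."""
    return value[:width]


def project_not_starts_with(prefix: str, width: int) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: inclusive projection of NOT STARTS WITH `prefix`.

    Returns (operator, literal) for the partition predicate, or None when
    no inclusive projection exists and the partition must not be pruned.
    """
    if len(prefix) < width:
        # The truncated value still contains the whole prefix.
        return ("not_starts_with", prefix)
    if len(prefix) == width:
        # A row starts with the prefix exactly when its truncated value equals it.
        return ("not_eq", prefix)
    # The prefix is longer than what the partition value retains.
    return None


# Counterexample for the current behavior with truncate[2]:
row = "aab"
partition = truncate(row, 2)                      # "aa"
row_matches = not row.startswith("aaa")           # row satisfies the original predicate
partition_kept = not partition.startswith("aa")   # current projection rejects the partition
```

In this model `row_matches` is true while `partition_kept` is false, so a file containing `"aab"` would be pruned even though it holds a matching row. Returning `None` for the over-long prefix avoids that by leaving the partition unfiltered.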
