hantangwangd opened a new issue, #15128: URL: https://github.com/apache/iceberg/issues/15128
### Apache Iceberg version

1.10.1 (latest release)

### Query engine

None

### Please describe the bug 🐞

Executing the following statements in Spark (on Iceberg) leads to a mismatch between the actual and expected query results:

```
CREATE TABLE test_table (id bigint NOT NULL, data binary) USING iceberg PARTITIONED BY (data);
INSERT INTO TABLE test_table VALUES (1, X'e3bcd1'), (2, X'bcd1');
DELETE FROM test_table WHERE data = X'bcd1';
SELECT * FROM test_table WHERE data = X'e3bcd1';
```

The expected result is the remaining row `(1, X'e3bcd1')`, but the query returns an empty result. Upon investigation, the partition bounds for the binary column in the newly generated manifest file are computed incorrectly, so the corresponding data file is pruned during the planning phase. PrestoDB encounters the same issue when using `DeleteFiles.deleteFromRowFilter` to support file-level deletion.

The root cause is that, when `DeleteFiles.deleteFromRowFilter` is called, the `PartitionFieldStats` min/max fields hold a direct reference to a reusable byte array. This array can be (and is) reused by the `ManifestReader` while processing multiple files, so the stored bounds are silently overwritten by data from later entries.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
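The buffer-aliasing failure mode described above can be sketched outside Iceberg. The classes below are hypothetical stand-ins, not Iceberg's actual `PartitionFieldStats` or `ManifestReader` types: they show how stats that store a *reference* to a reader's reusable buffer are silently corrupted when the buffer is refilled for the next file, while a defensive copy preserves the original bound.

```java
// Minimal sketch of the aliasing bug (hypothetical stand-in classes,
// not Iceberg's real API).
public class ReusedBufferAliasing {

    // Stand-in for stats that keep min/max by reference -- the bug.
    static class StatsByReference {
        final byte[] min;
        StatsByReference(byte[] min) { this.min = min; } // aliases the caller's buffer
    }

    // Corrected variant: defensively copy before storing.
    static class StatsByCopy {
        final byte[] min;
        StatsByCopy(byte[] min) { this.min = min.clone(); }
    }

    public static void main(String[] args) {
        // Bytes of the first partition value, X'e3bcd1'.
        byte[] reusable = new byte[] {(byte) 0xe3, (byte) 0xbc, (byte) 0xd1};

        StatsByReference aliased = new StatsByReference(reusable);
        StatsByCopy copied = new StatsByCopy(reusable);

        // Simulate the reader reusing the same buffer for the next entry, X'bcd1'.
        reusable[0] = (byte) 0xbc;
        reusable[1] = (byte) 0xd1;
        reusable[2] = 0;

        // The aliased stats now report a wrong bound; the copy is stable.
        System.out.println("aliased min[0]=" + (aliased.min[0] & 0xff)); // 188 (0xbc) -- corrupted
        System.out.println("copied  min[0]=" + (copied.min[0] & 0xff)); // 227 (0xe3) -- correct
    }
}
```

A bound computed from the aliased stats would describe the *last* value the buffer held rather than the file's actual partition value, which is exactly why the surviving data file is pruned at planning time.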
