paul-bormans-pcgw opened a new issue, #11687: URL: https://github.com/apache/iceberg/issues/11687
### Query engine

1. PyIceberg
2. Trino

### Question

I'm running a test (on docker-compose) where new data is appended (FastAppend) roughly every second, while on the other end Trino runs a query to DELETE data older than 2 hours. The DELETE throws an exception like this:

```
trino:ts> DELETE FROM pack WHERE epoch_timestamp_tz <= timestamp '2024-12-02 12:30 UTC' AND timestampns < 1.7331426374286223e+18;

Query 20241202_145700_00052_2bt64, FAILED, 1 node
Splits: 557 total, 556 done (99.82%)
29.45 [21.6M rows, 206MiB] [734K rows/s, 7.01MiB/s]

Query 20241202_145700_00052_2bt64 failed: Failed to commit the transaction during write: Found conflicting files that can contain records matching true: [
s3://demobucket/ts.db/pack/data/source_id=s00000/epoch_hours=2024-12-02-14/00000-0-076ac96f-51b3-48d3-9a68-c7f971278ada.parquet,
s3://demobucket/ts.db/pack/data/source_id=s00000/epoch_hours=2024-12-02-14/00000-0-92c59e44-c8db-4f99-a9bb-f0a2d4fbc164.parquet]

Caused by: org.apache.iceberg.exceptions.ValidationException: Found conflicting files that can contain records matching true: [s3://demobucket/ts.db/pack/data/source_id=s00000/epoch_hours=2024-12-02-14/00000-0-62caa502-27ad-4f0c-aabf-41d1bb3198fa.parquet]
	at org.apache.iceberg.MergingSnapshotProducer.validateAddedDataFiles(MergingSnapshotProducer.java:347)
	at org.apache.iceberg.BaseRowDelta.validate(BaseRowDelta.java:130)
```

As can be seen, I'm using a PartitionSpec:

```
with table.update_spec() as update:
    update.add_field(
        source_column_name="epoch_timestamp_tz",
        transform=HourTransform(),
        partition_field_name="epoch_hours",
    )
```

Since the ingestion only appends new data, no new data files are added to the partition (epoch_hours=2024-12-02-12) where the DELETE is running. The conflicting files reported in the error are all in epoch_hours=2024-12-02-14.

Why do I still get this exception? Do I need to set any additional configuration for "data conflict filters"?

Any guidance or best practices would be appreciated.

Paul
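To make the premise of the question concrete, here is a minimal Python sketch that checks it against the values in the log above. It assumes the `epoch_hours=YYYY-MM-DD-HH` directory name in each path reflects the hour partition of the file; the helper names (`hour_bucket`, `partition_hour`, `files_outside_delete_range`) are hypothetical and not part of PyIceberg or Trino.

```python
import re
from datetime import datetime, timezone

# Values copied from the DELETE statement and the error message above.
CUTOFF_TZ = datetime(2024, 12, 2, 12, 30, tzinfo=timezone.utc)
CUTOFF_NS = 1.7331426374286223e18  # nanoseconds since the Unix epoch
CONFLICTING_FILES = [
    "s3://demobucket/ts.db/pack/data/source_id=s00000/epoch_hours=2024-12-02-14/"
    "00000-0-076ac96f-51b3-48d3-9a68-c7f971278ada.parquet",
    "s3://demobucket/ts.db/pack/data/source_id=s00000/epoch_hours=2024-12-02-14/"
    "00000-0-92c59e44-c8db-4f99-a9bb-f0a2d4fbc164.parquet",
    "s3://demobucket/ts.db/pack/data/source_id=s00000/epoch_hours=2024-12-02-14/"
    "00000-0-62caa502-27ad-4f0c-aabf-41d1bb3198fa.parquet",
]


def hour_bucket(ts: datetime) -> str:
    """Format a timestamp the way the partition directories are named (UTC hour)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d-%H")


def partition_hour(path: str) -> str:
    """Extract the epoch_hours value from a data file path (assumed layout)."""
    match = re.search(r"epoch_hours=(\d{4}-\d{2}-\d{2}-\d{2})", path)
    if match is None:
        raise ValueError(f"no epoch_hours partition in path: {path}")
    return match.group(1)


def files_outside_delete_range(paths: list[str], cutoff: datetime) -> list[str]:
    """Return files whose hour partition is later than the cutoff's hour.

    Zero-padded YYYY-MM-DD-HH strings sort chronologically, so a plain
    string comparison is enough here.
    """
    limit = hour_bucket(cutoff)
    return [p for p in paths if partition_hour(p) > limit]


if __name__ == "__main__":
    ns_cutoff = datetime.fromtimestamp(CUTOFF_NS / 1e9, tz=timezone.utc)
    print("newest hour matched by epoch_timestamp_tz:", hour_bucket(CUTOFF_TZ))
    print("newest hour matched by timestampns:", hour_bucket(ns_cutoff))
    for path in files_outside_delete_range(CONFLICTING_FILES, CUTOFF_TZ):
        print("outside delete range:", partition_hour(path))
```

Both predicates in the DELETE stop within hour 2024-12-02-12, while every file named in the exception sits in hour 2024-12-02-14, so none of the reported files can hold rows that the DELETE predicate matches.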