paul-bormans-pcgw opened a new issue, #11687:
URL: https://github.com/apache/iceberg/issues/11687

   ### Query engine
   
   1. PyIceberg
   2. Trino
   
   ### Question
   
   I'm running a test (on docker-compose) in which new data is appended (FastAppend) roughly every second, while Trino concurrently runs a query that DELETEs data older than 2 hours.
   
   The DELETE fails with an exception like this:
   ```
   trino:ts> DELETE FROM pack WHERE epoch_timestamp_tz <= timestamp '2024-12-02 12:30 UTC' AND timestampns < 1.7331426374286223e+18;

   Query 20241202_145700_00052_2bt64, FAILED, 1 node
   Splits: 557 total, 556 done (99.82%)
   29.45 [21.6M rows, 206MiB] [734K rows/s, 7.01MiB/s]

   Query 20241202_145700_00052_2bt64 failed: Failed to commit the transaction during write: Found conflicting files that can contain records matching true: [
       s3://demobucket/ts.db/pack/data/source_id=s00000/epoch_hours=2024-12-02-14/00000-0-076ac96f-51b3-48d3-9a68-c7f971278ada.parquet,
       s3://demobucket/ts.db/pack/data/source_id=s00000/epoch_hours=2024-12-02-14/00000-0-92c59e44-c8db-4f99-a9bb-f0a2d4fbc164.parquet]

   Caused by: org.apache.iceberg.exceptions.ValidationException: Found conflicting files that can contain records matching true:
       [s3://demobucket/ts.db/pack/data/source_id=s00000/epoch_hours=2024-12-02-14/00000-0-62caa502-27ad-4f0c-aabf-41d1bb3198fa.parquet]
       at org.apache.iceberg.MergingSnapshotProducer.validateAddedDataFiles(MergingSnapshotProducer.java:347)
       at org.apache.iceberg.BaseRowDelta.validate(BaseRowDelta.java:130)
   ```
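The stack trace ends in `MergingSnapshotProducer.validateAddedDataFiles`, called from `BaseRowDelta.validate`. That check looks at data files added after the DELETE's starting snapshot and reports any file that could hold rows matching a conflict filter. The `true` in the message is the filter itself. The sketch below is a simplified, hypothetical Python model of that kind of check, not Iceberg's implementation. `AddedFile`, `conflicting_files`, and `always_true` are illustrative names. It shows that a filter of `true` flags every concurrently appended file, whatever its partition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AddedFile:
    # Hypothetical stand-in for a data file committed by a concurrent append.
    path: str
    partition: Dict[str, str]  # e.g. {"source_id": "s00000", "epoch_hours": "2024-12-02-14"}


def conflicting_files(
    added_since_start: List[AddedFile],
    can_match: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Simplified model of a 'files added concurrently' validation.

    Each file added after the DELETE's starting snapshot is tested against the
    conflict filter. Any file whose partition *might* hold matching rows is
    reported as a conflict.
    """
    return [f.path for f in added_since_start if can_match(f.partition)]


def always_true(partition: Dict[str, str]) -> bool:
    # A filter of `true` cannot rule out any partition.
    return True


PREFIX = "s3://demobucket/ts.db/pack/data/source_id=s00000/epoch_hours=2024-12-02-14/"
appended = [
    AddedFile(PREFIX + "00000-0-076ac96f-51b3-48d3-9a68-c7f971278ada.parquet",
              {"source_id": "s00000", "epoch_hours": "2024-12-02-14"}),
    AddedFile(PREFIX + "00000-0-92c59e44-c8db-4f99-a9bb-f0a2d4fbc164.parquet",
              {"source_id": "s00000", "epoch_hours": "2024-12-02-14"}),
]

# Both files are reported, even though they sit in a later hour than the DELETE targets.
for path in conflicting_files(appended, always_true):
    print(path)
```

In this model, pruning can only happen through `can_match`. Once the filter is `true`, partition values play no part in the decision.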
   
   The table uses a PartitionSpec with an hourly partition field on `epoch_timestamp_tz`:
   ```
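   from pyiceberg.transforms import HourTransform

   # `table` is assumed to be the already-loaded PyIceberg table behind ts.pack
   with table.update_spec() as update:
       # Partition by the hour of epoch_timestamp_tz (shown as epoch_hours=YYYY-MM-DD-HH in data paths)
       update.add_field(
           source_column_name="epoch_timestamp_tz",
           transform=HourTransform(),
           partition_field_name="epoch_hours",
       )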
   ```
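An hour partition transform maps a timestamp to the number of whole hours since 1970-01-01 00:00 UTC, and the data paths render that value as `YYYY-MM-DD-HH`. The standard-library sketch below reproduces the mapping so the partition values in the log can be checked by hand. `hour_transform` and `hour_partition_label` are hypothetical helper names, not PyIceberg functions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hour_transform(ts: datetime) -> int:
    """Whole hours since 1970-01-01 00:00 UTC (floored, so pre-epoch times round down)."""
    # timedelta // timedelta yields an int, floored toward negative infinity.
    return (ts - EPOCH) // timedelta(hours=1)


def hour_partition_label(hours: int) -> str:
    """Render an hour ordinal the way it appears in data paths, e.g. '2024-12-02-14'."""
    return (EPOCH + timedelta(hours=hours)).strftime("%Y-%m-%d-%H")


# The DELETE's cutoff falls in the 2024-12-02-12 hour partition.
cutoff = datetime(2024, 12, 2, 12, 30, tzinfo=timezone.utc)
print(hour_partition_label(hour_transform(cutoff)))  # 2024-12-02-12
```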
   
   Since ingestion only appends new data, no new data files are added to the partition where the DELETE operates (epoch_hours=2024-12-02-12); all the conflicting files reported in the error are in epoch_hours=2024-12-02-14. Why do I still get this exception?
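The reasoning above can be checked against the partition values. A predicate `epoch_timestamp_tz <= cutoff` can in principle be projected onto hour partitions as `hour <= hour(cutoff)`, so a partition with a later hour cannot contain matching rows. The sketch below assumes a conflict filter derived that way, which the `true` in the error suggests was not what the commit used. `to_hour`, `parse_hour_label`, and `partition_may_match_lte` are hypothetical names. The sketch only shows that, under a predicate-derived filter, the 2024-12-02-14 files would have been ruled out.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_hour(ts: datetime) -> int:
    # Whole hours since the Unix epoch, floored.
    return (ts - EPOCH) // timedelta(hours=1)


def parse_hour_label(label: str) -> int:
    """Turn a path label such as '2024-12-02-14' back into an hour ordinal."""
    ts = datetime.strptime(label, "%Y-%m-%d-%H").replace(tzinfo=timezone.utc)
    return to_hour(ts)


def partition_may_match_lte(partition_label: str, cutoff: datetime) -> bool:
    """Inclusive projection of `ts <= cutoff` onto one hour partition.

    Any row with ts <= cutoff lands in an hour <= to_hour(cutoff), so a
    partition with a later hour provably holds no matching rows.
    """
    return parse_hour_label(partition_label) <= to_hour(cutoff)


cutoff = datetime(2024, 12, 2, 12, 30, tzinfo=timezone.utc)
print(partition_may_match_lte("2024-12-02-12", cutoff))  # True
print(partition_may_match_lte("2024-12-02-14", cutoff))  # False
```

This covers only the `epoch_timestamp_tz` condition. The `timestampns` condition in the DELETE is not partition-aligned and is ignored here.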
   
   Do I need to set any additional configuration, for example to configure "data conflict filters"?
   
   Any guidance or best practices would be appreciated.
   
   Paul
   