toien opened a new issue, #10907: URL: https://github.com/apache/iceberg/issues/10907
### Query engine

Spark SQL on AWS EMR (7.1.0)

Versions:

- Spark: 3.5.0
- Iceberg: 1.4.3
- Flink: 1.18 (Managed Apache Flink of AWS)

### Question

First, I created an Iceberg table like this:

```sql
spark-sql (test_db)> show create table my_catalog.test_db.dws_table;
CREATE TABLE my_catalog.test_db.dws_table (
  dt STRING NOT NULL,
  brand_code STRING NOT NULL,
  event_type STRING NOT NULL,
  sub_event_type STRING NOT NULL,
  success_count INT,
  failed_count INT)
USING iceberg
LOCATION 's3://xxx/test/test_db.db/dws_table'
TBLPROPERTIES (
  'current-snapshot-id' = '3745013875610091505',
  'format' = 'iceberg/parquet',
  'format-format' = '2',
  'format-version' = '2',
  'identifier-fields' = '[dt,brand_code,sub_event_type,event_type]',
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.metadata.previous-versions-max' = '5',
  'write.parquet.compression-codec' = 'zstd',
  'write.upsert.enabled' = 'true')
```

Flink streaming jobs calculate results and upsert them into this table, so every Flink checkpoint creates a new snapshot and the table accumulates many of them:

```sql
spark-sql (test_db)> select COUNT(*) from my_catalog.test_db.dws_table.snapshots;
2130
```

Here is the problem: when I run `expire_snapshots` from Spark SQL, the job **does take time** to execute:

```sql
spark-sql (test_db)> CALL my_catalog.system.expire_snapshots(
                   >   table => 'test_db.dws_table',
                   >   retain_last => 5
                   > );
deleted_data_files_count	deleted_position_delete_files_count	deleted_equality_delete_files_count	deleted_manifest_files_count	deleted_manifest_lists_count	deleted_statistics_files_count
0	0	0	0	0	0
Time taken: 45.336 seconds, Fetched 1 row(s)
```

But nothing was deleted!

```sql
spark-sql (test_db)> select COUNT(*) from my_catalog.test_db.dws_table.snapshots;
2164
```

The data on S3 is still there, even though the Spark job finished successfully.

The same problem occurs when I call `rewrite_data_files`, **too**: small data files are **not** compacted (merged).
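One thing that may be worth checking, offered as a sketch and not a confirmed diagnosis: the `expire_snapshots` procedure also accepts an `older_than` timestamp, which the Iceberg documentation lists as defaulting to 5 days ago, while `retain_last` only sets a minimum number of snapshots to keep. If all of the snapshots were created within the last 5 days, the call in the question would have nothing old enough to expire. The timestamp below is a hypothetical placeholder and should be replaced with a real cutoff.

```sql
-- Sketch only: pass an explicit cutoff so that snapshots newer than the
-- default "5 days ago" threshold become eligible for expiration.
-- The TIMESTAMP literal is a hypothetical placeholder, not a value from the issue.
CALL my_catalog.system.expire_snapshots(
  table => 'test_db.dws_table',
  older_than => TIMESTAMP '2024-01-01 00:00:00.000',
  retain_last => 5
);

-- Re-check the snapshot count afterwards.
SELECT COUNT(*) FROM my_catalog.test_db.dws_table.snapshots;
```

Because the Flink job keeps committing, the count can still rise between the two statements, so comparing the oldest `committed_at` value in the `snapshots` metadata table before and after may be a more reliable check.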