bk-mz opened a new issue, #9833: URL: https://github.com/apache/iceberg/issues/9833
### Apache Iceberg version

1.4.3 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

Hey folks, we're using `rewrite_position_delete_files` to compact delete files. Every run rewrites data but does not compact anything: it rewrites the files into roughly the same number of files holding roughly the same amount of data.

```
CALL glue.system.rewrite_position_delete_files(table => 'table_name', where => 'data_load_ts < current_timestamp() - INTERVAL 1 HOURS', options => map('partial-progress.enabled', 'true', 'rewrite-all', 'true', 'max-concurrent-file-group-rewrites', '50'))
+----------------------------+------------------------+---------------------+-----------------+
|rewritten_delete_files_count|added_delete_files_count|rewritten_bytes_count|added_bytes_count|
+----------------------------+------------------------+---------------------+-----------------+
|5474                        |5232                    |83456097             |82859000         |
+----------------------------+------------------------+---------------------+-----------------+

CALL glue.system.rewrite_position_delete_files(table => 'table_name', where => 'data_load_ts < current_timestamp() - INTERVAL 1 HOURS', options => map('partial-progress.enabled', 'true', 'rewrite-all', 'true', 'max-concurrent-file-group-rewrites', '50'))
+----------------------------+------------------------+---------------------+-----------------+
|rewritten_delete_files_count|added_delete_files_count|rewritten_bytes_count|added_bytes_count|
+----------------------------+------------------------+---------------------+-----------------+
|5431                        |5265                    |83739802             |83200333         |
+----------------------------+------------------------+---------------------+-----------------+

CALL glue.system.rewrite_position_delete_files(table => 'table_name', where => 'data_load_ts < current_timestamp() - INTERVAL 1 HOURS', options => map('partial-progress.enabled', 'true', 'rewrite-all', 'true', 'max-concurrent-file-group-rewrites', '50'))
+----------------------------+------------------------+---------------------+-----------------+
|rewritten_delete_files_count|added_delete_files_count|rewritten_bytes_count|added_bytes_count|
+----------------------------+------------------------+---------------------+-----------------+
|5443                        |5244                    |83643303             |83241939         |
+----------------------------+------------------------+---------------------+-----------------+
```

As a matter of fact, I think it has created odd partitions that contain only small delete files. I suspect the job keeps rewriting those small files over and over, ending up with the same small files each time.

Normal partition on S3: `data_load_ts_hour=2024-02-29-06/`

Odd partition: `data_load_ts_hour=474425/`

There are a lot of these odd partitions. Each is named with an integer, and the integers increase incrementally from `474425` to `474754`. I think each run creates a new odd partition.

<img width="1121" alt="image" src="https://github.com/apache/iceberg/assets/892781/ac54600d-1ad8-4ee3-b17a-733dffbaaef5">

The odd partitions contain only delete Parquet files:

<img width="1156" alt="image" src="https://github.com/apache/iceberg/assets/892781/db1a2489-002b-4f5e-a86b-df7fcb90b2e5">

Can you check and confirm whether this is an issue? For now we have disabled `rewrite_position_delete_files` entirely because the behavior is so odd.

Thanks!
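One possible reading of the integer partition names, offered as an assumption and not a confirmed diagnosis: they may be the raw value of Iceberg's `hour` partition transform, which counts hours since the Unix epoch, written to the path without the usual human-readable formatting. The sketch below converts between the two forms. The helper names are hypothetical and UTC is assumed. Under that assumption, `474425` and `474754` would map to hours in February 2024, not to a per-run counter.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the partition value is "hours since 1970-01-01 00:00 UTC",
# as defined by Iceberg's hour transform. Helper names are illustrative only.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hour_ordinal_to_label(hours_since_epoch: int) -> str:
    """Render an hour ordinal in the yyyy-MM-dd-HH form seen in normal paths."""
    return (EPOCH + timedelta(hours=hours_since_epoch)).strftime("%Y-%m-%d-%H")


def label_to_hour_ordinal(label: str) -> int:
    """Convert a yyyy-MM-dd-HH partition label back to an hour ordinal."""
    dt = datetime.strptime(label, "%Y-%m-%d-%H").replace(tzinfo=timezone.utc)
    return int((dt - EPOCH).total_seconds() // 3600)


if __name__ == "__main__":
    for ordinal in (474425, 474754):
        print(ordinal, "->", hour_ordinal_to_label(ordinal))
    print("2024-02-29-06 ->", label_to_hour_ordinal("2024-02-29-06"))
```

If the decoded hours line up with real `data_load_ts_hour` values in the table, that would suggest the rewritten delete files land under a differently formatted path for an existing partition, which may be worth mentioning when the issue is triaged.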