lliangyu-lin opened a new pull request, #12132: URL: https://github.com/apache/iceberg/pull/12132
### Description

Currently, Iceberg's `dropTableData()` does not properly delete statistics files (`.stats`) that have been replaced by newer statistics files. When `updateStatistics()` is called multiple times on the same snapshot, the older statistics files are removed from `metadata.json` but remain in storage, leading to orphaned files.

As issue #11876 points out, #9305 and #9409 remove only the latest Puffin file; older Puffin files still remain.

This PR attempts to address the issue by looking through all metadata files and deleting every statistics and partition statistics file referenced in historical metadata.

An alternative solution would be to explicitly track old (replaced) statistics files in the metadata files, but that would require changing the metadata spec.

### Testing

* `./gradlew spotlessApply -DallModules` passed
* `./gradlew build -x test -x integrationTest` passed
* `./gradlew :iceberg-core:test --tests "org.apache.iceberg.hadoop.TestCatalogUtilDropTable"` passed

Closes #11876

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
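To illustrate the idea in the description (collect statistics files from every historical metadata version, not just the latest), here is a minimal, self-contained sketch. It is not the code in this PR and does not use Iceberg's API: the class `StatsFileCollector`, its method names, and the representation of each `metadata.json` version as a plain list of statistics file paths are all hypothetical, chosen only to show why a latest-only cleanup leaves files behind.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical illustration only; this is NOT Iceberg's TableMetadata API.
 * Each inner list stands for the statistics file paths that one
 * metadata.json version references, ordered oldest to newest.
 */
final class StatsFileCollector {

  private StatsFileCollector() {}

  /** Union of statistics file paths referenced by any metadata version. */
  static Set<String> allReferencedStatsFiles(List<List<String>> statsPathsPerVersion) {
    Set<String> toDelete = new LinkedHashSet<>();
    for (List<String> paths : statsPathsPerVersion) {
      toDelete.addAll(paths); // the set de-duplicates paths shared across versions
    }
    return toDelete;
  }

  /** Paths that would be left in storage if only the newest version were consulted. */
  static Set<String> orphanedIfOnlyLatestDeleted(List<List<String>> statsPathsPerVersion) {
    Set<String> orphaned = allReferencedStatsFiles(statsPathsPerVersion);
    if (!statsPathsPerVersion.isEmpty()) {
      List<String> latest = statsPathsPerVersion.get(statsPathsPerVersion.size() - 1);
      orphaned.removeAll(latest);
    }
    return orphaned;
  }
}
```

In this model, if an older version references `s1-a.stats` and a later version replaces it with `s1-b.stats`, `allReferencedStatsFiles` returns both paths, while `orphanedIfOnlyLatestDeleted` returns only `s1-a.stats`, which is the kind of replaced file the description says is currently left behind.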