lliangyu-lin opened a new pull request, #12132:
URL: https://github.com/apache/iceberg/pull/12132

   ### Description
   Currently, Iceberg's ```dropTableData()``` does not delete statistics files (```.stats```) that have been replaced by newer statistics files. When ```updateStatistics()``` is called multiple times on the same snapshot, the older statistics files are removed from ```metadata.json``` but remain in storage as orphaned files.
   
   As issue #11876 points out, #9305 and #9409 delete only the latest Puffin file; older Puffin files remain in storage.
   This PR attempts to address this by scanning all metadata files, so that every statistics and partition statistics file referenced in historical metadata is deleted.
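
The idea of scanning every historical metadata file can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code in this PR: ```StatsFileCollector``` and ```MetadataVersion``` are hypothetical stand-ins for the parsed contents of each ```metadata.json```, and the paths are made up. It only shows the collection step, which takes the union of statistics and partition statistics paths across all metadata versions so that replaced files are also picked up for deletion.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class StatsFileCollector {

  /** Hypothetical stand-in for one parsed metadata.json version (not an Iceberg class). */
  public static final class MetadataVersion {
    final List<String> statisticsFiles;
    final List<String> partitionStatisticsFiles;

    public MetadataVersion(List<String> statisticsFiles, List<String> partitionStatisticsFiles) {
      this.statisticsFiles = statisticsFiles;
      this.partitionStatisticsFiles = partitionStatisticsFiles;
    }
  }

  /**
   * Returns the union of all statistics and partition statistics paths referenced by
   * any metadata version. A set is used so that a file referenced by several versions
   * is scheduled for deletion only once.
   */
  public static Set<String> allStatisticsPaths(List<MetadataVersion> versions) {
    Set<String> paths = new LinkedHashSet<>();
    for (MetadataVersion version : versions) {
      paths.addAll(version.statisticsFiles);
      paths.addAll(version.partitionStatisticsFiles);
    }
    return paths;
  }

  public static void main(String[] args) {
    // v1 references old.stats; v2 replaced it with new.stats for the same snapshot.
    MetadataVersion v1 = new MetadataVersion(List.of("metadata/old.stats"), List.of());
    MetadataVersion v2 =
        new MetadataVersion(List.of("metadata/new.stats"), List.of("metadata/part.stats"));

    // Looking only at v2 would miss old.stats; the union over all versions includes it.
    for (String path : allStatisticsPaths(List.of(v1, v2))) {
      System.out.println("would delete: " + path);
    }
  }
}
```

In this sketch, looking only at the latest version would leave ```metadata/old.stats``` behind, which is the orphaned-file case described in #11876.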
   
   An alternative solution would be to explicitly track old (replaced) statistics files in the metadata files, but that would require changing the metadata spec.
   
   ### Testing
   * ```./gradlew spotlessApply -DallModules``` passed
   * ```./gradlew build -x test -x integrationTest``` passed
   * ```./gradlew :iceberg-core:test --tests "org.apache.iceberg.hadoop.TestCatalogUtilDropTable"``` passed
   
   Closes #11876


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

