yadavay-amzn opened a new pull request, #16324: URL: https://github.com/apache/iceberg/pull/16324
## Problem

Fixes #15487.

When Flink TableMaintenance runs both `ExpireSnapshots` and `DeleteOrphanFiles`, manifest list files of live snapshots are incorrectly deleted as orphans, causing `NotFoundException` in subsequent `ExpireSnapshots` runs.

## Root cause

`ListMetadataFiles` loads the table once at operator startup (`open()`) and never calls `table.refresh()` in `processElement()`. It only emits manifest list and manifest file paths for snapshots that existed when the Flink job started. Any snapshot added after job start has its metadata files missing from the "referenced" set that `DeleteOrphanFiles` uses.

When those manifest lists are older than `minAge`, `OrphanFilesDetector` classifies them as orphans and `DeleteFilesProcessor` deletes them. On the next maintenance cycle, `ExpireSnapshots` tries to read those manifest lists in `IncrementalFileCleanup.cleanFiles()` and fails with `NotFoundException`.

This explains why:

- The bug only occurs with `DeleteOrphanFiles` enabled (it is the one incorrectly deleting the files)
- The bug never occurs with `ExpireSnapshots` alone (it only deletes manifest lists of snapshots it has already expired and read)
- The bug becomes more likely over time (more snapshots added after job start = more unprotected manifest lists)

## Fix

Add `table.refresh()` at the top of `ListMetadataFiles.processElement()`, matching what `MetadataTablePlanner` already does. This ensures the "referenced" set always reflects the current table state.

Applied to all Flink versions (v1.20, v2.0, v2.1).

## Generative AI

Generated-by: Claude Opus 4.7
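To illustrate the root cause and the fix described above, here is a minimal, self-contained sketch of how a table handle that is loaded once and never refreshed yields a stale "referenced" set. It does not use the Iceberg or Flink APIs: `ToyCatalog`, `ToyTable`, `referenced`, `orphans`, and the file names are all hypothetical stand-ins invented for this example, and the `minAge` check is omitted for brevity. Only the general shape (refresh before listing) mirrors the change in this PR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class StaleReferencedSetSketch {

  /** Hypothetical stand-in for the table state held by the catalog. */
  static final class ToyCatalog {
    private final List<String> manifestLists = new ArrayList<>();

    void commitSnapshot(String manifestListPath) {
      manifestLists.add(manifestListPath);
    }

    List<String> currentManifestLists() {
      return new ArrayList<>(manifestLists);
    }
  }

  /** Hypothetical table handle: it only sees the state as of load or the last refresh. */
  static final class ToyTable {
    private final ToyCatalog catalog;
    private List<String> view;

    ToyTable(ToyCatalog catalog) {
      this.catalog = catalog;
      this.view = catalog.currentManifestLists(); // analogous to loading in open()
    }

    void refresh() {
      view = catalog.currentManifestLists();
    }

    List<String> manifestLists() {
      return view;
    }
  }

  /** Builds the toy "referenced" set, optionally refreshing first (the fix). */
  static Set<String> referenced(ToyTable table, boolean refreshFirst) {
    if (refreshFirst) {
      table.refresh();
    }
    return new HashSet<>(table.manifestLists());
  }

  /** Files present on storage but absent from the referenced set. */
  static Set<String> orphans(Set<String> filesOnStorage, Set<String> referenced) {
    Set<String> result = new TreeSet<>(filesOnStorage);
    result.removeAll(referenced);
    return result;
  }

  public static void main(String[] args) {
    ToyCatalog catalog = new ToyCatalog();
    catalog.commitSnapshot("snap-1.avro");
    ToyTable table = new ToyTable(catalog); // job start
    catalog.commitSnapshot("snap-2.avro"); // snapshot added after job start

    Set<String> files = new HashSet<>(catalog.currentManifestLists());

    // Without refresh, the live snapshot's manifest list looks like an orphan.
    System.out.println("no refresh:   " + orphans(files, referenced(table, false)));
    // With refresh, nothing live is misclassified.
    System.out.println("with refresh: " + orphans(files, referenced(table, true)));
  }
}
```

In this toy model the unrefreshed handle reports `snap-2.avro` as an orphan even though it belongs to a live snapshot, which is the misclassification the PR attributes to `ListMetadataFiles`. Refreshing before building the set removes it from the orphan list.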
