hantangwangd opened a new issue, #10982: URL: https://github.com/apache/iceberg/issues/10982
### Apache Iceberg version

1.6.0 (latest release)

### Query engine

None

### Please describe the bug 🐞

When `IncrementalFileCleanup` is used to expire a specified snapshot, the data files deleted by that snapshot are physically deleted from disk. However, these data files may still be referenced by earlier snapshots. Afterwards, a `Time Travel` query against one of those earlier snapshots fails because some of its data files no longer exist.

For example, suppose a table has the following history:

```
table.newAppend().appendFile(FILE_A).commit(); // snapshotID1
table.newDelete().deleteFile(FILE_A).commit(); // snapshotID2
table.newAppend().appendFile(FILE_A2).commit(); // snapshotID3
```

Now we expire the specified snapshot `snapshotID2` using `IncrementalFileCleanup`:

```
((RemoveSnapshots) table.expireSnapshots())
    .withIncrementalCleanup(true)
    .expireSnapshotId(snapshotID2)
    .cleanExpiredFiles(true)
    .commit();
```

After that, `FILE_A` has been physically deleted from disk. A `Time Travel` query on `snapshotID1` then fails because `FILE_A` no longer exists.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
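
To make the failure mode concrete, here is a minimal, self-contained Java sketch of the scenario. It is a toy model only and does not use or reproduce Iceberg's code: the class `ExpireCleanupSketch` and the helpers `removedBy`, `safeToDelete`, and `exampleHistory` are hypothetical names, and representing a snapshot as a set of live file paths is an assumption made for illustration. It shows that the files removed by the expired snapshot are not necessarily safe to delete, because a retained ancestor snapshot may still reference them.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: these names are hypothetical and are not Iceberg classes.
class ExpireCleanupSketch {

    // Files that were live in the parent snapshot but are gone in the expired one,
    // i.e. the files "deleted by" the expired snapshot.
    static Set<String> removedBy(Set<String> parentFiles, Set<String> expiredFiles) {
        Set<String> removed = new TreeSet<>(parentFiles);
        removed.removeAll(expiredFiles);
        return removed;
    }

    // Keep only the candidates that no retained snapshot still references.
    static Set<String> safeToDelete(Set<String> candidates,
                                    Collection<Set<String>> retainedSnapshots) {
        Set<String> safe = new TreeSet<>(candidates);
        for (Set<String> liveFiles : retainedSnapshots) {
            safe.removeAll(liveFiles);
        }
        return safe;
    }

    // The three-snapshot history from the report: snapshot id -> live data files.
    static Map<Long, Set<String>> exampleHistory() {
        Map<Long, Set<String>> history = new LinkedHashMap<>();
        history.put(1L, new TreeSet<>(Arrays.asList("FILE_A")));   // append FILE_A
        history.put(2L, new TreeSet<String>());                    // delete FILE_A
        history.put(3L, new TreeSet<>(Arrays.asList("FILE_A2")));  // append FILE_A2
        return history;
    }

    public static void main(String[] args) {
        Map<Long, Set<String>> history = exampleHistory();

        // What snapshot 2 removed relative to its parent, snapshot 1.
        Set<String> candidates = removedBy(history.get(1L), history.get(2L));

        // Snapshots 1 and 3 are retained after expiring only snapshot 2.
        Set<String> safe = safeToDelete(
            candidates, Arrays.asList(history.get(1L), history.get(3L)));

        System.out.println("removed by snapshot 2: " + candidates);
        System.out.println("safe to delete: " + safe);
    }
}
```

In this model, `FILE_A` is the only file removed by snapshot 2, but the safe-to-delete set is empty because the retained snapshot 1 still references it. Deleting every file in the first set corresponds to the behavior described above.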