hantangwangd opened a new issue, #10982:
URL: https://github.com/apache/iceberg/issues/10982

   ### Apache Iceberg version
   
   1.6.0 (latest release)
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   When the `IncrementalFileCleanup` policy is used to expire a specific snapshot, the data files deleted by that snapshot are physically deleted from the disk. However, those data files may still be referenced by earlier snapshots that have not been expired. As a result, a `Time Travel` query against one of those earlier snapshots fails because some of its data files no longer exist.
   
   For example, consider a table with the following commits:
   
   ```
   table.newAppend().appendFile(FILE_A).commit();   // snapshotID1
   table.newDelete().deleteFile(FILE_A).commit();   // snapshotID2
   table.newAppend().appendFile(FILE_A2).commit();  // snapshotID3
   ```
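   The following is a simplified, hypothetical model of the history above (it is not Iceberg's metadata format). Each snapshot ID (1, 2, 3 standing in for `snapshotID1`, `snapshotID2`, `snapshotID3`) maps to the data files that snapshot can read, and the map is inverted to show which snapshots still reference each file. It illustrates that `FILE_A` is still referenced by snapshot 1 even though snapshot 2 deleted it.

   ```java
   import java.util.Map;
   import java.util.Set;
   import java.util.TreeMap;
   import java.util.TreeSet;

   // Hypothetical toy model, not Iceberg code.
   public class FileReferences {

       // Snapshot ID -> data files that snapshot can read.
       static final Map<Long, Set<String>> LIVE_FILES = Map.of(
           1L, Set.of("FILE_A"),    // after newAppend().appendFile(FILE_A)
           2L, Set.of(),            // after newDelete().deleteFile(FILE_A)
           3L, Set.of("FILE_A2"));  // after newAppend().appendFile(FILE_A2)

       // Invert the map: data file -> snapshots that still reference it (sorted).
       static Map<String, Set<Long>> referencedBy(Map<Long, Set<String>> live) {
           Map<String, Set<Long>> refs = new TreeMap<>();
           for (Map.Entry<Long, Set<String>> e : live.entrySet()) {
               for (String file : e.getValue()) {
                   refs.computeIfAbsent(file, f -> new TreeSet<>()).add(e.getKey());
               }
           }
           return refs;
       }

       public static void main(String[] args) {
           System.out.println(referencedBy(LIVE_FILES)); // {FILE_A=[1], FILE_A2=[3]}
       }
   }
   ```

   In this model, physically deleting `FILE_A` is only safe once no retained snapshot appears in its reference set.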
   
   Next, we expire only the snapshot `snapshotID2` using the `IncrementalFileCleanup` policy:
   
   ```
   ((RemoveSnapshots) table.expireSnapshots())
           .withIncrementalCleanup(true)
           .expireSnapshotId(snapshotID2)
           .cleanExpiredFiles(true)
           .commit();
   ```
   
   After that, `FILE_A` has been physically deleted from the disk, even though `snapshotID1` still references it. A `Time Travel` query on `snapshotID1` then fails because `FILE_A` no longer exists.
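   The sketch below contrasts, in the same kind of hypothetical toy model, the behavior described in this report with one possible safer check. The "naive" rule treats every file removed by the expired snapshot as a deletion candidate; the "safe" rule additionally skips any candidate that a retained snapshot still references. The names `LIVE_FILES`, `PARENT`, `naiveCandidates`, and `safeCandidates` are illustrative assumptions, not Iceberg's `IncrementalFileCleanup` implementation or an agreed fix.

   ```java
   import java.util.Map;
   import java.util.Set;
   import java.util.TreeSet;

   // Hypothetical toy model of expiring a middle snapshot; not Iceberg code.
   public class ExpireMiddleSnapshot {

       // Snapshot ID -> data files that snapshot can read.
       static final Map<Long, Set<String>> LIVE_FILES = Map.of(
           1L, Set.of("FILE_A"),
           2L, Set.of(),
           3L, Set.of("FILE_A2"));

       // Snapshot ID -> parent snapshot ID (snapshot 1 has no parent).
       static final Map<Long, Long> PARENT = Map.of(2L, 1L, 3L, 2L);

       // Files a snapshot removed: live in its parent but not in the snapshot itself.
       static Set<String> removedBy(long id) {
           Set<String> removed = new TreeSet<>();
           Long parent = PARENT.get(id);
           if (parent != null) {
               removed.addAll(LIVE_FILES.get(parent));
           }
           removed.removeAll(LIVE_FILES.get(id));
           return removed;
       }

       // Behavior described in the report: delete everything the expired snapshot removed.
       static Set<String> naiveCandidates(long expired) {
           return removedBy(expired);
       }

       // Safer check: drop any candidate still referenced by a retained snapshot.
       static Set<String> safeCandidates(long expired) {
           Set<String> candidates = removedBy(expired);
           for (Map.Entry<Long, Set<String>> entry : LIVE_FILES.entrySet()) {
               if (entry.getKey() != expired) {
                   candidates.removeAll(entry.getValue());
               }
           }
           return candidates;
       }

       public static void main(String[] args) {
           System.out.println("naive: " + naiveCandidates(2L)); // naive: [FILE_A]
           System.out.println("safe:  " + safeCandidates(2L));  // safe:  []
       }
   }
   ```

   Under these assumptions, expiring `snapshotID2` with the naive rule deletes `FILE_A`, while the reference check keeps it because `snapshotID1` is retained.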
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
