ggadon opened a new issue, #13651:
URL: https://github.com/apache/iceberg/issues/13651

   ### Apache Iceberg version
   
   None
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Hey team,
   
   It seems like running snapshot expiration and `CREATE OR REPLACE TABLE` at 
the same time can cause table history corruption. The following happens:
   1. The base metadata, X, contains snapshots that are eligible for expiration
   2. The replace operation loads metadata X and uses it as its base
   3. Snapshot expiration runs, creates metadata X+1, and commits it 
successfully, deleting the old snapshots' data
   4. The replace operation finishes and commits successfully, apparently 
with X as its base instead of X+1. It seems the replace operation ignores 
commit conflicts
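
   To make the interleaving concrete, here is a toy model of the four steps 
above. It is a sketch, not Iceberg code: `Metadata`, `Catalog`, and 
`simulate` are hypothetical stand-ins I made up for illustration, and the 
compare-and-set commit is only an assumption about how the real commit behaves.

```java
import java.util.Set;
import java.util.TreeSet;

public class ReplaceExpireRace {

    // Hypothetical stand-in for a metadata file: a version and the snapshot IDs it lists.
    static final class Metadata {
        final int version;
        final Set<Long> snapshotIds;

        Metadata(int version, Set<Long> snapshotIds) {
            this.version = version;
            this.snapshotIds = new TreeSet<>(snapshotIds);
        }
    }

    // Hypothetical catalog: a pointer to the current metadata, swapped by compare-and-set.
    static final class Catalog {
        private Metadata current;

        Catalog(Metadata initial) {
            this.current = initial;
        }

        synchronized Metadata load() {
            return current;
        }

        synchronized boolean commit(Metadata expectedBase, Metadata updated) {
            if (current != expectedBase) {
                return false; // someone else committed since expectedBase was loaded
            }
            current = updated;
            return true;
        }
    }

    // Runs the four steps and returns the snapshot IDs that the final metadata
    // still lists even though expiration already removed them.
    static Set<Long> simulate() {
        Catalog catalog = new Catalog(new Metadata(1, Set.of(1L, 2L, 3L)));

        // Steps 1-2: the replace loads X and builds its new metadata on top of it.
        Metadata replaceBase = catalog.load();
        Set<Long> replacedIds = new TreeSet<>(replaceBase.snapshotIds);
        replacedIds.add(4L); // snapshot written by the replace

        // Step 3: expiration commits X+1 without snapshots 1 and 2.
        Metadata expireBase = catalog.load();
        Set<Long> expired = Set.of(1L, 2L);
        Set<Long> kept = new TreeSet<>(expireBase.snapshotIds);
        kept.removeAll(expired);
        catalog.commit(expireBase, new Metadata(expireBase.version + 1, kept));

        // Step 4: the replace swaps its base for the refreshed metadata, so the
        // compare-and-set passes, but the metadata it commits was derived from X.
        Metadata refreshed = catalog.load();
        catalog.commit(refreshed, new Metadata(refreshed.version + 1, replacedIds));

        Set<Long> dangling = new TreeSet<>(catalog.load().snapshotIds);
        dangling.retainAll(expired);
        return dangling;
    }

    public static void main(String[] args) {
        System.out.println("Dangling snapshot IDs: " + simulate());
    }
}
```

   Running `main` prints `Dangling snapshot IDs: [1, 2]`: in this model the 
final metadata still lists the two snapshots that expiration already removed.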
   
   ## Initial Observations
   
   If I'm reading this correctly, the code that commits the replace 
transaction appears to ignore changes made to the base metadata, here:
   
   
https://github.com/apache/iceberg/blob/6bd6887db1f90674ca5e20e88cc95c5f92dcb050/core/src/main/java/org/apache/iceberg/BaseTransaction.java#L376-L380
   
   In our scenario, `current` seems to still include the history from before 
the cleanup, while `base`, after the lines above, is the refreshed metadata 
without the snapshots that were just expired. The commit a few lines below 
then seems to write the new metadata with the old history in it, causing the 
corruption.
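
   As a rough illustration of how that mismatch could be detected before 
committing, here is a sketch of a check that compares the snapshot IDs the 
pending metadata carried over against the refreshed base. 
`StaleHistoryCheck`, `expiredButStillListed`, and `validate` are hypothetical 
names, not existing Iceberg APIs, and whether failing the commit is the right 
behavior is an open question.

```java
import java.util.Set;
import java.util.TreeSet;

public class StaleHistoryCheck {

    // Hypothetical helper: snapshot IDs that the pending metadata carried over
    // from its original base but that the refreshed base no longer contains.
    static Set<Long> expiredButStillListed(Set<Long> carriedOverIds, Set<Long> refreshedBaseIds) {
        Set<Long> missing = new TreeSet<>(carriedOverIds);
        missing.removeAll(refreshedBaseIds);
        return missing;
    }

    // Hypothetical guard: refuse to commit when the pending history is stale.
    static void validate(Set<Long> carriedOverIds, Set<Long> refreshedBaseIds) {
        Set<Long> missing = expiredButStillListed(carriedOverIds, refreshedBaseIds);
        if (!missing.isEmpty()) {
            throw new IllegalStateException(
                "Pending metadata references snapshots removed from the base: " + missing);
        }
    }

    public static void main(String[] args) {
        // Pending metadata was built from X (snapshots 1, 2, 3); the refreshed base X+1 kept only 3.
        System.out.println(expiredButStillListed(Set.of(1L, 2L, 3L), Set.of(3L)));
    }
}
```

   An alternative to failing would be to drop the missing IDs from the pending 
history before committing; the sketch only shows the detection step.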
   
   Am I reading this correctly? What was the logic behind this? Is it only for 
efficiency reasons since there is no real need to handle commit conflicts in 
these cases as all data files are replaced?
   
   ## On a side note
   I know there are active mailing list discussions around this 
([thread](https://lists.apache.org/thread/d4hzd4cfvopvckcfw50orqksjzymd4lm)), 
and also an issue that was closed recently about this subject (with 
@RussellSpitzer's 
[comment](https://github.com/apache/iceberg/issues/12738#issuecomment-3009087235)
 about corrupted metadata). Would love to hear your thoughts about possible 
next steps here.
   
   Thanks!
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

