singhpk234 opened a new pull request, #5888: URL: https://github.com/apache/iceberg/pull/5888
### About the change

Revert compaction when a conflict is detected with non-compaction writes.

* Tag the snapshot created by the compaction process (for example, use the snapshot summary and introduce a flag `is_compaction`) so that it can be identified later.
* While committing the pending updates of a transaction, check whether the current snapshot conflicts with the transaction's updates.
* If the current snapshot conflicts and it was created by the compaction process (that is, the snapshot summary key exists), revert the current snapshot (roll back to its parent snapshot) and then try re-applying the updates on top of it.
* Introduce a new table property, `rollback.compaction.on-conflicts.enabled`, which controls whether the compaction commit is rolled back when conflicts are detected.
* Example: a transaction has updates U1, U2, U3, ... to apply, and B is its base snapshot. When the transaction is about to commit, it sees that the current snapshot is now B' (a snapshot created by compaction). It makes B' the current snapshot and tries to re-apply the updates {U1, U2, U3}. If a metadata conflict is detected when applying U3 on top of { B' -> U1 -> U2 }, retry the transaction's updates after rolling back B' to its parent. Call that rollback RollbackB'. Now try applying {RollbackB', U1, U2, U3} on top of B' and check whether it still conflicts. If it does, fail; otherwise commit and update the table metadata.

Based on proposal: https://docs.google.com/document/d/1pSqxf5A59J062j9VFF5rcCpbW9vdTbBKTmjps80D-B0/edit#

Adding this here to get feedback on the approach. Once agreed upon, engines (for example, Spark) can be made to use the transaction API.

-----

### Testing done

- Added UTs validating that `is_compacted` is present in the compacted snapshot summary
- Added an end-to-end UT using the transaction API and a concurrent rewrite
- TODO: add more exhaustive UTs

cc @rdblue @danielcweeks @jackye1995 @rajarshisarkar @amogh-jahagirdar
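
The rollback-and-retry flow described under "About the change" can be sketched as a small, self-contained Python model. This is only an illustration of the control flow, not Iceberg code: `Snapshot`, `Update`, `ConflictError`, `apply_all`, and `commit_transaction` are hypothetical names, snapshots are reduced to sets of file names, and a "conflict" is modeled as an update that tries to remove a file the current snapshot no longer contains. The property name comes from the description above; the summary key is assumed to be `is_compaction`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Property name from the PR description; the summary key name is an assumption.
ROLLBACK_PROP = "rollback.compaction.on-conflicts.enabled"
COMPACTION_FLAG = "is_compaction"


class ConflictError(Exception):
    """Raised when an update cannot be applied to the given file set."""


@dataclass
class Snapshot:
    """Hypothetical stand-in for a table snapshot: just a set of file names."""
    snapshot_id: int
    parent: Optional["Snapshot"]
    files: frozenset
    summary: Dict[str, str] = field(default_factory=dict)


@dataclass
class Update:
    """Hypothetical pending update: removes some files, adds others."""
    removes: frozenset = frozenset()
    adds: frozenset = frozenset()

    def apply(self, files: frozenset) -> frozenset:
        # Model a metadata conflict: a file we depend on is gone.
        if not self.removes <= files:
            raise ConflictError(f"missing files: {sorted(self.removes - files)}")
        return (files - self.removes) | self.adds


def apply_all(files: frozenset, updates: List[Update]) -> frozenset:
    for update in updates:
        files = update.apply(files)
    return files


def commit_transaction(current: Snapshot, updates: List[Update],
                       properties: Dict[str, str]) -> Snapshot:
    """Apply updates on `current`; on conflict, optionally roll back compaction."""
    parent = current
    try:
        files = apply_all(current.files, updates)
    except ConflictError:
        can_roll_back = (
            properties.get(ROLLBACK_PROP) == "true"
            and current.summary.get(COMPACTION_FLAG) == "true"
            and current.parent is not None
        )
        if not can_roll_back:
            raise
        # "RollbackB'": retry on top of the compaction snapshot's parent.
        # A second ConflictError here propagates and fails the commit.
        parent = current.parent
        files = apply_all(parent.files, updates)
    return Snapshot(current.snapshot_id + 1, parent, files)


if __name__ == "__main__":
    b = Snapshot(1, None, frozenset({"a", "b"}))
    # B': compaction rewrote files a and b into ab and tagged its summary.
    b_prime = Snapshot(2, b, frozenset({"ab"}), {COMPACTION_FLAG: "true"})
    # U1 and U2 add files; U3 removes file a, which B' no longer contains.
    updates = [Update(adds=frozenset({"c"})),
               Update(adds=frozenset({"d"})),
               Update(removes=frozenset({"a"}))]
    committed = commit_transaction(b_prime, updates, {ROLLBACK_PROP: "true"})
    print(sorted(committed.files))
```

In this model the compaction result is simply discarded when the property is enabled and the conflicting snapshot carries the flag; with the property unset, or when the current snapshot is not a compaction snapshot, the original conflict is re-raised unchanged.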