singhpk234 opened a new pull request, #5888:
URL: https://github.com/apache/iceberg/pull/5888

   ### About the change : 
   Revert a compaction commit when it conflicts with a concurrent 
   non-compaction write.
   
   * Tag the snapshot created by the compaction process so that it can be 
     identified later, for example by adding an `is_compaction` flag to the 
     snapshot summary.
   * While committing the pending updates of a transaction, check whether the 
     current snapshot conflicts with those updates.
   * If the current snapshot conflicts and was created by the compaction 
     process (the snapshot summary key exists), revert it by rolling back to 
     its parent snapshot, then re-apply the updates on top of the parent.
   * Introduce a new table property, `rollback.compaction.on-conflicts.enabled`, 
     that controls whether the compaction commit is rolled back when a 
     conflict is detected.
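
The decision described in the list above can be sketched as a single check. This is an illustrative model only, not the code in this PR: the class and method names are hypothetical, the summary key `is_compaction` is the one suggested in the description, and defaulting the property to `false` when it is unset is an assumption.

```java
import java.util.Map;

// Hypothetical helper; it models the rule in the description, not Iceberg's actual code.
final class CompactionRollbackCheck {
  // Table property name taken from the PR description.
  static final String ROLLBACK_ENABLED = "rollback.compaction.on-conflicts.enabled";
  // Summary key suggested in the description (assumed spelling).
  static final String COMPACTION_FLAG = "is_compaction";

  /**
   * Returns true only when a conflict was detected, the table opted in,
   * and the current snapshot's summary carries the compaction flag.
   */
  static boolean shouldRollbackCompaction(
      Map<String, String> tableProperties,
      Map<String, String> currentSnapshotSummary,
      boolean conflictDetected) {
    // Assumption: the feature is off unless the property is explicitly "true".
    boolean enabled =
        Boolean.parseBoolean(tableProperties.getOrDefault(ROLLBACK_ENABLED, "false"));
    // The description checks for the presence of the key, not its value.
    boolean createdByCompaction = currentSnapshotSummary.containsKey(COMPACTION_FLAG);
    return conflictDetected && enabled && createdByCompaction;
  }
}
```

Checking all three conditions together keeps the rollback from ever being applied to a snapshot written by a regular (non-compaction) commit.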
   
   * Example: a transaction has updates U1, U2, U3 to apply, and B is its 
     base snapshot. At commit time the current snapshot is B' (a snapshot 
     created by compaction), so the updates {U1, U2, U3} are re-applied on 
     top of B'. If a metadata conflict is detected while applying U3 on top 
     of { B' -> U1 -> U2 }, retry the transaction with B' rolled back to its 
     parent. Calling that rollback RollbackB', apply {RollbackB', U1, U2, U3} 
     on top of B'. If this still conflicts, fail; otherwise commit and update 
     the table metadata.
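
The example above can be traced with a toy model. Everything in it is an assumption made for illustration: the `Snapshot` and `Update` classes are hypothetical, and a "conflict" is modelled simply as an update depending on a file that is no longer live, which is far cruder than Iceberg's real validation. The compaction snapshot is named `B'` in the code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of the retry flow; it does not use Iceberg classes.
final class RollbackCompactionDemo {
  /** A snapshot: its live files, its parent, and whether compaction produced it. */
  static final class Snapshot {
    final String name;
    final Snapshot parent;
    final boolean compaction;
    final Set<String> liveFiles;

    Snapshot(String name, Snapshot parent, boolean compaction, Set<String> liveFiles) {
      this.name = name;
      this.parent = parent;
      this.compaction = compaction;
      this.liveFiles = liveFiles;
    }
  }

  /** An update that conflicts when the file it depends on is gone (null = no dependency). */
  static final class Update {
    final String name;
    final String requiredFile;

    Update(String name, String requiredFile) {
      this.name = name;
      this.requiredFile = requiredFile;
    }
  }

  static boolean appliesCleanly(Snapshot base, List<Update> updates) {
    for (Update u : updates) {
      if (u.requiredFile != null && !base.liveFiles.contains(u.requiredFile)) {
        return false;
      }
    }
    return true;
  }

  /** Returns the steps that would be committed, or throws if the conflict cannot be resolved. */
  static List<String> commit(Snapshot current, List<Update> updates, boolean rollbackEnabled) {
    List<String> steps = new ArrayList<>();
    if (!appliesCleanly(current, updates)) {
      boolean canRollback = rollbackEnabled && current.compaction && current.parent != null;
      // Retry against the parent; fail if rollback is not allowed or the conflict remains.
      if (!canRollback || !appliesCleanly(current.parent, updates)) {
        throw new IllegalStateException("Conflict on " + current.name);
      }
      steps.add("Rollback" + current.name);
    }
    for (Update u : updates) {
      steps.add(u.name);
    }
    return steps;
  }

  public static void main(String[] args) {
    Snapshot b = new Snapshot("B", null, false, Set.of("f1", "f2"));
    // Compaction rewrote f1 and f2 into f3.
    Snapshot bPrime = new Snapshot("B'", b, true, Set.of("f3"));
    List<Update> updates =
        List.of(new Update("U1", null), new Update("U2", null), new Update("U3", "f1"));
    System.out.println(commit(bPrime, updates, true)); // prints [RollbackB', U1, U2, U3]
  }
}
```

With `rollbackEnabled` set to `false`, the same call throws instead, which corresponds to the behavior without the new table property.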
   
   Based on proposal : 
https://docs.google.com/document/d/1pSqxf5A59J062j9VFF5rcCpbW9vdTbBKTmjps80D-B0/edit#
   
   Posting this here to get feedback on the approach. Once it is agreed upon, 
   engines (for example, Spark) can be updated to use the transaction API.
   
   -----
   
   ### Testing done : 
   - Added unit tests validating that `is_compacted` is present in the 
     compacted snapshot summary.
   - Added an end-to-end unit test using the transaction API with a 
     concurrent rewrite.
   - TODO: add more exhaustive unit tests.
   
   
   cc @rdblue @danielcweeks @jackye1995 @rajarshisarkar @amogh-jahagirdar 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

