amogh-jahagirdar commented on issue #7635:
URL: https://github.com/apache/iceberg/issues/7635#issuecomment-1815363187

   OK, after some debugging and writing some local tests, here's what I'm seeing:
   
   1.) In certain cases when branches are involved, the DELETE in Spark gets 
executed via the `DeleteFromTableExec` execution path. This goes through 
Iceberg's `DeleteFiles` API and, as expected, fails validation (since we 
cannot delete an entire file when some of its records may not match the 
delete condition).
   
   2.) In other cases (deleting from the main table state, or even some cases on 
branches), the DELETE goes through `ReplaceDataExec`, which uses Iceberg's 
`Overwrite` API; that path writes out entirely new files and thus succeeds.
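   To make the two cases concrete, a branch-scoped DELETE of the kind described above might look like the following sketch. The table, branch, and column names are hypothetical, and Spark's `branch_<name>` identifier syntax for writing to a branch is assumed:

   ```sql
   -- Hypothetical catalog/table/branch/column names.
   -- Depending on how Spark plans this statement, it may run through
   -- DeleteFromTableExec (case 1: fails DeleteFiles validation when a data
   -- file holds both matching and non-matching rows) or ReplaceDataExec
   -- (case 2: rewrites the affected files via Overwrite and succeeds).
   DELETE FROM catalog.db.tbl.branch_audit
   WHERE event_date < DATE '2023-01-01';
   ```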
   
   I'm still working on pinning down exactly what difference leads to these two 
different physical execution plans. 
   
   In the interim, a possible workaround to unblock deletions on branches could 
be to go through the `MERGE INTO` path and self-join on the deletion criteria, 
since that seems to always go through the `ReplaceDataExec` path; I still need 
to validate this, though.
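   A minimal sketch of that workaround, again with hypothetical names and an assumed join key (`id`), could be:

   ```sql
   -- Hypothetical names; assumes Spark's branch_<name> identifier syntax
   -- and that `id` uniquely identifies rows in the table.
   -- Self-joining on the deletion criteria routes the plan through
   -- ReplaceDataExec (a copy-on-write rewrite of the affected files)
   -- instead of DeleteFromTableExec.
   MERGE INTO catalog.db.tbl.branch_audit t
   USING (
     SELECT id
     FROM catalog.db.tbl.branch_audit
     WHERE event_date < DATE '2023-01-01'
   ) s
   ON t.id = s.id
   WHEN MATCHED THEN DELETE;
   ```

   This is only a sketch under the stated assumptions, not validated behavior; whether the MERGE path always avoids the failing validation is exactly what remains to be confirmed.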


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
