amogh-jahagirdar commented on issue #7635: URL: https://github.com/apache/iceberg/issues/7635#issuecomment-1815363187
OK, after some debugging and writing some local tests, here's what I'm seeing:

1.) In certain cases, when branches are involved, the DELETE in Spark gets executed via the `DeleteFromTableExec` path. This goes through Iceberg's `DeleteFiles` API and, as expected, fails validation (since we cannot drop a whole file when some of its records may not match the delete condition).

2.) In other cases (deleting from the main table state, and even in some branch cases), the DELETE goes through `ReplaceDataExec`, which uses Iceberg's `Overwrite` API. That path writes out entirely new files and therefore succeeds.

I'm still working on codifying exactly which difference leads to these different physical plans. A possible interim workaround to unblock deletions on branches could be to go through the `MERGE INTO` path with a self join on the deletion criteria, since that seems to always go through `ReplaceDataExec`, but I still need to validate this.
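For concreteness, the `MERGE INTO` workaround mentioned above could look something like the following sketch. The table name, branch name, and predicate are all hypothetical placeholders (they do not come from the issue), and this assumes Spark's `branch_<name>` identifier syntax for reading from and writing to an Iceberg branch:

```sql
-- Hypothetical table `db.tbl`, branch `audit`, and delete condition.
-- Instead of:  DELETE FROM db.tbl.branch_audit WHERE category = 'obsolete'
-- (which may plan as DeleteFromTableExec and fail DeleteFiles validation),
-- self-join on the deletion criteria so Spark plans a ReplaceDataExec,
-- rewriting the affected files via the Overwrite API:
MERGE INTO db.tbl.branch_audit AS t
USING (
  SELECT id FROM db.tbl.branch_audit WHERE category = 'obsolete'
) AS s
ON t.id = s.id
WHEN MATCHED THEN DELETE;
```

Whether this always avoids the `DeleteFromTableExec` path still needs validation, as noted above.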