yingjianwu98 opened a new pull request, #15592:
URL: https://github.com/apache/iceberg/pull/15592
`SparkTable.deleteWhere()` and `canDeleteWhere()` do not resolve the WAP
(Write-Audit-Publish) branch from the session configuration. When a user sets
`spark.wap.branch` and performs a metadata-level delete, both the scan and the
commit incorrectly target the main branch instead of the WAP branch.
For `deleteWhere()`, I think it's a recent regression from this
[PR](https://github.com/apache/iceberg/pull/15240/changes#diff-121afd53a70a40ede271b3a89ac46fc5d48c1db4aed42c2f44526895ee5f3245).
The new change doesn't resolve to the right branch in the new extension rule
for **Spark 4.1**. I don't see a way to resolve this issue in the
ResolveBranch rule, as I believe the OptimizeMetadataOnlyDeleteFromTable
optimizer rule will discard the ResolveBranch rule and trigger `deleteWhere()`,
which doesn't respect the branch.
For `canDeleteWhere()`, this is an existing issue across **all Spark
versions**, and I have been working on a fix here:
https://github.com/apache/iceberg/pull/15512
This causes two issues:
- `canDeleteWhere()` scans the wrong branch's files, so it cannot
correctly determine whether the delete can be resolved at the metadata level
- `deleteWhere()` commits the delete to main instead of the WAP branch,
so the delete is silently never applied to the WAP branch
Fix:
- Use `determineReadBranch()` in `canDeleteWhere()` to scan the correct branch
- Use `determineWriteBranch()` in `deleteWhere()` to commit to the correct
branch
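
To illustrate the difference between the current behavior and the fix, here is a minimal toy model in plain Java. It is only a sketch under stated assumptions: `WapDeleteSketch`, `resolveBranch()`, `deleteWhereIgnoringWap()`, `deleteWhereOnResolvedBranch()` and `sample()` are hypothetical names invented for this example, not Iceberg classes or methods, and the resolution logic is a simplification (it prefers the WAP branch whenever one is set) rather than the actual behavior of `determineReadBranch()` or `determineWriteBranch()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types or methods are part of Iceberg.
public class WapDeleteSketch {
    static final String MAIN = "main";

    // Branch name -> data files visible on that branch (stand-in for table metadata).
    final Map<String, List<String>> filesByBranch = new HashMap<>();

    // Stand-in for the session value of spark.wap.branch; null when unset.
    final String wapBranch;

    WapDeleteSketch(String wapBranch) {
        this.wapBranch = wapBranch;
    }

    // Assumed simplification of branch resolution: use the WAP branch when it is set.
    String resolveBranch() {
        return wapBranch != null ? wapBranch : MAIN;
    }

    // Models the reported problem: the delete always lands on main.
    void deleteWhereIgnoringWap(String file) {
        filesByBranch.computeIfAbsent(MAIN, b -> new ArrayList<>()).remove(file);
    }

    // Models the intent of the fix: the delete lands on the resolved branch.
    void deleteWhereOnResolvedBranch(String file) {
        filesByBranch.computeIfAbsent(resolveBranch(), b -> new ArrayList<>()).remove(file);
    }

    // Builds a table whose main and "audit" branches hold the same two files.
    static WapDeleteSketch sample(String wapBranch) {
        WapDeleteSketch table = new WapDeleteSketch(wapBranch);
        table.filesByBranch.put(MAIN, new ArrayList<>(Arrays.asList("a.parquet", "b.parquet")));
        table.filesByBranch.put("audit", new ArrayList<>(Arrays.asList("a.parquet", "b.parquet")));
        return table;
    }

    public static void main(String[] args) {
        WapDeleteSketch before = sample("audit");
        before.deleteWhereIgnoringWap("a.parquet");
        System.out.println("ignoring WAP branch: " + before.filesByBranch);

        WapDeleteSketch after = sample("audit");
        after.deleteWhereOnResolvedBranch("a.parquet");
        System.out.println("resolving WAP branch: " + after.filesByBranch);
    }
}
```

In this model, with the WAP branch set to `audit`, the first variant changes main and leaves `audit` untouched, while the second changes only `audit`, which is the outcome the fix aims for.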