amogh-jahagirdar commented on code in PR #11561:
URL: https://github.com/apache/iceberg/pull/11561#discussion_r1863707596


##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -168,7 +168,7 @@ protected Map<String, DeleteFileSet> rewritableDeletes() {
     for (ScanTask task : tasks()) {
       FileScanTask fileScanTask = task.asFileScanTask();
       for (DeleteFile deleteFile : fileScanTask.deletes()) {
-        if (ContentFileUtil.isFileScoped(deleteFile)) {
+        if (ContentFileUtil.isFileScoped(deleteFile) || 
ContentFileUtil.isDV(deleteFile)) {

Review Comment:
   Great catch, yes to both points. the file scoped Util correctly handles the 
DV case by checking the referenced data file, and we for DVs we do need to 
merge all existing position deletes, regardless of scope. I've fixed that and 
added a test case for update/delete/merge which will create a mix of file 
scoped and partition scoped deletes, and then finally produce a delete which 
has the positions merged from these existing deletes.



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -168,7 +168,7 @@ protected Map<String, DeleteFileSet> rewritableDeletes() {
     for (ScanTask task : tasks()) {
       FileScanTask fileScanTask = task.asFileScanTask();
       for (DeleteFile deleteFile : fileScanTask.deletes()) {
-        if (ContentFileUtil.isFileScoped(deleteFile)) {
+        if (ContentFileUtil.isFileScoped(deleteFile) || 
ContentFileUtil.isDV(deleteFile)) {

Review Comment:
   Great catch, yes to both points. the file scoped Util correctly handles the 
DV case by checking the referenced data file, and we for DVs we do need to 
merge all existing position deletes, regardless of scope. I've fixed that and 
added a test case for update/delete/merge which will create a mix of file 
scoped and partition scoped deletes, and then finally produce a DV which has 
the positions merged from these existing deletes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to