danielcweeks commented on code in PR #14264:
URL: https://github.com/apache/iceberg/pull/14264#discussion_r2535724821


##########
core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java:
##########
@@ -89,7 +121,32 @@ protected CloseableIterable<ChangelogScanTask> doPlanFiles(
       manifestGroup = manifestGroup.planWith(planExecutor());
     }
 
-    return manifestGroup.plan(new CreateDataFileChangeTasks(changelogSnapshots));
+    // Create a supplier that reuses already-built index or builds lazily when first DELETED entry
+    // is encountered
+    Supplier<DeleteFileIndex> existingDeleteIndexSupplier =
+        () -> {
+          if (cachedExistingDeleteIndex != null) {
+            return cachedExistingDeleteIndex;
+          }
+          return buildExistingDeleteIndexTracked(fromSnapshotIdExclusive);
+        };
+
+    // Plan data file tasks (ADDED and DELETED)
+    CloseableIterable<ChangelogScanTask> dataFileTasks =
+        manifestGroup.plan(
+            new CreateDataFileChangeTasks(
+                changelogSnapshots,
+                existingDeleteIndexSupplier,
+                addedDeletesBySnapshot,
+                table().specs(),
+                isCaseSensitive()));
+
+    // Find EXISTING data files affected by newly added delete files and create tasks for them
+    CloseableIterable<ChangelogScanTask> deletedRowsTasks =
+        planDeletedRowsTasks(
+            changelogSnapshots, existingDeleteIndex, addedDeletesBySnapshot, changelogSnapshotIds);
+
+    return CloseableIterable.concat(ImmutableList.of(dataFileTasks, deletedRowsTasks));

Review Comment:
   It's not clear to me that we preserve the correct ordering by simply 
concatenating these two iterables.  I'm not sure what the expected ordering 
is.  When we calculate the snapshot range, we explicitly preserve order, but 
here we don't maintain that guarantee.
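
   To make the concern concrete, here is a minimal sketch of how plain 
concatenation can interleave ordinals out of order, while a stable sort on the 
ordinal keeps them non-decreasing.  Everything in it is hypothetical: `Task`, 
`concat`, `orderedByOrdinal`, and `isNonDecreasing` are stand-ins written for 
this example and do not use Iceberg classes, and the ordinal field only 
assumes that each task carries the position of its snapshot in the changelog 
range.  Whether the scan is actually expected to return tasks in that order 
is the open question, not something this sketch settles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these types come from Iceberg.
final class ChangeOrderingSketch {

  // Stand-in for a changelog task: an ordinal (assumed position of the
  // snapshot in the changelog range) and a label for readability.
  public static final class Task {
    public final int changeOrdinal;
    public final String label;

    public Task(int changeOrdinal, String label) {
      this.changeOrdinal = changeOrdinal;
      this.label = label;
    }

    @Override
    public String toString() {
      return changeOrdinal + ":" + label;
    }
  }

  // Mirrors the shape of the change under review: all tasks from the first
  // group, followed by all tasks from the second group.
  public static List<Task> concat(List<Task> first, List<Task> second) {
    List<Task> out = new ArrayList<>(first);
    out.addAll(second);
    return out;
  }

  // One possible alternative: combine, then sort by ordinal. List.sort is
  // stable, so tasks with equal ordinals keep their relative order.
  public static List<Task> orderedByOrdinal(List<Task> first, List<Task> second) {
    List<Task> out = concat(first, second);
    out.sort(Comparator.comparingInt((Task t) -> t.changeOrdinal));
    return out;
  }

  // True when ordinals never decrease from one task to the next.
  public static boolean isNonDecreasing(List<Task> tasks) {
    for (int i = 1; i < tasks.size(); i++) {
      if (tasks.get(i).changeOrdinal < tasks.get(i - 1).changeOrdinal) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Task> dataFileTasks =
        Arrays.asList(new Task(0, "added-rows"), new Task(2, "added-rows"));
    List<Task> deletedRowsTasks = Arrays.asList(new Task(1, "deleted-rows"));

    System.out.println(concat(dataFileTasks, deletedRowsTasks));
    System.out.println(orderedByOrdinal(dataFileTasks, deletedRowsTasks));
  }
}
```

   Sorting materializes every task in memory, which a lazily planned scan may 
not want; if each input is already ordered on its own, a streaming merge 
would be another option.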



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

