amogh-jahagirdar commented on code in PR #14197:
URL: https://github.com/apache/iceberg/pull/14197#discussion_r2392046035


##########
core/src/main/java/org/apache/iceberg/io/BaseTaskWriter.java:
##########
@@ -140,7 +161,13 @@ protected BaseEqualityDeltaWriter(
           new SortingPositionOnlyDeleteWriter<>(
               () -> 
appenderFactory.newPosDeleteWriter(newOutputFile(partition), format, partition),
               deleteGranularity);
+      this.dvFileWriter =
+          null == partitioningDVWriter
+              ? new PartitioningDVWriter<>(fileFactory, p -> null)

Review Comment:
   This looks like we're not going to load historical position deletes/DVs to 
merge, at this point in the Flink write task. I'm not super familiar with the 
flink writes but I would think it's possible that there are historical position 
deletes or DVs from before the write that need to be merged (in the spec we 
guarantee at most 1 DV per data file)



##########
core/src/main/java/org/apache/iceberg/io/BaseTaskWriter.java:
##########
@@ -115,20 +115,32 @@ public WriteResult complete() throws IOException {
   protected abstract class BaseEqualityDeltaWriter implements Closeable {
     private final StructProjection structProjection;
     private final PositionDelete<T> positionDelete;
+    private final StructLike partitionKey;
+    private final boolean useDv;

Review Comment:
   In the spark position delta write logic, we abstract both pos delete/and the 
dv writer behind the `PartitioningWriter` interface. Is that not possible here? 
   
   
https://github.com/apache/iceberg/blob/main/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java#L519



##########
core/src/main/java/org/apache/iceberg/io/BaseTaskWriter.java:
##########


Review Comment:
   See my comment above, 
https://github.com/apache/iceberg/pull/14197/files#r2392028379 
   
   I don't quite understand why we can't use the shared PartitioningWriter 
interface for both the pos delete case and DV writer case?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to