aokolnychyi commented on code in PR #7692:
URL: https://github.com/apache/iceberg/pull/7692#discussion_r1204814401
##########
spark/v3.4/spark/src/jmh/java/org/apache/iceberg/spark/source/WritersBenchmark.java:
##########
@@ -363,6 +389,60 @@ public void writeUnpartitionedClusteredPositionDeleteWriter(Blackhole blackhole)
     blackhole.consume(writer);
   }
+
+  @Benchmark
+  @Threads(1)
+  public void writeUnpartitionedFanoutPositionDeleteWriter(Blackhole blackhole) throws IOException {
+    FileIO io = table().io();
+
+    OutputFileFactory fileFactory = newFileFactory();
+    SparkFileWriterFactory writerFactory =
+        SparkFileWriterFactory.builderFor(table()).dataFileFormat(fileFormat()).build();
+
+    FanoutPositionOnlyDeleteWriter<InternalRow> writer =
+        new FanoutPositionOnlyDeleteWriter<>(
+            writerFactory, fileFactory, io, TARGET_FILE_SIZE_IN_BYTES);
+
+    PositionDelete<InternalRow> positionDelete = PositionDelete.create();
+    try (FanoutPositionOnlyDeleteWriter<InternalRow> closeableWriter = writer) {
+      for (InternalRow row : positionDeleteRows) {
+        String path = row.getString(0);
+        long pos = row.getLong(1);
+        positionDelete.set(path, pos, null);
+        closeableWriter.write(positionDelete, unpartitionedSpec, null);
+      }
+    }
+
+    blackhole.consume(writer);
+  }
+
+  @Benchmark
+  @Threads(1)
+  public void writeUnpartitionedFanoutPositionDeleteWriterShuffled(Blackhole blackhole)
Review Comment:
I ran this benchmark (100 data files, 50k deletes each, 5 million deletes total) with a GC profiler and did not see anything bad. Issues would arise only with lots of unique data files. That's unlikely, as we distribute by partition, and this writer will still be disabled by default, so users will have to opt in explicitly. It isn't perfect, but there are reasonable use cases for it.
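
For reference, attaching a GC profiler to a JMH run like the one described above can be sketched with the standard JMH runner API (a minimal sketch, assuming JMH is on the classpath; the benchmark regex and option values here are illustrative, not the exact command used for the numbers quoted in the comment):

```java
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class FanoutDeleteWriterBenchmarkRunner {
  public static void main(String[] args) throws RunnerException {
    Options opts =
        new OptionsBuilder()
            // Match the fanout position-delete benchmarks from the diff above.
            .include("WritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriter.*")
            // GCProfiler reports allocation rate and GC counts per benchmark,
            // which is how "anything bad" in allocation behavior would show up.
            .addProfiler(GCProfiler.class)
            .forks(1) // illustrative; real runs may use more forks/iterations
            .build();

    new Runner(opts).run();
  }
}
```

Equivalently, `-prof gc` can be passed on the JMH command line when the benchmarks are run through the build's jmh task.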
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]