aokolnychyi commented on code in PR #7692:
URL: https://github.com/apache/iceberg/pull/7692#discussion_r1203260409
##########
spark/v3.4/spark/src/jmh/java/org/apache/iceberg/spark/source/WritersBenchmark.java:
##########
@@ -363,6 +389,60 @@ public void
writeUnpartitionedClusteredPositionDeleteWriter(Blackhole blackhole)
blackhole.consume(writer);
}
+ @Benchmark
+ @Threads(1)
+ public void writeUnpartitionedFanoutPositionDeleteWriter(Blackhole
blackhole) throws IOException {
+ FileIO io = table().io();
+
+ OutputFileFactory fileFactory = newFileFactory();
+ SparkFileWriterFactory writerFactory =
+
SparkFileWriterFactory.builderFor(table()).dataFileFormat(fileFormat()).build();
+
+ FanoutPositionOnlyDeleteWriter<InternalRow> writer =
+ new FanoutPositionOnlyDeleteWriter<>(
+ writerFactory, fileFactory, io, TARGET_FILE_SIZE_IN_BYTES);
+
+ PositionDelete<InternalRow> positionDelete = PositionDelete.create();
+ try (FanoutPositionOnlyDeleteWriter<InternalRow> closeableWriter = writer)
{
+ for (InternalRow row : positionDeleteRows) {
+ String path = row.getString(0);
+ long pos = row.getLong(1);
+ positionDelete.set(path, pos, null);
+ closeableWriter.write(positionDelete, unpartitionedSpec, null);
+ }
+ }
+
+ blackhole.consume(writer);
+ }
+
+ @Benchmark
+ @Threads(1)
+ public void writeUnpartitionedFanoutPositionDeleteWriterShuffled(Blackhole
blackhole)
Review Comment:
We should expect 5-15% overhead for the new buffering writer, which can
still be beneficial for the job if we skip local ordering for inserts and
potentially avoid spilling. This benchmark also does not take into account the
cost to order records, it only tests the write performance. We will use this
writer only if fanout is enabled. We should also explore Puffin delete files
that would persist bitmaps directly.
```
Benchmark
Mode Cnt Score Error
Units
ParquetWritersBenchmark.writeUnpartitionedClusteredPositionDeleteWriter
ss 5 6.004 ± 0.185
s/op
ParquetWritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriter
ss 5 6.503 ± 0.171
s/op
ParquetWritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriterShuffled
ss 5 6.616 ± 0.204
s/op
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]