zhongyujiang commented on issue #9149:
URL: https://github.com/apache/iceberg/issues/9149#issuecomment-1825831844

Based on this idea I did some tuning and [benchmarking](https://github.com/apache/iceberg/commit/39b5939980f2d777735ec822d8bf211f26489c1b#diff-9eb38b06454a156e92135fab05e90955ac41618758a27192859b8aa243fe1edaR66), and here are the results:

> `numDataFiles` is the number of data files written to the table; each data file has 10 million records. 15% of the rows in each data file were deleted by pos deletes, and all pos deletes were written to a single pos delete file, using different row-group sizes.

Benchmark                                                                   (numDataFiles)  (rowGroupSizeMB)  Mode  Cnt   Score    Error  Units
IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               3                 1    ss    5   5.062  ± 0.117  s/op
IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               3                 2    ss    5   5.404  ± 0.083  s/op
IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               3                 4    ss    5   6.004  ± 0.103  s/op
IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               3               128    ss    5   7.435  ± 0.703  s/op
IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               5                 1    ss    5   8.413  ± 0.081  s/op
IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               5                 2    ss    5   9.111  ± 0.139  s/op
IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               5                 4    ss    5   9.631  ± 0.168  s/op
IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               5               128    ss    5  16.715  ± 0.409  s/op

The results show that using a smaller row-group size in the pos-delete file improves read performance compared to the current default. Going from 128 MB to 1 MB row groups, read time drops from 7.435 to 5.062 s/op with 3 data files and from 16.715 to 8.413 s/op with 5 data files.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org