zhongyujiang commented on issue #9149:
URL: https://github.com/apache/iceberg/issues/9149#issuecomment-1825831844

   Based on this idea I did some tuning and 
[benchmarking](https://github.com/apache/iceberg/commit/39b5939980f2d777735ec822d8bf211f26489c1b#diff-9eb38b06454a156e92135fab05e90955ac41618758a27192859b8aa243fe1edaR66),
 and here are the results:
   >`numDataFiles` is the number of data files written to the table; each data 
file has 10 million records. 15% of the rows in each data file were deleted by 
pos deletes, and all pos deletes are written to a single pos delete file, with 
the row-group size of that file (`rowGroupSizeMB`) varied across runs.
   
   Benchmark                                                                   (numDataFiles)  (rowGroupSizeMB)  Mode  Cnt   Score   Error  Units
   IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               3                 1    ss    5   5.062 ± 0.117   s/op
   IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               3                 2    ss    5   5.404 ± 0.083   s/op
   IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               3                 4    ss    5   6.004 ± 0.103   s/op
   IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               3               128    ss    5   7.435 ± 0.703   s/op
   IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               5                 1    ss    5   8.413 ± 0.081   s/op
   IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               5                 2    ss    5   9.111 ± 0.139   s/op
   IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               5                 4    ss    5   9.631 ± 0.168   s/op
   IcebergSourceParquetPosDeleteRowGroupFilterBenchmark.readIcebergVectorized               5               128    ss    5  16.715 ± 0.409   s/op
   
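   The results show that using a smaller row-group size in the pos delete file 
improves read performance compared to the current default: with 1 MB row 
groups, read time drops from 7.435 s to 5.062 s (about 32%) for 3 data files 
and from 16.715 s to 8.413 s (about 50%) for 5 data files, relative to the 
128 MB runs.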
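
   For anyone who wants to check the relative numbers, here is a small sketch that recomputes the reduction in read time for each row-group size from the scores in the table. It assumes the 128 MB runs are the "current default" baseline; the `scores` dict and the `reduction_vs_baseline` helper are names made up for this illustration and are not part of the benchmark code.

```python
# Scores (s/op) copied from the benchmark table,
# keyed by (numDataFiles, rowGroupSizeMB).
scores = {
    (3, 1): 5.062, (3, 2): 5.404, (3, 4): 6.004, (3, 128): 7.435,
    (5, 1): 8.413, (5, 2): 9.111, (5, 4): 9.631, (5, 128): 16.715,
}

# Assumption: the 128 MB runs stand in for the "current default".
BASELINE_MB = 128


def reduction_vs_baseline(num_files, row_group_mb):
    """Fractional reduction in read time relative to the baseline row-group size."""
    baseline = scores[(num_files, BASELINE_MB)]
    return (baseline - scores[(num_files, row_group_mb)]) / baseline


for (num_files, row_group_mb), score in sorted(scores.items()):
    if row_group_mb != BASELINE_MB:
        pct = 100 * reduction_vs_baseline(num_files, row_group_mb)
        print(f"{num_files} files, {row_group_mb} MB: {score:.3f} s/op, "
              f"{pct:.1f}% less time than {BASELINE_MB} MB")
```

   This only compares the mean scores and ignores the reported error margins, so small differences between neighbouring row-group sizes should be read with some caution.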


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

