wypoon commented on PR #11661: URL: https://github.com/apache/iceberg/pull/11661#issuecomment-2828525455
@pvary I ran an existing benchmark, `VectorizedReadDictionaryEncodedFlatParquetDataBenchmark`, which exercises the `RLE` case (but not the `PACKED` case) of the refactored code. It does exercise both arms of the if-else in
```
if (valuesReader instanceof ValuesAsBytesReader) {
  nextRleBatch(...);
} else if (valuesReader instanceof VectorizedDictionaryEncodedParquetValuesReader) {
  nextRleDictEncodedBatch(...);
}
```
so the `instanceof` check is being exercised.

I ran the benchmark on main (without this change) and on this branch after rebasing on main. The results are:

main:
```
Benchmark                                                                                   Mode  Cnt   Score   Error  Units
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readBigDecimalsIcebergVectorized5k    ss    5  15.490 ± 1.897  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readBigDecimalsSparkVectorized5k      ss    5  15.988 ± 1.314  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDatesIcebergVectorized5k          ss    5   5.979 ± 0.286  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDatesSparkVectorized5k            ss    5   5.057 ± 0.501  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDecimalsIcebergVectorized5k       ss    5   9.116 ± 1.352  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDecimalsSparkVectorized5k         ss    5   8.738 ± 0.375  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDoublesIcebergVectorized5k        ss    5   7.617 ± 0.522  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDoublesSparkVectorized5k          ss    5   8.292 ± 1.026  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readFloatsIcebergVectorized5k         ss    5   4.818 ± 0.283  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readFloatsSparkVectorized5k           ss    5   4.069 ± 0.630  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readIntegersIcebergVectorized5k       ss    5   5.510 ± 0.249  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readIntegersSparkVectorized5k         ss    5   5.604 ± 0.933  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readLongsIcebergVectorized5k          ss    5   4.565 ± 0.253  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readLongsSparkVectorized5k            ss    5   4.604 ± 0.769  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readStringsIcebergVectorized5k        ss    5   6.674 ± 0.337  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readStringsSparkVectorized5k          ss    5   7.390 ± 1.092  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readTimestampsIcebergVectorized5k     ss    5   5.373 ± 0.351  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readTimestampsSparkVectorized5k       ss    5   4.855 ± 0.594  s/op
```

this branch:
```
Benchmark                                                                                   Mode  Cnt   Score   Error  Units
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readBigDecimalsIcebergVectorized5k    ss    5  14.120 ± 0.898  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readBigDecimalsSparkVectorized5k      ss    5  14.878 ± 0.543  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDatesIcebergVectorized5k          ss    5   4.006 ± 0.311  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDatesSparkVectorized5k            ss    5   4.965 ± 1.272  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDecimalsIcebergVectorized5k       ss    5   4.976 ± 0.847  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDecimalsSparkVectorized5k         ss    5   5.509 ± 0.935  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDoublesIcebergVectorized5k        ss    5   5.200 ± 0.201  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDoublesSparkVectorized5k          ss    5   5.049 ± 0.617  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readFloatsIcebergVectorized5k         ss    5   4.910 ± 0.282  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readFloatsSparkVectorized5k           ss    5   4.272 ± 1.881  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readIntegersIcebergVectorized5k       ss    5   5.431 ± 0.137  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readIntegersSparkVectorized5k         ss    5   4.450 ± 1.899  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readLongsIcebergVectorized5k          ss    5   4.161 ± 0.219  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readLongsSparkVectorized5k            ss    5   4.633 ± 0.874  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readStringsIcebergVectorized5k        ss    5   6.038 ± 0.269  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readStringsSparkVectorized5k          ss    5   7.911 ± 0.378  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readTimestampsIcebergVectorized5k     ss    5   5.517 ± 0.400  s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readTimestampsSparkVectorized5k       ss    5   5.087 ± 0.811  s/op
```

The refactor does not appear to make the performance worse.
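One rough way to read the two result sets side by side is to ask whether a branch score is higher than the main score by more than the two reported error margins combined. The sketch below is only an illustration of that check, not part of the PR or the benchmark suite; `BenchmarkCompare` and `clearlySlower` are hypothetical names, and treating the `±` column as a simple interval is an assumption.

```java
final class BenchmarkCompare {
    // Returns true only when the branch interval lies entirely above the main
    // interval, i.e. the branch is slower even after allowing for both errors.
    // Scores are in s/op, so a larger score means slower.
    static boolean clearlySlower(double mainScore, double mainErr,
                                 double branchScore, double branchErr) {
        return branchScore - branchErr > mainScore + mainErr;
    }

    public static void main(String[] args) {
        // readDatesIcebergVectorized5k: main 5.979 ± 0.286, branch 4.006 ± 0.311
        System.out.println(clearlySlower(5.979, 0.286, 4.006, 0.311));
        // readStringsSparkVectorized5k: main 7.390 ± 1.092, branch 7.911 ± 0.378
        System.out.println(clearlySlower(7.390, 1.092, 7.911, 0.378));
    }
}
```

Both calls print `false`: the first row is faster on the branch, and the second is nominally slower but within the combined error margins.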