wypoon commented on PR #11661:
URL: https://github.com/apache/iceberg/pull/11661#issuecomment-2828525455
@pvary I ran an existing benchmark,
`VectorizedReadDictionaryEncodedFlatParquetDataBenchmark`, which exercises the
`RLE` case (but not the `PACKED` case) of the refactored code. It does exercise
both arms of the if-else in
```
if (valuesReader instanceof ValuesAsBytesReader) {
  nextRleBatch(...);
} else if (valuesReader instanceof VectorizedDictionaryEncodedParquetValuesReader) {
  nextRleDictEncodedBatch(...);
}
```
so the instanceof is being exercised.
I ran the benchmark on main (without this change) and on this branch after
rebasing on main.
The results are:
main:
```
Benchmark                                                                                   Mode  Cnt   Score   Error  Units
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readBigDecimalsIcebergVectorized5k    ss    5  15.490 ± 1.897   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readBigDecimalsSparkVectorized5k      ss    5  15.988 ± 1.314   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDatesIcebergVectorized5k          ss    5   5.979 ± 0.286   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDatesSparkVectorized5k            ss    5   5.057 ± 0.501   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDecimalsIcebergVectorized5k       ss    5   9.116 ± 1.352   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDecimalsSparkVectorized5k         ss    5   8.738 ± 0.375   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDoublesIcebergVectorized5k        ss    5   7.617 ± 0.522   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDoublesSparkVectorized5k          ss    5   8.292 ± 1.026   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readFloatsIcebergVectorized5k         ss    5   4.818 ± 0.283   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readFloatsSparkVectorized5k           ss    5   4.069 ± 0.630   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readIntegersIcebergVectorized5k       ss    5   5.510 ± 0.249   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readIntegersSparkVectorized5k         ss    5   5.604 ± 0.933   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readLongsIcebergVectorized5k          ss    5   4.565 ± 0.253   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readLongsSparkVectorized5k            ss    5   4.604 ± 0.769   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readStringsIcebergVectorized5k        ss    5   6.674 ± 0.337   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readStringsSparkVectorized5k          ss    5   7.390 ± 1.092   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readTimestampsIcebergVectorized5k     ss    5   5.373 ± 0.351   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readTimestampsSparkVectorized5k       ss    5   4.855 ± 0.594   s/op
```
this branch:
```
Benchmark                                                                                   Mode  Cnt   Score   Error  Units
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readBigDecimalsIcebergVectorized5k    ss    5  14.120 ± 0.898   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readBigDecimalsSparkVectorized5k      ss    5  14.878 ± 0.543   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDatesIcebergVectorized5k          ss    5   4.006 ± 0.311   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDatesSparkVectorized5k            ss    5   4.965 ± 1.272   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDecimalsIcebergVectorized5k       ss    5   4.976 ± 0.847   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDecimalsSparkVectorized5k         ss    5   5.509 ± 0.935   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDoublesIcebergVectorized5k        ss    5   5.200 ± 0.201   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDoublesSparkVectorized5k          ss    5   5.049 ± 0.617   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readFloatsIcebergVectorized5k         ss    5   4.910 ± 0.282   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readFloatsSparkVectorized5k           ss    5   4.272 ± 1.881   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readIntegersIcebergVectorized5k       ss    5   5.431 ± 0.137   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readIntegersSparkVectorized5k         ss    5   4.450 ± 1.899   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readLongsIcebergVectorized5k          ss    5   4.161 ± 0.219   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readLongsSparkVectorized5k            ss    5   4.633 ± 0.874   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readStringsIcebergVectorized5k        ss    5   6.038 ± 0.269   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readStringsSparkVectorized5k          ss    5   7.911 ± 0.378   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readTimestampsIcebergVectorized5k     ss    5   5.517 ± 0.400   s/op
VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readTimestampsSparkVectorized5k       ss    5   5.087 ± 0.811   s/op
```
The refactor does not appear to degrade performance: each Iceberg result on this branch is either faster than on main or within the reported error margins.
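One rough way to read the two tables is to check whether the `Score ± Error` intervals of a benchmark on main and on this branch overlap. The sketch below is only an illustration of that reading, not part of the PR or of JMH; the class name `BenchmarkCompare` and the `compare` method are hypothetical, and the inputs are copied from the tables above. With only 5 single-shot samples per benchmark, treat the result as a sanity check rather than a statistical test.

```java
public class BenchmarkCompare {
  /**
   * Hypothetical helper: classifies a branch result against main by whether
   * the score ± error intervals overlap. Units are s/op, so lower is better.
   */
  public static String compare(
      double mainScore, double mainErr, double branchScore, double branchErr) {
    double mainLow = mainScore - mainErr;
    double mainHigh = mainScore + mainErr;
    double branchLow = branchScore - branchErr;
    double branchHigh = branchScore + branchErr;

    if (branchHigh < mainLow) {
      return "faster"; // branch interval lies entirely below main's
    }
    if (branchLow > mainHigh) {
      return "slower"; // branch interval lies entirely above main's
    }
    return "within error"; // intervals overlap
  }

  public static void main(String[] args) {
    // Scores and errors copied from the main and branch tables.
    System.out.println(
        "readDecimalsIcebergVectorized5k: " + compare(9.116, 1.352, 4.976, 0.847));
    System.out.println(
        "readFloatsIcebergVectorized5k: " + compare(4.818, 0.283, 4.910, 0.282));
    System.out.println(
        "readTimestampsIcebergVectorized5k: " + compare(5.373, 0.351, 5.517, 0.400));
  }
}
```

For these three inputs the check reports `faster`, `within error`, and `within error` respectively.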