yashmayya opened a new pull request, #13791: URL: https://github.com/apache/pinot/pull/13791
- Currently, a subset of single input aggregation functions support null handling (see [here](https://github.com/apache/pinot/blob/53fbf88027c47b3c6f7d1526576ba0bd257fe9d5/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java#L70-L484)).
- Out of these, only a few use the `NullableSingleInputAggregationFunction` framework added in https://github.com/apache/pinot/pull/12227. The ones that don't use the framework rely on a very inefficient null check: iterating over all the values in the block and consulting the null bitmap for every value. This leads to a performance degradation of 10-20X in most aggregation functions.
- The `NullableSingleInputAggregationFunction` framework makes much more efficient use of the `RoaringBitmap` via the iterator API.
- We intend to enable null handling by default for leaf stages in the multi-stage query engine (see https://github.com/apache/pinot/pull/13570). It's crucial to avoid such a large performance degradation by default.
- This patch refactors the remaining nullable single input aggregation functions to use the faster framework. It also adds some representative benchmarks to demonstrate the performance improvement when null handling is enabled, and to show that these changes cause no noticeable performance degradation when null handling is disabled.
- Using the `NullableSingleInputAggregationFunction` framework also results in a lot of code cleanup, since many of the aggregation functions duplicated logic between the null handling enabled and disabled paths.
- The remaining single input aggregation functions that currently don't support null handling at all will be updated in a future patch.
- A number of unit tests using the fluent test framework (added in https://github.com/apache/pinot/pull/12215) are also added here to verify the correctness of the changes, especially w.r.t. null handling.
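
To illustrate the difference between the two null-checking strategies described above, here is a minimal sketch. It is not the Pinot implementation: `java.util.BitSet` stands in for the `RoaringBitmap` null bitmap so the example needs no external dependency, and the class and method names (`NullSkipSum`, `sumPerValueCheck`, `sumByRanges`) are hypothetical. The first method checks the bitmap for every value; the second walks the null positions and aggregates each contiguous non-null range in a tight loop, which is the general idea behind iterating over the bitmap instead of probing it per value.

```java
import java.util.BitSet;

// Illustrative sketch only. BitSet is an assumed stand-in for the RoaringBitmap
// null bitmap, and all names here are hypothetical, not Pinot APIs.
final class NullSkipSum {

  // Slow path: consult the null bitmap once for every value in the block.
  static double sumPerValueCheck(double[] values, int length, BitSet nulls) {
    double sum = 0;
    for (int i = 0; i < length; i++) {
      if (!nulls.get(i)) {
        sum += values[i];
      }
    }
    return sum;
  }

  // Faster path: jump from one null position to the next and aggregate the
  // non-null range in between without touching the bitmap inside the inner loop.
  static double sumByRanges(double[] values, int length, BitSet nulls) {
    double sum = 0;
    int start = nulls.nextClearBit(0); // first non-null index
    while (start < length) {
      int nextNull = nulls.nextSetBit(start);
      int end = (nextNull < 0 || nextNull >= length) ? length : nextNull;
      for (int i = start; i < end; i++) {
        sum += values[i];
      }
      start = nulls.nextClearBit(end); // skip the whole run of nulls
    }
    return sum;
  }

  public static void main(String[] args) {
    double[] values = {1, 2, 3, 4, 5, 6, 7, 8};
    BitSet nulls = new BitSet();
    nulls.set(0); // a null every 4th row, similar in spirit to _nullPeriod = 4
    nulls.set(4);
    System.out.println(sumPerValueCheck(values, values.length, nulls));
    System.out.println(sumByRanges(values, values.length, nulls));
  }
}
```

Both methods return the same result; only the access pattern on the bitmap differs. The sparser the nulls, the longer the uninterrupted inner loops in the range-based version.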
<hr>

These are the benchmark results on an M3 Max chip with Temurin JDK 17:

### SumAggregationFunction

#### Old

```
Benchmark                     (_nullHandlingEnabled)  (_nullPeriod)   Mode  Cnt       Score      Error   Units
BenchmarkSumAggregation.test                   false              1  thrpt   50     141.922 ±    1.030  ops/ms
BenchmarkSumAggregation.test                   false              2  thrpt   50     142.837 ±    1.184  ops/ms
BenchmarkSumAggregation.test                   false              4  thrpt   50     143.824 ±    0.684  ops/ms
BenchmarkSumAggregation.test                   false              8  thrpt   50     143.524 ±    0.854  ops/ms
BenchmarkSumAggregation.test                   false             16  thrpt   50     143.585 ±    0.907  ops/ms
BenchmarkSumAggregation.test                   false             32  thrpt   50     143.183 ±    0.945  ops/ms
BenchmarkSumAggregation.test                   false             64  thrpt   50     142.501 ±    1.180  ops/ms
BenchmarkSumAggregation.test                   false            128  thrpt   50     143.732 ±    0.802  ops/ms
BenchmarkSumAggregation.test                    true              1  thrpt   50  171031.995 ±  972.930  ops/ms
BenchmarkSumAggregation.test                    true              2  thrpt   50     133.537 ±    1.683  ops/ms
BenchmarkSumAggregation.test                    true              4  thrpt   50       6.216 ±    0.035  ops/ms
BenchmarkSumAggregation.test                    true              8  thrpt   50       7.334 ±    0.033  ops/ms
BenchmarkSumAggregation.test                    true             16  thrpt   50       8.308 ±    0.037  ops/ms
BenchmarkSumAggregation.test                    true             32  thrpt   50       9.448 ±    0.121  ops/ms
BenchmarkSumAggregation.test                    true             64  thrpt   50      11.076 ±    0.071  ops/ms
BenchmarkSumAggregation.test                    true            128  thrpt   50      11.947 ±    0.069  ops/ms
```

#### New

```
Benchmark                     (_nullHandlingEnabled)  (_nullPeriod)   Mode  Cnt       Score      Error   Units
BenchmarkSumAggregation.test                   false              1  thrpt   50     143.044 ±    1.463  ops/ms
BenchmarkSumAggregation.test                   false              2  thrpt   50     141.605 ±    2.040  ops/ms
BenchmarkSumAggregation.test                   false              4  thrpt   50     143.155 ±    1.499  ops/ms
BenchmarkSumAggregation.test                   false              8  thrpt   50     141.408 ±    2.080  ops/ms
BenchmarkSumAggregation.test                   false             16  thrpt   50     143.193 ±    0.794  ops/ms
BenchmarkSumAggregation.test                   false             32  thrpt   50     143.326 ±    1.292  ops/ms
BenchmarkSumAggregation.test                   false             64  thrpt   50     142.913 ±    1.425  ops/ms
BenchmarkSumAggregation.test                   false            128  thrpt   50     144.853 ±    0.685  ops/ms
BenchmarkSumAggregation.test                    true              1  thrpt   50  241239.579 ± 1136.547  ops/ms
BenchmarkSumAggregation.test                    true              2  thrpt   50      71.465 ±    0.671  ops/ms
BenchmarkSumAggregation.test                    true              4  thrpt   50     122.977 ±    0.769  ops/ms
BenchmarkSumAggregation.test                    true              8  thrpt   50     197.443 ±    1.333  ops/ms
BenchmarkSumAggregation.test                    true             16  thrpt   50     278.517 ±    1.447  ops/ms
BenchmarkSumAggregation.test                    true             32  thrpt   50     242.543 ±    0.869  ops/ms
BenchmarkSumAggregation.test                    true             64  thrpt   50     256.617 ±    2.179  ops/ms
BenchmarkSumAggregation.test                    true            128  thrpt   50     207.354 ±    0.678  ops/ms
```

<hr>

### AvgAggregationFunction

#### Old

```
Benchmark                     (_nullHandlingEnabled)  (_nullPeriod)   Mode  Cnt       Score      Error   Units
BenchmarkAvgAggregation.test                   false              1  thrpt   50     144.106 ±    1.299  ops/ms
BenchmarkAvgAggregation.test                   false              2  thrpt   50     145.306 ±    1.391  ops/ms
BenchmarkAvgAggregation.test                   false              4  thrpt   50     141.760 ±    3.740  ops/ms
BenchmarkAvgAggregation.test                   false              8  thrpt   50     145.969 ±    0.806  ops/ms
BenchmarkAvgAggregation.test                   false             16  thrpt   50     146.584 ±    0.963  ops/ms
BenchmarkAvgAggregation.test                   false             32  thrpt   50     143.416 ±    2.946  ops/ms
BenchmarkAvgAggregation.test                   false             64  thrpt   50     146.223 ±    0.848  ops/ms
BenchmarkAvgAggregation.test                   false            128  thrpt   50     146.637 ±    0.898  ops/ms
BenchmarkAvgAggregation.test                    true              1  thrpt   50  273607.474 ±  822.817  ops/ms
BenchmarkAvgAggregation.test                    true              2  thrpt   50     169.814 ±    2.570  ops/ms
BenchmarkAvgAggregation.test                    true              4  thrpt   50       6.433 ±    0.046  ops/ms
BenchmarkAvgAggregation.test                    true              8  thrpt   50       7.488 ±    0.056  ops/ms
BenchmarkAvgAggregation.test                    true             16  thrpt   50       8.484 ±    0.064  ops/ms
BenchmarkAvgAggregation.test                    true             32  thrpt   50       9.675 ±    0.076  ops/ms
BenchmarkAvgAggregation.test                    true             64  thrpt   50      11.081 ±    0.084  ops/ms
BenchmarkAvgAggregation.test                    true            128  thrpt   50      13.104 ±    0.084  ops/ms
```

#### New

```
Benchmark                     (_nullHandlingEnabled)  (_nullPeriod)   Mode  Cnt       Score      Error   Units
BenchmarkAvgAggregation.test                   false              1  thrpt   50     138.875 ±    1.609  ops/ms
BenchmarkAvgAggregation.test                   false              2  thrpt   50     139.859 ±    2.670  ops/ms
BenchmarkAvgAggregation.test                   false              4  thrpt   50     142.453 ±    1.805  ops/ms
BenchmarkAvgAggregation.test                   false              8  thrpt   50     143.638 ±    1.807  ops/ms
BenchmarkAvgAggregation.test                   false             16  thrpt   50     140.297 ±    2.983  ops/ms
BenchmarkAvgAggregation.test                   false             32  thrpt   50     143.873 ±    1.352  ops/ms
BenchmarkAvgAggregation.test                   false             64  thrpt   50     144.444 ±    1.037  ops/ms
BenchmarkAvgAggregation.test                   false            128  thrpt   50     141.255 ±    3.213  ops/ms
BenchmarkAvgAggregation.test                    true              1  thrpt   50  274400.836 ± 1372.961  ops/ms
BenchmarkAvgAggregation.test                    true              2  thrpt   50      80.724 ±    0.885  ops/ms
BenchmarkAvgAggregation.test                    true              4  thrpt   50     136.979 ±    1.178  ops/ms
BenchmarkAvgAggregation.test                    true              8  thrpt   50     158.269 ±    0.765  ops/ms
BenchmarkAvgAggregation.test                    true             16  thrpt   50     147.872 ±    0.701  ops/ms
BenchmarkAvgAggregation.test                    true             32  thrpt   50     143.951 ±    0.807  ops/ms
BenchmarkAvgAggregation.test                    true             64  thrpt   50     141.555 ±    1.113  ops/ms
BenchmarkAvgAggregation.test                    true            128  thrpt   50     141.092 ±    2.150  ops/ms
```

<hr>

### MinAggregationFunction

#### Old

```
Benchmark                     (_nullHandlingEnabled)  (_nullPeriod)   Mode  Cnt       Score      Error   Units
BenchmarkMinAggregation.test                   false              1  thrpt   50     198.780 ±    1.772  ops/ms
BenchmarkMinAggregation.test                   false              2  thrpt   50     199.408 ±    1.591  ops/ms
BenchmarkMinAggregation.test                   false              4  thrpt   50     201.402 ±    0.932  ops/ms
BenchmarkMinAggregation.test                   false              8  thrpt   50     199.532 ±    1.327  ops/ms
BenchmarkMinAggregation.test                   false             16  thrpt   50     197.042 ±    3.784  ops/ms
BenchmarkMinAggregation.test                   false             32  thrpt   50     201.312 ±    1.005  ops/ms
BenchmarkMinAggregation.test                   false             64  thrpt   50     199.076 ±    1.789  ops/ms
BenchmarkMinAggregation.test                   false            128  thrpt   50     197.485 ±    3.533  ops/ms
BenchmarkMinAggregation.test                    true              1  thrpt   50  233503.544 ±  757.227  ops/ms
BenchmarkMinAggregation.test                    true              2  thrpt   50     135.514 ±    0.548  ops/ms
BenchmarkMinAggregation.test                    true              4  thrpt   50       6.263 ±    0.033  ops/ms
BenchmarkMinAggregation.test                    true              8  thrpt   50       7.291 ±    0.059  ops/ms
BenchmarkMinAggregation.test                    true             16  thrpt   50       8.268 ±    0.045  ops/ms
BenchmarkMinAggregation.test                    true             32  thrpt   50       9.917 ±    0.072  ops/ms
BenchmarkMinAggregation.test                    true             64  thrpt   50      10.773 ±    0.069  ops/ms
BenchmarkMinAggregation.test                    true            128  thrpt   50      11.961 ±    0.077  ops/ms
```

#### New

```
Benchmark                     (_nullHandlingEnabled)  (_nullPeriod)   Mode  Cnt       Score      Error   Units
BenchmarkMinAggregation.test                   false              1  thrpt   50     200.789 ±    1.174  ops/ms
BenchmarkMinAggregation.test                   false              2  thrpt   50     198.568 ±    2.002  ops/ms
BenchmarkMinAggregation.test                   false              4  thrpt   50     201.473 ±    0.975  ops/ms
BenchmarkMinAggregation.test                   false              8  thrpt   50     199.809 ±    1.575  ops/ms
BenchmarkMinAggregation.test                   false             16  thrpt   50     195.557 ±    3.168  ops/ms
BenchmarkMinAggregation.test                   false             32  thrpt   50     201.504 ±    0.909  ops/ms
BenchmarkMinAggregation.test                   false             64  thrpt   50     201.442 ±    0.955  ops/ms
BenchmarkMinAggregation.test                   false            128  thrpt   50     198.635 ±    3.376  ops/ms
BenchmarkMinAggregation.test                    true              1  thrpt   50  233460.938 ±  878.133  ops/ms
BenchmarkMinAggregation.test                    true              2  thrpt   50      74.088 ±    0.454  ops/ms
BenchmarkMinAggregation.test                    true              4  thrpt   50     126.207 ±    1.104  ops/ms
BenchmarkMinAggregation.test                    true              8  thrpt   50     205.234 ±    4.209  ops/ms
BenchmarkMinAggregation.test                    true             16  thrpt   50     340.715 ±    5.285  ops/ms
BenchmarkMinAggregation.test                    true             32  thrpt   50     351.642 ±    1.903  ops/ms
BenchmarkMinAggregation.test                    true             64  thrpt   50     354.432 ±    3.337  ops/ms
BenchmarkMinAggregation.test                    true            128  thrpt   50     349.553 ±    2.757  ops/ms
```

<hr>

### DistinctCountAggregationFunction

#### Old

```
Benchmark                               (_nullHandlingEnabled)  (_nullPeriod)   Mode  Cnt       Score      Error   Units
BenchmarkDistinctCountAggregation.test                   false              1  thrpt   50      19.555 ±    0.246  ops/ms
BenchmarkDistinctCountAggregation.test                   false              2  thrpt   50      19.581 ±    0.164  ops/ms
BenchmarkDistinctCountAggregation.test                   false              4  thrpt   50      19.643 ±    0.138  ops/ms
BenchmarkDistinctCountAggregation.test                   false              8  thrpt   50      19.329 ±    0.261  ops/ms
BenchmarkDistinctCountAggregation.test                   false             16  thrpt   50      19.752 ±    0.138  ops/ms
BenchmarkDistinctCountAggregation.test                   false             32  thrpt   50      19.751 ±    0.092  ops/ms
BenchmarkDistinctCountAggregation.test                   false             64  thrpt   50      19.419 ±    0.263  ops/ms
BenchmarkDistinctCountAggregation.test                   false            128  thrpt   50      19.629 ±    0.111  ops/ms
BenchmarkDistinctCountAggregation.test                    true              1  thrpt   50      57.300 ±    0.276  ops/ms
BenchmarkDistinctCountAggregation.test                    true              2  thrpt   50      24.248 ±    0.256  ops/ms
BenchmarkDistinctCountAggregation.test                    true              4  thrpt   50       4.427 ±    0.069  ops/ms
BenchmarkDistinctCountAggregation.test                    true              8  thrpt   50       5.066 ±    0.045  ops/ms
BenchmarkDistinctCountAggregation.test                    true             16  thrpt   50       5.481 ±    0.077  ops/ms
BenchmarkDistinctCountAggregation.test                    true             32  thrpt   50       5.927 ±    0.055  ops/ms
BenchmarkDistinctCountAggregation.test                    true             64  thrpt   50       6.325 ±    0.070  ops/ms
BenchmarkDistinctCountAggregation.test                    true            128  thrpt   50       6.616 ±    0.032  ops/ms
```

#### New

```
Benchmark                               (_nullHandlingEnabled)  (_nullPeriod)   Mode  Cnt       Score      Error   Units
BenchmarkDistinctCountAggregation.test                   false              1  thrpt   50      19.596 ±    0.242  ops/ms
BenchmarkDistinctCountAggregation.test                   false              2  thrpt   50      19.610 ±    0.196  ops/ms
BenchmarkDistinctCountAggregation.test                   false              4  thrpt   50      19.731 ±    0.152  ops/ms
BenchmarkDistinctCountAggregation.test                   false              8  thrpt   50      19.362 ±    0.150  ops/ms
BenchmarkDistinctCountAggregation.test                   false             16  thrpt   50      19.769 ±    0.135  ops/ms
BenchmarkDistinctCountAggregation.test                   false             32  thrpt   50      19.280 ±    0.113  ops/ms
BenchmarkDistinctCountAggregation.test                   false             64  thrpt   50      20.812 ±    0.249  ops/ms
BenchmarkDistinctCountAggregation.test                   false            128  thrpt   50      19.783 ±    0.142  ops/ms
BenchmarkDistinctCountAggregation.test                    true              1  thrpt   50   32847.118 ±  534.264  ops/ms
BenchmarkDistinctCountAggregation.test                    true              2  thrpt   50      26.943 ±    0.196  ops/ms
BenchmarkDistinctCountAggregation.test                    true              4  thrpt   50      20.160 ±    0.128  ops/ms
BenchmarkDistinctCountAggregation.test                    true              8  thrpt   50      19.321 ±    0.242  ops/ms
BenchmarkDistinctCountAggregation.test                    true             16  thrpt   50      20.170 ±    0.390  ops/ms
BenchmarkDistinctCountAggregation.test                    true             32  thrpt   50      19.501 ±    0.089  ops/ms
BenchmarkDistinctCountAggregation.test                    true             64  thrpt   50      20.306 ±    0.303  ops/ms
BenchmarkDistinctCountAggregation.test                    true            128  thrpt   50      19.463 ±    0.092  ops/ms
```

<hr>

### VarianceAggregationFunction

#### Old

```
Benchmark                          (_nullHandlingEnabled)  (_nullPeriod)   Mode  Cnt       Score      Error   Units
BenchmarkVarianceAggregation.test                   false              1  thrpt   50     127.548 ±    0.816  ops/ms
BenchmarkVarianceAggregation.test                   false              2  thrpt   50     126.447 ±    1.044  ops/ms
BenchmarkVarianceAggregation.test                   false              4  thrpt   50     125.703 ±    0.447  ops/ms
BenchmarkVarianceAggregation.test                   false              8  thrpt   50     125.822 ±    0.612  ops/ms
BenchmarkVarianceAggregation.test                   false             16  thrpt   50     125.850 ±    0.550  ops/ms
BenchmarkVarianceAggregation.test                   false             32  thrpt   50     125.811 ±    1.134  ops/ms
BenchmarkVarianceAggregation.test                   false             64  thrpt   50     125.293 ±    1.243  ops/ms
BenchmarkVarianceAggregation.test                   false            128  thrpt   50     124.949 ±    1.062  ops/ms
BenchmarkVarianceAggregation.test                    true              1  thrpt   50      70.489 ±    0.259  ops/ms
BenchmarkVarianceAggregation.test                    true              2  thrpt   50     132.470 ±    0.637  ops/ms
BenchmarkVarianceAggregation.test                    true              4  thrpt   50       5.306 ±    0.037  ops/ms
BenchmarkVarianceAggregation.test                    true              8  thrpt   50       6.325 ±    0.038  ops/ms
BenchmarkVarianceAggregation.test                    true             16  thrpt   50       7.656 ±    0.065  ops/ms
BenchmarkVarianceAggregation.test                    true             32  thrpt   50       9.575 ±    0.103  ops/ms
BenchmarkVarianceAggregation.test                    true             64  thrpt   50      11.163 ±    0.090  ops/ms
BenchmarkVarianceAggregation.test                    true            128  thrpt   50      12.187 ±    0.127  ops/ms
```

#### New

```
Benchmark                          (_nullHandlingEnabled)  (_nullPeriod)   Mode  Cnt       Score      Error   Units
BenchmarkVarianceAggregation.test                   false              1  thrpt   50     126.163 ±    0.453  ops/ms
BenchmarkVarianceAggregation.test                   false              2  thrpt   50     125.970 ±    0.412  ops/ms
BenchmarkVarianceAggregation.test                   false              4  thrpt   50     125.252 ±    0.663  ops/ms
BenchmarkVarianceAggregation.test                   false              8  thrpt   50     121.283 ±    0.382  ops/ms
BenchmarkVarianceAggregation.test                   false             16  thrpt   50     123.634 ±    1.386  ops/ms
BenchmarkVarianceAggregation.test                   false             32  thrpt   50     125.439 ±    0.738  ops/ms
BenchmarkVarianceAggregation.test                   false             64  thrpt   50     125.730 ±    0.520  ops/ms
BenchmarkVarianceAggregation.test                   false            128  thrpt   50     122.904 ±    2.077  ops/ms
BenchmarkVarianceAggregation.test                    true              1  thrpt   50  245923.814 ± 1944.703  ops/ms
BenchmarkVarianceAggregation.test                    true              2  thrpt   50      66.258 ±    0.474  ops/ms
BenchmarkVarianceAggregation.test                    true              4  thrpt   50     101.951 ±    1.929  ops/ms
BenchmarkVarianceAggregation.test                    true              8  thrpt   50     124.706 ±    1.027  ops/ms
BenchmarkVarianceAggregation.test                    true             16  thrpt   50     137.289 ±    1.032  ops/ms
BenchmarkVarianceAggregation.test                    true             32  thrpt   50     131.492 ±    1.169  ops/ms
BenchmarkVarianceAggregation.test                    true             64  thrpt   50     128.387 ±    0.372  ops/ms
BenchmarkVarianceAggregation.test                    true            128  thrpt   50     126.453 ±    1.039  ops/ms
```