gortiz commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1963827711
So the speedup in the new operations is explained because we are actually doing a better work in the new implementations by using primitive values. These implementations: - Are faster because they use the int/long value, which may be optimized by the compiler. - Provide correct results when the sum is not outside the original range (Integer.MAX_VALUE, Long.MAX_VALUE, etc). - Overflow when the partial sum is outside the original range (Integer.MAX_VALUE, Long.MAX_VALUE, etc). Remember that the current implementation also overflow if the value end up being higher than Double.MAX_VALUE, although that is very unlikely. There is an interesting topic here. Should we create these alternative implementations in order to provide correct results with extra speedup at the cost of possible overflow? Should we make them the default implementations for V2? But that is a discussion we can have later. The matter here is the null handling implementation. Therefore I would ignore `foldPrimitive` and `foldHolder`, which are changing semantics, and focus on `normal` vs `foldDouble`, which keeps semantics. As we can see in the results: - Comparing `normal` and `foldDouble` when `nullHandling` is disabled: It doesn't seem the introduced lambda adds a performance penalty. - Comparing `normal` and `foldDouble` when `nullHandling` is enabled: - We can also notice an increase in performance when we have at least 15 not null consecutive values TBH I don't know why we get more performance when null handling enabled with enough not null consecutive values. But assuming we use `foldNotNull` instead of the single row methods in `NullableSingleInputAggregationFunction`, I think we can assume performance won't be affected significantly. These are the values in my computer: ``` Benchmark (_impl) (_nullHandlingEnabled) (_nullInterval) Mode Cnt Score Error Units BenchmarkSumAggregation2.test normal true 2 thrpt 5 103.210 ± 9.108 ops/ms BenchmarkSumAggregation2.test normal true 4 thrpt 5 4.062 ± 0.026 ops/ms BenchmarkSumAggregation2.test normal true 8 thrpt 5 4.377 ± 0.045 ops/ms BenchmarkSumAggregation2.test normal true 16 thrpt 5 5.164 ± 0.068 ops/ms BenchmarkSumAggregation2.test normal true 32 thrpt 5 5.935 ± 0.296 ops/ms BenchmarkSumAggregation2.test normal true 64 thrpt 5 8.044 ± 0.335 ops/ms BenchmarkSumAggregation2.test normal true 128 thrpt 5 9.003 ± 0.143 ops/ms BenchmarkSumAggregation2.test normal false 2 thrpt 5 140.407 ± 0.573 ops/ms BenchmarkSumAggregation2.test normal false 4 thrpt 5 141.040 ± 0.451 ops/ms BenchmarkSumAggregation2.test normal false 8 thrpt 5 141.907 ± 0.992 ops/ms BenchmarkSumAggregation2.test normal false 16 thrpt 5 143.397 ± 1.548 ops/ms BenchmarkSumAggregation2.test normal false 32 thrpt 5 139.698 ± 1.044 ops/ms BenchmarkSumAggregation2.test normal false 64 thrpt 5 144.736 ± 1.227 ops/ms BenchmarkSumAggregation2.test normal false 128 thrpt 5 144.443 ± 1.678 ops/ms BenchmarkSumAggregation2.test foldDouble true 2 thrpt 5 34.237 ± 1.460 ops/ms BenchmarkSumAggregation2.test foldDouble true 4 thrpt 5 65.086 ± 3.262 ops/ms BenchmarkSumAggregation2.test foldDouble true 8 thrpt 5 120.121 ± 2.080 ops/ms BenchmarkSumAggregation2.test foldDouble true 16 thrpt 5 182.891 ± 2.179 ops/ms BenchmarkSumAggregation2.test foldDouble true 32 thrpt 5 191.954 ± 1.011 ops/ms BenchmarkSumAggregation2.test foldDouble true 64 thrpt 5 197.672 ± 2.340 ops/ms BenchmarkSumAggregation2.test foldDouble true 128 thrpt 5 187.932 ± 5.005 ops/ms BenchmarkSumAggregation2.test foldDouble false 2 thrpt 5 142.703 ± 3.093 ops/ms BenchmarkSumAggregation2.test foldDouble false 4 thrpt 5 139.740 ± 2.003 ops/ms BenchmarkSumAggregation2.test foldDouble false 8 thrpt 5 145.153 ± 3.140 ops/ms BenchmarkSumAggregation2.test foldDouble false 16 thrpt 5 143.858 ± 2.861 ops/ms BenchmarkSumAggregation2.test foldDouble false 32 thrpt 5 139.753 ± 1.024 ops/ms BenchmarkSumAggregation2.test foldDouble false 64 thrpt 5 142.077 ± 0.810 ops/ms BenchmarkSumAggregation2.test foldDouble false 128 thrpt 5 142.829 ± 1.676 ops/ms ``` For the record, these are the results when using the other two options (with foldHolder fixed): ``` BenchmarkSumAggregation2.test foldHolder true 2 thrpt 5 43.157 ± 1.498 ops/ms BenchmarkSumAggregation2.test foldHolder true 4 thrpt 5 68.783 ± 5.989 ops/ms BenchmarkSumAggregation2.test foldHolder true 8 thrpt 5 124.738 ± 3.362 ops/ms BenchmarkSumAggregation2.test foldHolder true 16 thrpt 5 216.012 ± 10.644 ops/ms BenchmarkSumAggregation2.test foldHolder true 32 thrpt 5 313.113 ± 21.122 ops/ms BenchmarkSumAggregation2.test foldHolder true 64 thrpt 5 392.115 ± 6.565 ops/ms BenchmarkSumAggregation2.test foldHolder true 128 thrpt 5 413.473 ± 19.703 ops/ms BenchmarkSumAggregation2.test foldHolder false 2 thrpt 5 426.726 ± 5.831 ops/ms BenchmarkSumAggregation2.test foldHolder false 4 thrpt 5 423.828 ± 8.957 ops/ms BenchmarkSumAggregation2.test foldHolder false 8 thrpt 5 415.301 ± 4.804 ops/ms BenchmarkSumAggregation2.test foldHolder false 16 thrpt 5 425.916 ± 5.353 ops/ms BenchmarkSumAggregation2.test foldHolder false 32 thrpt 5 419.126 ± 3.485 ops/ms BenchmarkSumAggregation2.test foldHolder false 64 thrpt 5 427.603 ± 3.293 ops/ms BenchmarkSumAggregation2.test foldHolder false 128 thrpt 5 413.514 ± 3.203 ops/ms BenchmarkSumAggregation2.test foldPrimitive true 2 thrpt 5 34.018 ± 0.853 ops/ms BenchmarkSumAggregation2.test foldPrimitive true 4 thrpt 5 63.797 ± 0.683 ops/ms BenchmarkSumAggregation2.test foldPrimitive true 8 thrpt 5 117.642 ± 2.266 ops/ms BenchmarkSumAggregation2.test foldPrimitive true 16 thrpt 5 158.315 ± 7.338 ops/ms BenchmarkSumAggregation2.test foldPrimitive true 32 thrpt 5 188.807 ± 35.955 ops/ms BenchmarkSumAggregation2.test foldPrimitive true 64 thrpt 5 222.271 ± 111.049 ops/ms BenchmarkSumAggregation2.test foldPrimitive true 128 thrpt 5 221.113 ± 110.499 ops/ms BenchmarkSumAggregation2.test foldPrimitive false 2 thrpt 5 416.386 ± 3.038 ops/ms BenchmarkSumAggregation2.test foldPrimitive false 4 thrpt 5 423.796 ± 3.550 ops/ms BenchmarkSumAggregation2.test foldPrimitive false 8 thrpt 5 414.014 ± 3.438 ops/ms BenchmarkSumAggregation2.test foldPrimitive false 16 thrpt 5 413.562 ± 3.760 ops/ms BenchmarkSumAggregation2.test foldPrimitive false 32 thrpt 5 420.902 ± 13.165 ops/ms BenchmarkSumAggregation2.test foldPrimitive false 64 thrpt 5 412.881 ± 17.209 ops/ms BenchmarkSumAggregation2.test foldPrimitive false 128 thrpt 5 426.167 ± 7.631 ops/ms ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org