gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1963827711

   So the speedup in the new operations is explained because we are actually 
doing a better work in the new implementations by using primitive values. These 
implementations:
   - Are faster because they use the int/long value, which may be optimized by 
the compiler.
   - Provide correct results when the sum is not outside the original range 
(Integer.MAX_VALUE, Long.MAX_VALUE, etc).
   - Overflow when the partial sum is outside the original range 
(Integer.MAX_VALUE, Long.MAX_VALUE, etc). Remember that the current 
implementation also overflow if the value end up being higher than 
Double.MAX_VALUE, although that is very unlikely.
   
   There is an interesting topic here. Should we create these alternative 
implementations in order to provide correct results with extra speedup at the 
cost of possible overflow? Should we make them the default implementations for 
V2?
   
   But that is a discussion we can have later. The matter here is the null 
handling implementation. Therefore I would ignore `foldPrimitive` and 
`foldHolder`, which are changing semantics, and focus on `normal` vs 
`foldDouble`, which keeps semantics. 
   
   As we can see in the results:
   - Comparing `normal` and `foldDouble` when `nullHandling` is disabled: It 
doesn't seem the introduced lambda adds a performance penalty.
   - Comparing `normal` and `foldDouble` when `nullHandling` is enabled: 
   - We can also notice an increase in performance when we have at least 15 not 
null consecutive values
   
   TBH I don't know why we get more performance when null handling enabled with 
enough not null consecutive values. But assuming we use `foldNotNull` instead 
of the single row methods in `NullableSingleInputAggregationFunction`, I think 
we can assume performance won't be affected significantly.
   
   These are the values in my computer:
   ```
   Benchmark                         (_impl)  (_nullHandlingEnabled)  
(_nullInterval)   Mode  Cnt    Score   Error   Units
   BenchmarkSumAggregation2.test      normal                    true            
    2  thrpt    5  103.210 ± 9.108  ops/ms
   BenchmarkSumAggregation2.test      normal                    true            
    4  thrpt    5    4.062 ± 0.026  ops/ms
   BenchmarkSumAggregation2.test      normal                    true            
    8  thrpt    5    4.377 ± 0.045  ops/ms
   BenchmarkSumAggregation2.test      normal                    true            
   16  thrpt    5    5.164 ± 0.068  ops/ms
   BenchmarkSumAggregation2.test      normal                    true            
   32  thrpt    5    5.935 ± 0.296  ops/ms
   BenchmarkSumAggregation2.test      normal                    true            
   64  thrpt    5    8.044 ± 0.335  ops/ms
   BenchmarkSumAggregation2.test      normal                    true            
  128  thrpt    5    9.003 ± 0.143  ops/ms
   BenchmarkSumAggregation2.test      normal                   false            
    2  thrpt    5  140.407 ± 0.573  ops/ms
   BenchmarkSumAggregation2.test      normal                   false            
    4  thrpt    5  141.040 ± 0.451  ops/ms
   BenchmarkSumAggregation2.test      normal                   false            
    8  thrpt    5  141.907 ± 0.992  ops/ms
   BenchmarkSumAggregation2.test      normal                   false            
   16  thrpt    5  143.397 ± 1.548  ops/ms
   BenchmarkSumAggregation2.test      normal                   false            
   32  thrpt    5  139.698 ± 1.044  ops/ms
   BenchmarkSumAggregation2.test      normal                   false            
   64  thrpt    5  144.736 ± 1.227  ops/ms
   BenchmarkSumAggregation2.test      normal                   false            
  128  thrpt    5  144.443 ± 1.678  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true            
    2  thrpt    5   34.237 ± 1.460  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true            
    4  thrpt    5   65.086 ± 3.262  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true            
    8  thrpt    5  120.121 ± 2.080  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true            
   16  thrpt    5  182.891 ± 2.179  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true            
   32  thrpt    5  191.954 ± 1.011  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true            
   64  thrpt    5  197.672 ± 2.340  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true            
  128  thrpt    5  187.932 ± 5.005  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false            
    2  thrpt    5  142.703 ± 3.093  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false            
    4  thrpt    5  139.740 ± 2.003  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false            
    8  thrpt    5  145.153 ± 3.140  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false            
   16  thrpt    5  143.858 ± 2.861  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false            
   32  thrpt    5  139.753 ± 1.024  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false            
   64  thrpt    5  142.077 ± 0.810  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false            
  128  thrpt    5  142.829 ± 1.676  ops/ms
   ```
   
   For the record, these are the results when using the other two options (with 
foldHolder fixed):
   
   ```
   BenchmarkSumAggregation2.test     foldHolder                    true         
       2  thrpt    5   43.157 ±   1.498  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                    true         
       4  thrpt    5   68.783 ±   5.989  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                    true         
       8  thrpt    5  124.738 ±   3.362  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                    true         
      16  thrpt    5  216.012 ±  10.644  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                    true         
      32  thrpt    5  313.113 ±  21.122  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                    true         
      64  thrpt    5  392.115 ±   6.565  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                    true         
     128  thrpt    5  413.473 ±  19.703  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false         
       2  thrpt    5  426.726 ±   5.831  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false         
       4  thrpt    5  423.828 ±   8.957  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false         
       8  thrpt    5  415.301 ±   4.804  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false         
      16  thrpt    5  425.916 ±   5.353  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false         
      32  thrpt    5  419.126 ±   3.485  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false         
      64  thrpt    5  427.603 ±   3.293  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false         
     128  thrpt    5  413.514 ±   3.203  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true         
       2  thrpt    5   34.018 ±   0.853  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true         
       4  thrpt    5   63.797 ±   0.683  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true         
       8  thrpt    5  117.642 ±   2.266  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true         
      16  thrpt    5  158.315 ±   7.338  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true         
      32  thrpt    5  188.807 ±  35.955  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true         
      64  thrpt    5  222.271 ± 111.049  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true         
     128  thrpt    5  221.113 ± 110.499  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false         
       2  thrpt    5  416.386 ±   3.038  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false         
       4  thrpt    5  423.796 ±   3.550  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false         
       8  thrpt    5  414.014 ±   3.438  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false         
      16  thrpt    5  413.562 ±   3.760  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false         
      32  thrpt    5  420.902 ±  13.165  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false         
      64  thrpt    5  412.881 ±  17.209  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false         
     128  thrpt    5  426.167 ±   7.631  ops/ms
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to