gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1964199109

   Extra investigations:
   ## Why does performance in `normal` with `null handling` decrease so much when going from a null interval of 2 to 4?
   
   I've checked with the JMH `perfnorm` profiler:
   ```
   Benchmark                                                 (_impl)  (_nullHandlingEnabled)  (_nullInterval)   Mode  Cnt        Score    Error      Units
   BenchmarkSumAggregation2.test                              normal                    true                2  thrpt    5      103.386  ± 3.852     ops/ms
   BenchmarkSumAggregation2.test:CPI                          normal                    true                2  thrpt             0.177           clks/insn
   BenchmarkSumAggregation2.test:IPC                          normal                    true                2  thrpt             5.660           insns/clk
   BenchmarkSumAggregation2.test:L1-dcache-load-misses        normal                    true                2  thrpt          1321.767                #/op
   BenchmarkSumAggregation2.test:L1-dcache-loads              normal                    true                2  thrpt         15314.726                #/op
   BenchmarkSumAggregation2.test:L1-icache-load-misses        normal                    true                2  thrpt             1.106                #/op
   BenchmarkSumAggregation2.test:L1-icache-loads              normal                    true                2  thrpt           329.697                #/op
   BenchmarkSumAggregation2.test:branch-misses                normal                    true                2  thrpt             8.702                #/op
   BenchmarkSumAggregation2.test:branches                     normal                    true                2  thrpt         60461.749                #/op
   BenchmarkSumAggregation2.test:cycles                       normal                    true                2  thrpt         41090.010                #/op
   BenchmarkSumAggregation2.test:dTLB-load-misses             normal                    true                2  thrpt             0.301                #/op
   BenchmarkSumAggregation2.test:dTLB-loads                   normal                    true                2  thrpt             2.005                #/op
   BenchmarkSumAggregation2.test:iTLB-load-misses             normal                    true                2  thrpt             0.048                #/op
   BenchmarkSumAggregation2.test:iTLB-loads                   normal                    true                2  thrpt             0.219                #/op
   BenchmarkSumAggregation2.test:instructions                 normal                    true                2  thrpt        232553.341                #/op
   BenchmarkSumAggregation2.test:stalled-cycles-backend       normal                    true                2  thrpt         25474.941                #/op
   BenchmarkSumAggregation2.test:stalled-cycles-frontend      normal                    true                2  thrpt            57.691                #/op
   BenchmarkSumAggregation2.test                              normal                    true                4  thrpt    5        3.525  ± 0.129     ops/ms
   BenchmarkSumAggregation2.test:CPI                          normal                    true                4  thrpt             0.378           clks/insn
   BenchmarkSumAggregation2.test:IPC                          normal                    true                4  thrpt             2.645           insns/clk
   BenchmarkSumAggregation2.test:L1-dcache-load-misses        normal                    true                4  thrpt          1689.903                #/op
   BenchmarkSumAggregation2.test:L1-dcache-loads              normal                    true                4  thrpt        685807.174                #/op
   BenchmarkSumAggregation2.test:L1-icache-load-misses        normal                    true                4  thrpt            21.785                #/op
   BenchmarkSumAggregation2.test:L1-icache-loads              normal                    true                4  thrpt         10603.011                #/op
   BenchmarkSumAggregation2.test:branch-misses                normal                    true                4  thrpt         15379.544                #/op
   BenchmarkSumAggregation2.test:branches                     normal                    true                4  thrpt        817946.383                #/op
   BenchmarkSumAggregation2.test:cycles                       normal                    true                4  thrpt       1223502.258                #/op
   BenchmarkSumAggregation2.test:dTLB-load-misses             normal                    true                4  thrpt             2.123                #/op
   BenchmarkSumAggregation2.test:dTLB-loads                   normal                    true                4  thrpt            16.638                #/op
   BenchmarkSumAggregation2.test:iTLB-load-misses             normal                    true                4  thrpt             0.639                #/op
   BenchmarkSumAggregation2.test:iTLB-loads                   normal                    true                4  thrpt             2.910                #/op
   BenchmarkSumAggregation2.test:instructions                 normal                    true                4  thrpt       3236651.002                #/op
   BenchmarkSumAggregation2.test:stalled-cycles-backend       normal                    true                4  thrpt        116767.982                #/op
   BenchmarkSumAggregation2.test:stalled-cycles-frontend      normal                    true                4  thrpt         17092.450                #/op
   ```
   
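   AFAIU the run with interval 4 does far more work per op and mispredicts far more often: `branch-misses` go from ~9 to ~15,380 per op and `L1-dcache-loads` grow ~45x (although `L1-dcache-load-misses` only grow ~1.3x), while IPC drops from 5.66 to 2.65.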
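
   To make the comparison between the two runs easier to read, here is a small sketch that divides the interval 4 counters by the interval 2 counters. The class and method names (`PerfnormRatios`, `ratios`, `interval2`, `interval4`) are made up for this illustration and are not part of the PR; the numbers are copied from the `normal` table above.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Hypothetical helper, not part of the PR: compares two perfnorm runs.
   public class PerfnormRatios {

       // Returns after/before for every counter present in both runs.
       public static Map<String, Double> ratios(Map<String, Double> before,
                                                Map<String, Double> after) {
           Map<String, Double> out = new LinkedHashMap<>();
           for (Map.Entry<String, Double> e : before.entrySet()) {
               Double other = after.get(e.getKey());
               if (other != null && e.getValue() != 0.0) {
                   out.put(e.getKey(), other / e.getValue());
               }
           }
           return out;
       }

       // `normal`, null handling enabled, _nullInterval = 2 (values in #/op).
       public static Map<String, Double> interval2() {
           Map<String, Double> m = new LinkedHashMap<>();
           m.put("instructions", 232553.341);
           m.put("cycles", 41090.010);
           m.put("branches", 60461.749);
           m.put("branch-misses", 8.702);
           m.put("L1-dcache-loads", 15314.726);
           m.put("L1-dcache-load-misses", 1321.767);
           return m;
       }

       // `normal`, null handling enabled, _nullInterval = 4 (values in #/op).
       public static Map<String, Double> interval4() {
           Map<String, Double> m = new LinkedHashMap<>();
           m.put("instructions", 3236651.002);
           m.put("cycles", 1223502.258);
           m.put("branches", 817946.383);
           m.put("branch-misses", 15379.544);
           m.put("L1-dcache-loads", 685807.174);
           m.put("L1-dcache-load-misses", 1689.903);
           return m;
       }

       public static void main(String[] args) {
           ratios(interval2(), interval4())
               .forEach((k, v) -> System.out.printf("%-24s x%.1f%n", k, v));
       }
   }
   ```

   With these inputs the ratios come out at roughly 14x instructions, 30x cycles and 45x `L1-dcache-loads`, which lines up with the ~29x drop in throughput (103.386 vs 3.525 ops/ms).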
   
   ## Why does performance in `foldDouble` with `null handling` increase so much when going from a null interval of 4 to 8?
   
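   Same check with perfnorm. Here IPC is essentially unchanged, but instructions, branches and `L1-dcache-loads` per op are roughly halved: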
   ```
   BenchmarkSumAggregation2.test                          foldDouble                    true                4  thrpt    5       68.321  ± 0.344     ops/ms
   BenchmarkSumAggregation2.test:CPI                      foldDouble                    true                4  thrpt             0.234           clks/insn
   BenchmarkSumAggregation2.test:IPC                      foldDouble                    true                4  thrpt             4.264           insns/clk
   BenchmarkSumAggregation2.test:L1-dcache-load-misses    foldDouble                    true                4  thrpt          2382.564                #/op
   BenchmarkSumAggregation2.test:L1-dcache-loads          foldDouble                    true                4  thrpt        103018.730                #/op
   BenchmarkSumAggregation2.test:L1-icache-load-misses    foldDouble                    true                4  thrpt             4.476                #/op
   BenchmarkSumAggregation2.test:L1-icache-loads          foldDouble                    true                4  thrpt           402.647                #/op
   BenchmarkSumAggregation2.test:branch-misses            foldDouble                    true                4  thrpt             5.758                #/op
   BenchmarkSumAggregation2.test:branches                 foldDouble                    true                4  thrpt         41678.023                #/op
   BenchmarkSumAggregation2.test:cycles                   foldDouble                    true                4  thrpt         64669.720                #/op
   BenchmarkSumAggregation2.test:dTLB-load-misses         foldDouble                    true                4  thrpt            15.780                #/op
   BenchmarkSumAggregation2.test:dTLB-loads               foldDouble                    true                4  thrpt            21.655                #/op
   BenchmarkSumAggregation2.test:iTLB-load-misses         foldDouble                    true                4  thrpt             0.501                #/op
   BenchmarkSumAggregation2.test:iTLB-loads               foldDouble                    true                4  thrpt             0.600                #/op
   BenchmarkSumAggregation2.test:instructions             foldDouble                    true                4  thrpt        275779.218                #/op
   BenchmarkSumAggregation2.test:stalled-cycles-backend   foldDouble                    true                4  thrpt         37349.094                #/op
   BenchmarkSumAggregation2.test:stalled-cycles-frontend  foldDouble                    true                4  thrpt           700.245                #/op
   BenchmarkSumAggregation2.test                          foldDouble                    true                8  thrpt    5      120.523  ± 1.062     ops/ms
   BenchmarkSumAggregation2.test:CPI                      foldDouble                    true                8  thrpt             0.233           clks/insn
   BenchmarkSumAggregation2.test:IPC                      foldDouble                    true                8  thrpt             4.293           insns/clk
   BenchmarkSumAggregation2.test:L1-dcache-load-misses    foldDouble                    true                8  thrpt          1838.574                #/op
   BenchmarkSumAggregation2.test:L1-dcache-loads          foldDouble                    true                8  thrpt         56420.620                #/op
   BenchmarkSumAggregation2.test:L1-icache-load-misses    foldDouble                    true                8  thrpt             1.978                #/op
   BenchmarkSumAggregation2.test:L1-icache-loads          foldDouble                    true                8  thrpt           217.917                #/op
   BenchmarkSumAggregation2.test:branch-misses            foldDouble                    true                8  thrpt             2.651                #/op
   BenchmarkSumAggregation2.test:branches                 foldDouble                    true                8  thrpt         21608.391                #/op
   BenchmarkSumAggregation2.test:cycles                   foldDouble                    true                8  thrpt         34548.798                #/op
   BenchmarkSumAggregation2.test:dTLB-load-misses         foldDouble                    true                8  thrpt             8.186                #/op
   BenchmarkSumAggregation2.test:dTLB-loads               foldDouble                    true                8  thrpt            10.305                #/op
   BenchmarkSumAggregation2.test:iTLB-load-misses         foldDouble                    true                8  thrpt             0.277                #/op
   BenchmarkSumAggregation2.test:iTLB-loads               foldDouble                    true                8  thrpt             0.263                #/op
   BenchmarkSumAggregation2.test:instructions             foldDouble                    true                8  thrpt        148330.222                #/op
   BenchmarkSumAggregation2.test:stalled-cycles-backend   foldDouble                    true                8  thrpt         21706.154                #/op
   BenchmarkSumAggregation2.test:stalled-cycles-frontend  foldDouble                    true                8  thrpt           302.209                #/op
   ```
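
   As a sanity check on the `foldDouble` numbers, cycles per op should be close to instructions per op times CPI, so with a flat CPI the change in cycles has to come from the instruction count. The sketch below just redoes that arithmetic; `CycleDecomposition` and its methods are hypothetical names used only here, and the inputs are copied from the table above.

   ```java
   // Hypothetical helper, not part of the PR: checks cycles ~= instructions * CPI.
   public class CycleDecomposition {

       // cycles/op predicted from instructions/op and CPI (clks/insn).
       public static double predictedCycles(double instructionsPerOp, double cpi) {
           return instructionsPerOp * cpi;
       }

       // Relative difference between a predicted and a measured value.
       public static double relativeError(double predicted, double measured) {
           return Math.abs(predicted - measured) / measured;
       }

       public static void main(String[] args) {
           // foldDouble, null handling enabled.
           // Columns: _nullInterval, instructions/op, CPI, measured cycles/op.
           double[][] rows = {
               {4, 275779.218, 0.234, 64669.720},
               {8, 148330.222, 0.233, 34548.798},
           };
           for (double[] r : rows) {
               double predicted = predictedCycles(r[1], r[2]);
               System.out.printf("interval=%d predicted=%.0f measured=%.0f relErr=%.2f%%%n",
                   (int) r[0], predicted, r[3], 100 * relativeError(predicted, r[3]));
           }
           // How much less work interval 8 does compared to interval 4.
           System.out.printf("instructions 8 vs 4: %.2f%n", rows[1][1] / rows[0][1]);
       }
   }
   ```

   Both predictions land within 1% of the measured cycles, and interval 8 executes a bit over half the instructions of interval 4, which is in the same ballpark as the throughput going from 68.321 to 120.523 ops/ms.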
   