JohnTortugo opened a new pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287


   I'm from a Microsoft team working with LinkedIn engineers to improve Pinot 
performance. In an earlier conversation, @mayankshriv mentioned that segment 
creation is slow and asked whether we could look into improving it. CPU 
profiling of the segment creation code, using synthetic data, produced the 
following flame graph:
   
   ![CPU Hotspots](https://cesarshare.blob.core.windows.net/pinot-investigation/SegmentCreation-Flames.png)
   [Full resolution image](https://cesarshare.blob.core.windows.net/pinot-investigation/SegmentCreation-Flames.png)
   
   Segment creation consists of two roughly equal parts: init and build. The 
two methods are structured alike: each has a main loop that iterates over the 
rows of the input data and applies a transformation to each row. This PR 
introduces two main changes:
   
   1) A new Pinot-perf benchmark that measures segment creation performance.
   
   2) Parallelization of the init->gatherStats and build methods mentioned 
above. The main loop of each method is parallelized using a technique called 
[DSWP](https://liberty.princeton.edu/Publications/micro05_dswp.pdf) (decoupled 
software pipelining), and the threads communicate through a 
[Disruptor RingBuffer](https://github.com/LMAX-Exchange/disruptor).
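
To illustrate the idea behind item 2, here is a minimal sketch of a pipelined loop in the spirit of DSWP: the per-row work is split into two stages that run on separate threads and communicate through a bounded queue. This is not the code in this PR. The class name `PipelinedLoopSketch`, the `trim()` transformation, and the character-count statistic are hypothetical, and a `java.util.concurrent.ArrayBlockingQueue` stands in for the Disruptor RingBuffer so that the example needs only the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of a two-stage pipelined loop; not the PR's implementation. */
final class PipelinedLoopSketch {
  // Unique sentinel instance that tells the downstream stage the input is exhausted.
  private static final String END_OF_INPUT = new String("<end>");

  /**
   * Stage 1 (calling thread) transforms each row; stage 2 (worker thread)
   * gathers a statistic. Returns the total character count seen by stage 2.
   */
  static long run(List<String> rows, int queueCapacity) {
    // Assumption: a bounded blocking queue stands in for the Disruptor RingBuffer.
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
    long[] totalChars = new long[1];

    Thread statsStage = new Thread(() -> {
      try {
        // Identity comparison is intentional: only the sentinel ends the loop.
        for (String row = queue.take(); row != END_OF_INPUT; row = queue.take()) {
          totalChars[0] += row.length();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    statsStage.start();

    try {
      for (String row : rows) {
        queue.put(row.trim()); // stage 1: per-row transformation
      }
      queue.put(END_OF_INPUT);
      // join() guarantees the worker's writes to totalChars are visible here.
      statsStage.join();
    } catch (InterruptedException e) {
      statsStage.interrupt();
      Thread.currentThread().interrupt();
      throw new IllegalStateException("pipeline interrupted", e);
    }
    return totalChars[0];
  }

  public static void main(String[] args) {
    // Trimmed rows have lengths 3, 4 and 5.
    System.out.println(run(Arrays.asList(" a,1 ", "bb,2", " ccc,3"), 4)); // prints 12
  }
}
```

The bounded queue lets the two stages overlap while applying back-pressure: when the downstream stage falls behind, the upstream stage blocks on `put` instead of buffering without limit.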
   
   ## Benchmark results:
   
   ### Original code
   
   ```
   # Run progress: 0.00% complete, ETA 00:07:00
   # Fork: 1 of 2
   # Warmup Iteration   1: 31097.686 ms/op
   # Warmup Iteration   2: 26007.428 ms/op
   # Warmup Iteration   3: 26816.007 ms/op
   Iteration   1: 25951.170 ms/op
   Iteration   2: 26076.096 ms/op
   Iteration   3: 26045.939 ms/op
   
   # Run progress: 50.00% complete, ETA 00:05:19
   # Fork: 2 of 2
   # Warmup Iteration   1: 31711.546 ms/op
   # Warmup Iteration   2: 26587.875 ms/op
   # Warmup Iteration   3: 27360.283 ms/op
   Iteration   1: 26208.574 ms/op
   Iteration   2: 26316.409 ms/op
   Iteration   3: 26194.492 ms/op
   
   
   Result "org.apache.pinot.perf.BenchmarkSegmentCreation.segmentCreationFromCSV":
     26132.113 ±(99.9%) 369.912 ms/op [Average]
     (min, avg, max) = (25951.170, 26132.113, 26316.409), stdev = 131.914
     CI (99.9%): [25762.202, 26502.025] (assumes normal distribution)
   
   
   # Run complete. Total time: 00:10:42
   
   Benchmark                                        Mode  Cnt      Score     Error  Units
   BenchmarkSegmentCreation.segmentCreationFromCSV  avgt    6  26132.113 ± 369.912  ms/op
   ```
   
   ### New code
   
   ```
   # Run progress: 0.00% complete, ETA 00:03:20
   # Fork: 1 of 2
   # Warmup Iteration   1: 23004.364 ms/op
   # Warmup Iteration   2: 19380.296 ms/op
   # Warmup Iteration   3: 20914.349 ms/op
   # Warmup Iteration   4: 19469.886 ms/op
   # Warmup Iteration   5: 19461.024 ms/op
   Iteration   1: 19523.648 ms/op
   Iteration   2: 19582.673 ms/op
   Iteration   3: 19409.540 ms/op
   Iteration   4: 19419.701 ms/op
   Iteration   5: 19386.130 ms/op
   
   # Run progress: 50.00% complete, ETA 00:03:20
   # Fork: 2 of 2
   # Warmup Iteration   1: 23344.723 ms/op
   # Warmup Iteration   2: 19335.702 ms/op
   # Warmup Iteration   3: 20535.619 ms/op
   # Warmup Iteration   4: 19512.260 ms/op
   # Warmup Iteration   5: 19461.238 ms/op
   Iteration   1: 19510.350 ms/op
   Iteration   2: 19453.281 ms/op
   Iteration   3: 19444.863 ms/op
   Iteration   4: 19399.972 ms/op
   Iteration   5: 19380.142 ms/op
   
   
   Result "org.apache.pinot.perf.BenchmarkSegmentCreation.segmentCreationFromCSV":
     19451.030 ±(99.9%) 101.684 ms/op [Average]
     (min, avg, max) = (19380.142, 19451.030, 19582.673), stdev = 67.258
     CI (99.9%): [19349.346, 19552.714] (assumes normal distribution)
   
   
   # Run complete. Total time: 00:06:41
   
   Benchmark                                        Mode  Cnt      Score     Error  Units
   BenchmarkSegmentCreation.segmentCreationFromCSV  avgt   10  19451.030 ± 101.684  ms/op
   ```
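
Taking the two summary scores above at face value, the implied improvement can be worked out as shown here. The two runs used different JMH settings (3 warmup and 3 measurement iterations per fork for the original code, 5 and 5 for the new code), so this is an approximate comparison rather than a strictly like-for-like one.

$$
\text{speedup} = \frac{26132.113\ \text{ms/op}}{19451.030\ \text{ms/op}} \approx 1.34,
\qquad
\text{time reduction} = 1 - \frac{19451.030}{26132.113} \approx 25.6\%
$$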
   
   
   
   

