[GitHub] [pinot] mqliang edited a comment on issue #6720: Benchmark setColumn() in DataTableBuilder and write all values one by one instead of using rowId and colId to position if need be

GitBox Tue, 23 Nov 2021 14:31:08 -0800


mqliang edited a comment on issue #6720:
URL: https://github.com/apache/pinot/issues/6720#issuecomment-808974129



   I wrote a benchmark here:
   https://github.com/mqliang/pinot/commit/a32a61aad5dfa6b6c4a09064c75926b00495cd3a
   
   The benchmark compares three ways to build a data table:
   * `BenchmarkDataTableRowIdColIdBuildInOrder`: for each row, call `dataTableBuilder.setColumn(colId, value);` to set the value of each column, setting the columns in order: 1st col, 2nd col, 3rd col, ...
   * `BenchmarkDataTableRowIdColIdBuildRandomOrder`: for each row, call `dataTableBuilder.setColumn(colId, value);` to set the value of each column, but set the columns in random order, e.g. 11th col, 20th col, 3rd col, 5th col, ...
   * `BenchmarkDataTableRowBulkBuild`: for each row, first put the values of all columns into an `Object[]` array, then build the row in bulk: write all values one by one, without calling `ByteBuffer.position()` at all.
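
The three write patterns can be illustrated in plain Java. This is a simplified, hypothetical sketch, not Pinot's `DataTableBuilder`: it assumes a single row of fixed-width `int` columns in a heap `ByteBuffer`, and the names `RowWriteStrategies`, `buildWithSetColumn` and `buildBulk` are made up for this example. It shows that all three strategies produce identical bytes; they differ only in whether each write is preceded by a `position()` call and in the order the offsets are visited.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical, simplified sketch (NOT Pinot's DataTableBuilder): one row of
// fixed-width int columns stored in a heap ByteBuffer, 4 bytes per column.
final class RowWriteStrategies {

  // setColumn-style writes: seek to the column's byte offset with position(),
  // then write. colOrder {0, 1, 2, ...} mimics the "in order" benchmark;
  // any other permutation mimics the "random order" benchmark.
  static byte[] buildWithSetColumn(int[] values, int[] colOrder) {
    ByteBuffer row = ByteBuffer.allocate(values.length * Integer.BYTES);
    for (int colId : colOrder) {
      row.position(colId * Integer.BYTES);
      row.putInt(values[colId]);
    }
    return row.array();
  }

  // Bulk-style writes: values are already in column order, so each putInt()
  // simply advances the buffer; position() is never called.
  static byte[] buildBulk(int[] values) {
    ByteBuffer row = ByteBuffer.allocate(values.length * Integer.BYTES);
    for (int value : values) {
      row.putInt(value);
    }
    return row.array();
  }

  public static void main(String[] args) {
    int[] values = {10, 20, 30, 40};
    byte[] inOrder = buildWithSetColumn(values, new int[] {0, 1, 2, 3});
    byte[] randomOrder = buildWithSetColumn(values, new int[] {2, 0, 3, 1});
    byte[] bulk = buildBulk(values);
    // All three strategies lay out exactly the same bytes.
    System.out.println(Arrays.equals(inOrder, bulk) && Arrays.equals(randomOrder, bulk)); // prints true
  }
}
```

The sketch does not explain why the random order measured slower; it only shows that the three patterns write the same data, so they are interchangeable as far as the resulting row is concerned.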
   
   The results of building a table of 100 rows are:
   ```
   # JMH version: 1.26
   # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
   # VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=56968:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
   # Warmup: 1 iterations, 10 s each
   # Measurement: 5 iterations, 30 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.pinot.perf.BenchmarkDataTableBulkBuild.BenchmarkDataTableRowBulkBuild
   
   # Run progress: 0.00% complete, ETA 00:08:00
   # Fork: 1 of 1
   # Warmup Iteration   1: 198.463 us/op
   Iteration   1: 157.345 us/op
   Iteration   2: 157.778 us/op
   Iteration   3: 155.212 us/op
   Iteration   4: 154.870 us/op
   Iteration   5: 153.947 us/op
   
   
   Result "org.apache.pinot.perf.BenchmarkDataTableBulkBuild.BenchmarkDataTableRowBulkBuild":
     155.830 ±(99.9%) 6.368 us/op [Average]
     (min, avg, max) = (153.947, 155.830, 157.778), stdev = 1.654
     CI (99.9%): [149.462, 162.198] (assumes normal distribution)
   
   
   # JMH version: 1.26
   # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
   # VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=56968:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
   # Warmup: 1 iterations, 10 s each
   # Measurement: 5 iterations, 30 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.pinot.perf.BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildInOrder
   
   # Run progress: 33.33% complete, ETA 00:05:21
   # Fork: 1 of 1
   # Warmup Iteration   1: 193.779 us/op
   Iteration   1: 150.726 us/op
   Iteration   2: 150.649 us/op
   Iteration   3: 151.587 us/op
   Iteration   4: 151.765 us/op
   Iteration   5: 151.749 us/op
   
   
   Result "org.apache.pinot.perf.BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildInOrder":
     151.295 ±(99.9%) 2.155 us/op [Average]
     (min, avg, max) = (150.649, 151.295, 151.765), stdev = 0.560
     CI (99.9%): [149.140, 153.451] (assumes normal distribution)
   
   
   # JMH version: 1.26
   # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
   # VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=56968:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
   # Warmup: 1 iterations, 10 s each
   # Measurement: 5 iterations, 30 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.pinot.perf.BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildRandomOrder
   
   # Run progress: 66.67% complete, ETA 00:02:40
   # Fork: 1 of 1
   # Warmup Iteration   1: 216.635 us/op
   Iteration   1: 175.108 us/op
   Iteration   2: 174.428 us/op
   Iteration   3: 178.706 us/op
   Iteration   4: 180.284 us/op
   Iteration   5: 178.219 us/op
   
   
   Result "org.apache.pinot.perf.BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildRandomOrder":
     177.349 ±(99.9%) 9.581 us/op [Average]
     (min, avg, max) = (174.428, 177.349, 180.284), stdev = 2.488
     CI (99.9%): [167.768, 186.930] (assumes normal distribution)
   
   
   # Run complete. Total time: 00:08:02
   
   REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
   why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
   experiments, perform baseline and negative tests that provide experimental control, make sure
   the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
   Do not assume the numbers tell you what you want them to tell.
   
   Benchmark                                                                 Mode  Cnt    Score   Error  Units
   BenchmarkDataTableBulkBuild.BenchmarkDataTableRowBulkBuild                avgt    5  155.830 ± 6.368  us/op
   BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildInOrder      avgt    5  151.295 ± 2.155  us/op
   BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildRandomOrder  avgt    5  177.349 ± 9.581  us/op
   
   Process finished with exit code 0
   ``` 
   
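   The results show no significant difference between `BenchmarkDataTableRowBulkBuild` (155.830 ± 6.368 us/op) and `BenchmarkDataTableRowIdColIdBuildInOrder` (151.295 ± 2.155 us/op): their 99.9% confidence intervals overlap. By contrast, `BenchmarkDataTableRowIdColIdBuildRandomOrder` (177.349 ± 9.581 us/op) is about 17% slower than the in-order variant. This means that even if we use `setColumn(colId, value)` to build the data table, as long as we set the column values in increasing order rather than randomly, the overhead of calling `ByteBuffer.position()` is negligible.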
   
   Currently, our code base sets values for columns in increasing order, so there is no need to address the TODO for performance reasons. From a code-cleanliness point of view, however, we could provide a `setColumnValuesInBulk()` method and change all current data table building code to use it. This makes the code more self-explanatory: it signals that setting column values in bulk (in increasing order) is preferable to setting them in random order. However, all `setColumn(colId, value)` methods should be kept, since there may be circumstances where we need to set a value for an arbitrary row/column.
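
One possible shape for the proposed API, keeping `setColumn(colId, value)` alongside a bulk method, is sketched below. Everything here is an assumption for illustration: `SketchRowBuilder`, its `ColumnType` enum, `startRow()`/`finishRow()`, and the `Object[]` parameter of `setColumnValuesInBulk()` (borrowed from the `Object[]` used by `BenchmarkDataTableRowBulkBuild`) are hypothetical. The sketch supports only three fixed-width types and one row at a time.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of the proposed API (NOT Pinot's DataTableBuilder):
// fixed-width columns only, one row at a time.
final class SketchRowBuilder {
  enum ColumnType {
    INT(Integer.BYTES), LONG(Long.BYTES), DOUBLE(Double.BYTES);

    final int size;

    ColumnType(int size) {
      this.size = size;
    }
  }

  private final ColumnType[] types;
  private final int[] offsets; // byte offset of each column within a row
  private final int rowSize;
  private ByteBuffer row;

  SketchRowBuilder(ColumnType... types) {
    this.types = types.clone();
    this.offsets = new int[types.length];
    int offset = 0;
    for (int colId = 0; colId < types.length; colId++) {
      offsets[colId] = offset;
      offset += types[colId].size;
    }
    this.rowSize = offset;
  }

  void startRow() {
    row = ByteBuffer.allocate(rowSize); // fresh buffer, position 0
  }

  // Random-access write, kept for callers that must set an arbitrary column.
  void setColumn(int colId, Object value) {
    row.position(offsets[colId]);
    put(types[colId], value);
  }

  // Bulk write: values must be in column order and the row freshly started;
  // values are written one after another without calling position().
  void setColumnValuesInBulk(Object[] values) {
    if (values.length != types.length) {
      throw new IllegalArgumentException("expected " + types.length + " values, got " + values.length);
    }
    for (int colId = 0; colId < values.length; colId++) {
      put(types[colId], values[colId]);
    }
  }

  byte[] finishRow() {
    return row.array();
  }

  // Relative put at the buffer's current position; advances it by the type's size.
  private void put(ColumnType type, Object value) {
    switch (type) {
      case INT: row.putInt((Integer) value); break;
      case LONG: row.putLong((Long) value); break;
      case DOUBLE: row.putDouble((Double) value); break;
      default: throw new IllegalStateException("unsupported type: " + type);
    }
  }

  public static void main(String[] args) {
    SketchRowBuilder builder = new SketchRowBuilder(ColumnType.INT, ColumnType.LONG, ColumnType.DOUBLE);
    builder.startRow();
    builder.setColumnValuesInBulk(new Object[] {1, 2L, 3.0});
    byte[] bulk = builder.finishRow();

    builder.startRow();
    builder.setColumn(2, 3.0); // arbitrary column order still works
    builder.setColumn(0, 1);
    builder.setColumn(1, 2L);
    byte[] random = builder.finishRow();

    System.out.println(Arrays.equals(bulk, random)); // prints true
  }
}
```

The bulk method writes sequentially and never seeks, so it follows the in-order access pattern by construction, while `setColumn` stays available for the case where a caller must set an arbitrary column.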
   
   cc @Jackie-Jiang 

