mqliang edited a comment on issue #6720:
URL: https://github.com/apache/pinot/issues/6720#issuecomment-994027811


   After changing the benchmark to "return a value extracted from the data 
table to ensure the code is not optimized away by the JIT", the results are 
below. Code: 
https://github.com/mqliang/pinot/commit/a32a61aad5dfa6b6c4a09064c75926b00495cd3a#diff-ea557c4916a39b7358c4acd1d5f3e6c5677bf454fec3e2a3db9df65230931702
   
   ```
   Benchmark                                                                 Mode  Cnt    Score    Error  Units
   BenchmarkDataTableBulkBuild.BenchmarkDataTableRowBulkBuild                avgt    5  176.261 ± 14.817  us/op
   BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildInOrder      avgt    5  179.805 ± 11.924  us/op
   BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildRandomOrder  avgt    5  244.589 ± 10.396  us/op
   ```
   
   The benchmark results show that `ByteBuffer.position()` itself is very 
lightweight. As long as the caller of DataTableBuilder writes all values one by 
one, in order, there is no significant difference (176.261 vs 179.805 us/op, 
within the error margins). However, if the caller writes values in random 
order, performance degrades (244.589 us/op). IMO the overhead is caused by 
"write buffer/cache line disruption", not by the `ByteBuffer.position()` call 
itself.
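
   To make the two access patterns concrete, here is a minimal sketch using 
only `java.nio.ByteBuffer`. It is not the benchmark code from the commit above; 
the class name, `NUM_COLS`, and both methods are hypothetical and only 
illustrate relative (sequential) puts versus a `position()` call before each 
write.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch, not Pinot code: two ways of filling the same row-major buffer.
final class PositionedWriteSketch {
  static final int NUM_COLS = 4; // assumed row width, in int columns
  static final int ROW_BYTES = NUM_COLS * Integer.BYTES;

  // Sequential fill: relative puts, the buffer position advances on its own.
  static ByteBuffer fillSequential(int numRows) {
    ByteBuffer buf = ByteBuffer.allocate(numRows * ROW_BYTES);
    for (int cell = 0; cell < numRows * NUM_COLS; cell++) {
      buf.putInt(cell);
    }
    return buf;
  }

  // Positioned fill: seek to each cell's byte offset before writing,
  // visiting the cells in whatever order the caller supplies.
  static ByteBuffer fillPositioned(int numRows, int[] cellOrder) {
    ByteBuffer buf = ByteBuffer.allocate(numRows * ROW_BYTES);
    for (int cell : cellOrder) {
      buf.position(cell * Integer.BYTES);
      buf.putInt(cell);
    }
    return buf;
  }

  public static void main(String[] args) {
    int numRows = 8;
    int cells = numRows * NUM_COLS;
    int[] reversed = new int[cells];
    for (int i = 0; i < cells; i++) {
      reversed[i] = cells - 1 - i; // any permutation of the cell indices works
    }
    ByteBuffer a = fillSequential(numRows);
    ByteBuffer b = fillPositioned(numRows, reversed);
    // Both strategies produce the same bytes; only the write order differs.
    System.out.println(Arrays.equals(a.array(), b.array()));
  }
}
```

   Both methods end up with identical buffer contents, so any timing gap 
between them comes from the order in which memory is touched rather than from 
what is written.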
   
   I checked our code: all callers write values one by one (columnId increases 
monotonically by 1 while building a DataTable), and there is no use case that 
fills a DataTable in random order or replaces a value in a DataTable at a given 
columnId. So here is my suggestion: replace all the 
`public void setColumn(int colId, Type value)` functions with 
`public void append(Type value)`.
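
   A rough sketch of what an append-only API could look like is below. This is 
not the actual DataTableBuilder; `AppendOnlyRowBuilder`, `bytesWritten()`, and 
`toBytes()` are hypothetical names, and the fixed-capacity heap buffer is an 
assumption made to keep the example short.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch, not Pinot's DataTableBuilder: values are written strictly
// in column order, so no column id and no explicit position() call is needed.
final class AppendOnlyRowBuilder {
  private final ByteBuffer _buffer;

  AppendOnlyRowBuilder(int capacityBytes) {
    _buffer = ByteBuffer.allocate(capacityBytes);
  }

  // Each append uses a relative put, which advances the buffer position itself.
  void append(int value) {
    _buffer.putInt(value);
  }

  void append(long value) {
    _buffer.putLong(value);
  }

  void append(double value) {
    _buffer.putDouble(value);
  }

  int bytesWritten() {
    return _buffer.position();
  }

  // Copy of the bytes written so far.
  byte[] toBytes() {
    return Arrays.copyOf(_buffer.array(), _buffer.position());
  }

  public static void main(String[] args) {
    AppendOnlyRowBuilder builder = new AppendOnlyRowBuilder(64);
    builder.append(7);    // column 0: int
    builder.append(8L);   // column 1: long
    builder.append(1.5);  // column 2: double
    System.out.println(builder.bytesWritten());
  }
}
```

   With this shape the caller cannot write out of order by construction, which 
matches how the existing callers already behave.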
   
   The only challenge now is String values: we use dictionary encoding, and 
the DataTableV3 dictionary is currently per-column, i.e. each column has its 
own String->Int dictionary. So to append a String value, we must know the 
columnId (to look up the dictionary for that column). This issue can be 
resolved after https://github.com/apache/pinot/pull/7167, where all columns 
share a common dictionary to encode String values.
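
   To illustrate why a shared dictionary removes the need for a columnId, here 
is a small sketch. It does not reflect the implementation in the PR above; 
`SharedStringDictionary` and `idFor` are hypothetical names, and assigning ids 
in first-seen order is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one String->Int dictionary shared by every column,
// so encoding a value does not require knowing which column it belongs to.
final class SharedStringDictionary {
  private final Map<String, Integer> _ids = new HashMap<>();

  // Returns the existing id for the value, or assigns the next free id.
  int idFor(String value) {
    Integer id = _ids.get(value);
    if (id == null) {
      id = _ids.size();
      _ids.put(value, id);
    }
    return id;
  }

  int size() {
    return _ids.size();
  }

  public static void main(String[] args) {
    SharedStringDictionary dictionary = new SharedStringDictionary();
    // Values from different columns go through the same lookup.
    int first = dictionary.idFor("us-west");
    int second = dictionary.idFor("clicks");
    int repeat = dictionary.idFor("us-west");
    System.out.println(first == repeat && first != second);
  }
}
```

   An `append(String value)` method could then call `idFor(value)` and append 
the resulting int, with no per-column lookup.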


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


