mqliang edited a comment on issue #6720: URL: https://github.com/apache/pinot/issues/6720#issuecomment-994027811
With "return a value extracted from the data table to ensure the code is not optimized away by the JIT", code: https://github.com/mqliang/pinot/commit/a32a61aad5dfa6b6c4a09064c75926b00495cd3a#diff-ea557c4916a39b7358c4acd1d5f3e6c5677bf454fec3e2a3db9df65230931702 ``` Benchmark Mode Cnt Score Error Units BenchmarkDataTableBulkBuild.BenchmarkDataTableRowBulkBuild avgt 5 176.261 ± 14.817 us/op BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildInOrder avgt 5 179.805 ± 11.924 us/op BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildRandomOrder avgt 5 244.589 ± 10.396 us/op ``` From the benchmark result, the function of `ByteBuffer.position()` itself is very lightweight. So as long as caller of DataTableBuilder write all values one by one, there is no big difference. However if caller write values in a random order, the performance will decrease ---- the overhead is caused by "write buffer/cache line disruption", not by the `ByteBuffer.position()` function itself IMO. I checked our code, all the caller write values one by one (columnId is monotonically increasing by 1 during building a DataTable), there is no use case of filling a DataTable in random order or replacing a value in DataTable with a given columnId. So here is my suggestion: replacing all the `public void setColumn(int colId, Type value)` functions as `public void append(Type value)`. The only challenge now is: for String type value, we use dictionary encoding and now DataTableV3 dictionary is per-column wise -- each column has a String->Int dictionary. So to append a String value, we must know the columnId (to lookup the dictionary for that column). This issue can be resolved after https://github.com/apache/pinot/pull/7167, where all columns share a common dictionary to encoding Strinng values. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org