Jackie-Jiang opened a new pull request #6559: URL: https://github.com/apache/incubator-pinot/pull/6559
## Description

Optimize the `IntMapBasedHolder` in the `DictionaryBasedGroupKeyGenerator` for better group-by performance on queries whose group-by columns have a cardinality product between 10K and 2B.

The improvement (up to ~40%) comes mainly from replacing the `Int2IntOpenHashMap` with the newly implemented `IntGroupIdMap`. The `IntGroupIdMap` stores both keys and values in a single array, which makes it friendlier to the CPU cache.

The benchmark results for the two map implementations are as follows (with 20 threads):

```
Benchmark                               (_cardinality)  Mode  Cnt     Score    Error  Units
BenchmarkIntOpenHashMap.intGroupIdMap            10000  avgt    5   177.478 ± 13.114  ms/op
BenchmarkIntOpenHashMap.intGroupIdMap            20000  avgt    5   192.885 ± 14.417  ms/op
BenchmarkIntOpenHashMap.intGroupIdMap            50000  avgt    5   274.969 ± 20.314  ms/op
BenchmarkIntOpenHashMap.intGroupIdMap           100000  avgt    5   568.092 ±  1.432  ms/op
BenchmarkIntOpenHashMap.intGroupIdMap           150000  avgt    5   720.392 ±  5.983  ms/op
BenchmarkIntOpenHashMap.intOpenHashMap           10000  avgt    5   170.883 ±  5.290  ms/op
BenchmarkIntOpenHashMap.intOpenHashMap           20000  avgt    5   229.085 ± 12.062  ms/op
BenchmarkIntOpenHashMap.intOpenHashMap           50000  avgt    5   377.269 ±  5.762  ms/op
BenchmarkIntOpenHashMap.intOpenHashMap          100000  avgt    5  1001.601 ± 34.179  ms/op
BenchmarkIntOpenHashMap.intOpenHashMap          150000  avgt    5   984.365 ± 17.488  ms/op
```
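
For readers unfamiliar with the single-array layout mentioned in the description, here is a minimal sketch of the general idea. It is not the code from this pull request: the class name `SingleArrayGroupIdMap`, the hash mixing function, the load factor, and the "stored value 0 means empty" convention are all assumptions chosen for illustration. Each key sits directly next to its group id in one `int[]`, so a probe usually touches a single cache line instead of two separate arrays.

```java
// Illustrative sketch only; names and details are hypothetical, not Pinot's IntGroupIdMap.
final class SingleArrayGroupIdMap {
  // Layout: [key0, id0 + 1, key1, id1 + 1, ...]. A stored value of 0 marks an empty slot.
  private int[] slots;
  // Number of slots minus one; the number of slots is always a power of two.
  private int mask;
  // Number of keys inserted so far; also the next group id to hand out.
  private int size;

  SingleArrayGroupIdMap(int initialCapacity) {
    // Round up to a power of two (minimum 2) so that masking can replace modulo.
    int capacity = Integer.highestOneBit(Math.max(2, initialCapacity) - 1) << 1;
    slots = new int[capacity << 1];
    mask = capacity - 1;
  }

  /** Returns the group id for the key, assigning the next free id on first sight. */
  int getGroupId(int key) {
    int index = (mix(key) & mask) << 1;
    while (true) {
      int stored = slots[index + 1];
      if (stored == 0) {
        // Empty slot: the key is new, so claim the slot and assign a fresh id.
        int groupId = size++;
        slots[index] = key;
        slots[index + 1] = groupId + 1;
        if (size > (mask + 1) >>> 1) {
          grow(); // keep the load factor at or below 0.5 (an assumed threshold)
        }
        return groupId;
      }
      if (slots[index] == key) {
        return stored - 1;
      }
      // Linear probing: step to the next key/value pair, wrapping at the end.
      index = (index + 2) & ((mask << 1) | 1);
    }
  }

  int size() {
    return size;
  }

  private void grow() {
    int[] old = slots;
    int newCapacity = (mask + 1) << 1;
    slots = new int[newCapacity << 1];
    mask = newCapacity - 1;
    for (int i = 0; i < old.length; i += 2) {
      if (old[i + 1] != 0) {
        int index = (mix(old[i]) & mask) << 1;
        while (slots[index + 1] != 0) {
          index = (index + 2) & ((mask << 1) | 1);
        }
        slots[index] = old[i];
        slots[index + 1] = old[i + 1];
      }
    }
  }

  // Simple multiplicative hash to spread consecutive keys; an arbitrary choice.
  private static int mix(int key) {
    int h = key * 0x9E3779B9;
    return h ^ (h >>> 16);
  }
}
```

In this sketch, calling `getGroupId` repeatedly with the same key returns the same id, and new keys receive ids 0, 1, 2, ... in insertion order. Whether the actual `IntGroupIdMap` uses the same probing scheme, growth policy, or empty-slot encoding is not stated in the description above.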