xiangfu0 commented on PR #14698: URL: https://github.com/apache/pinot/pull/14698#issuecomment-4152130096
## Update: HashMap → Object2ObjectOpenHashMap + merge path optimization ### Changes in latest commit (`fd1d46ab89`) **1. Replace `HashMap` with fastutil `Object2ObjectOpenHashMap` in `SimpleIndexedTable`** - Open-addressing provides better L1/L2 cache locality (contiguous array probing vs pointer-chasing through `Map.Entry` linked lists) - Eliminates per-entry `Map.Entry` object allocations → lower GC pressure - All partitioned local tables benefit since they use `SimpleIndexedTable` **2. Override `merge()` in `IndexedTable` to reuse existing Keys** - The base `merge()` calls `upsert(Record)` which does `Arrays.copyOf` + `new Key` + `Arrays.hashCode` per record - New override iterates the source map's entry set directly, passing existing `Key` objects to `upsert(Key, Record)` - Saves one array allocation + one Key allocation + one hash computation per record during `publishLocalPartitions` **3. Fix `GroupByUtils.getTrimDisabledIndexedTable` ordering** - `numThreads == 1` check now runs before the `DeterministicConcurrentIndexedTable` check - Partitioned local tables (always `numThreads=1`) now correctly get `SimpleIndexedTable` (open-addressing HashMap) instead of potentially getting `DeterministicConcurrentIndexedTable` (backed by `ConcurrentSkipListMap`) ### Benchmark: Incremental gains from this commit (PARTITIONED only) | Unique Groups | Before this commit (ms) | After this commit (ms) | Improvement | |---:|---:|---:|---:| | 500 | 3.4 | **2.8** | **18%** | | 5,000 | 7.7 | **5.9** | **24%** | | 50,000 | 28.9 | **23.0** | **20%** | | 500,000 | 74.5 | **76.9** | ~flat (GC/memory-bandwidth bound) | ### Cumulative improvement across all optimization commits | Unique Groups | Original PR (ms) | Final (ms) | Total Speedup | |---:|---:|---:|---:| | 500 | 6.6 | **2.8** | **2.4x** | | 5,000 | 12.7 | **5.9** | **2.2x** | | 50,000 | 45.7 | **23.0** | **2.0x** | | 500,000 | 129.1 | **76.9** | **1.7x** | ### vs DEFAULT algorithm (end-to-end comparison) | Unique Groups | DEFAULT (ms) | PARTITIONED final (ms) | Speedup | |---:|---:|---:|---:| | 500 | 127.6 | **2.8** | **45.6x** | | 5,000 | 132.4 | **5.9** | **22.4x** | | 50,000 | 144.0 | **23.0** | **6.3x** | | 500,000 | 184.8 | **76.9** | **2.4x** | Setup: JMH, 1 fork, 3×5s iterations, 32 segments, 10K records/segment, 8 threads, `GROUP BY d1, d2 ORDER BY SUM(m1) DESC LIMIT 500` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
