xiangfu0 commented on PR #14698:
URL: https://github.com/apache/pinot/pull/14698#issuecomment-4152342951

   ## Update: Parallel partition finish() for high-cardinality group-by
   
   ### Change in latest commit (`17ce1e1b33`)
   
   **Parallelize `finish()` across partitions** — For high-cardinality ORDER BY 
queries, each partition's `finish()` does O(N/P × log K) heap-based top-K 
selection. Previously these ran sequentially (O(N × log K) total). Now all 
partition `finish()` calls are submitted to the thread pool in parallel, 
reducing wall-clock time to O(N/P × log K).
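
The pattern can be sketched as below. `Partition`, `finish()`, and `finishAll()` are hypothetical stand-ins, not Pinot's actual classes; the point is that each partition's heap-based top-K selection is submitted to the pool at once, so the selections run concurrently instead of back-to-back:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: each partition holds ~N/P records, and finish()
// performs heap-based top-K selection in O(N/P * log K).
class Partition {
    final long[] values;
    Partition(long[] values) { this.values = values; }

    // Keep the K largest values using a min-heap of size K.
    List<Long> finish(int k) {
        PriorityQueue<Long> heap = new PriorityQueue<>(k);
        for (long v : values) {
            if (heap.size() < k) {
                heap.offer(v);
            } else if (v > heap.peek()) {
                heap.poll();
                heap.offer(v);
            }
        }
        return new ArrayList<>(heap);
    }
}

public class ParallelFinish {
    // Submit every partition's finish() to the pool at once, then collect.
    // Wall-clock is ~one partition's work, not the sum of all partitions.
    static List<List<Long>> finishAll(List<Partition> partitions, int k, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<Long>>> tasks = new ArrayList<>();
            for (Partition p : partitions) {
                tasks.add(() -> p.finish(k));
            }
            List<List<Long>> results = new ArrayList<>();
            for (Future<List<Long>> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Partition> partitions = new ArrayList<>();
        for (int p = 0; p < 4; p++) {
            long[] vals = new long[1000];
            for (int i = 0; i < vals.length; i++) vals[i] = (long) p * 1000 + i;
            partitions.add(new Partition(vals));
        }
        List<List<Long>> topK = ParallelFinish.finishAll(partitions, 5, 4);
        System.out.println(topK.size()); // one top-K list per partition
    }
}
```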
   
   ### High-Cardinality Benchmark (100K records/segment, 32 segments, 8 threads)
   
   | Unique Groups | DEFAULT (ms) | NON-BLOCKING (ms) | PARTITIONED (ms) | PARTITIONED vs DEFAULT |
   |---:|---:|---:|---:|---:|
   | 500K | 1,380 | 691 | **638** | **2.2x faster** |
   | 5M | 1,934 | 104,058 | **637** | **3.0x faster** |
   
   **Key finding at 5M groups:**
   - NON-BLOCKING completely collapses at 5M groups (104 seconds!): unbounded 
concurrent upserts into a single shared `ConcurrentHashMap` (3.2M records 
across 8 threads) cause massive lock contention
   - DEFAULT takes ~2 seconds: the `ConcurrentIndexedTable` read-write lock 
adds overhead, but the approach survives
   - PARTITIONED handles it in **637ms** — **163x faster than NON-BLOCKING**, 
**3x faster than DEFAULT**
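
The shared-map pattern behind that collapse can be sketched as below (hypothetical code, not Pinot's; record counts are scaled down from the benchmark's 3.2M). Every worker thread upserts into the same map, so bin-level locking inside `merge()` and cooperative resizing become the bottleneck as cardinality grows:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedMapUpsert {
    // All threads * perThread upserts target one shared map -- the contention
    // pattern that degrades at high cardinality. Keys are disjoint per thread
    // here, so the final group count is deterministic.
    static int run(int threads, int perThread) throws InterruptedException {
        ConcurrentHashMap<Integer, Long> shared = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    // merge() synchronizes on the key's bin; with millions of
                    // distinct keys, bin locking and table resizing dominate.
                    shared.merge(base + i, 1L, Long::sum);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return shared.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(8, 50_000)); // 400,000 distinct groups
    }
}
```

Replacing the single shared map with one plain map per thread removes this hot path entirely, which is what the partitioned approach does.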
   
   The partitioned approach scales gracefully because:
   1. Thread-local partition tables eliminate all cross-thread contention 
during processing
   2. Parallel `finish()` distributes the expensive heap selection across cores
   3. `CompositePartitionTable` avoids the O(N) merge phase entirely
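
The three points above can be sketched together. `PartitionTable` and `CompositeView` here are hypothetical stand-ins for the classes named in this PR, and the sketch assumes keys are hash-partitioned up front so each group is owned by exactly one table:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Each worker thread owns a plain HashMap, so aggregation never touches
// another thread's table: no locks, no CAS, no cross-thread contention.
class PartitionTable {
    final Map<String, Long> groups = new HashMap<>();
    void upsert(String key, long value) {
        groups.merge(key, value, Long::sum); // single-threaded access only
    }
}

// Chains the per-partition iterators instead of copying every entry into one
// big map -- this is what skips the O(N) merge phase. Correct only when each
// group key lives in exactly one partition table.
class CompositeView implements Iterable<Map.Entry<String, Long>> {
    private final List<PartitionTable> parts;
    CompositeView(List<PartitionTable> parts) { this.parts = parts; }

    @Override
    public Iterator<Map.Entry<String, Long>> iterator() {
        List<Iterator<Map.Entry<String, Long>>> its = new ArrayList<>();
        for (PartitionTable p : parts) {
            its.add(p.groups.entrySet().iterator());
        }
        return new Iterator<Map.Entry<String, Long>>() {
            private int i = 0;
            @Override public boolean hasNext() {
                while (i < its.size() && !its.get(i).hasNext()) i++;
                return i < its.size();
            }
            @Override public Map.Entry<String, Long> next() {
                hasNext(); // skip past exhausted partitions
                return its.get(i).next();
            }
        };
    }
}

public class CompositeDemo {
    public static void main(String[] args) {
        PartitionTable a = new PartitionTable();
        PartitionTable b = new PartitionTable();
        a.upsert("us", 1);
        a.upsert("us", 2); // merges into the same group, thread-locally
        b.upsert("eu", 5);
        int count = 0;
        for (Map.Entry<String, Long> e : new CompositeView(List.of(a, b))) count++;
        System.out.println(count); // groups visible without any merge copy
    }
}
```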


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

