Dandandan opened a new pull request, #21595:
URL: https://github.com/apache/datafusion/pull/21595

   ## Which issue does this PR close?
   
   Related to improving GROUP BY performance on large datasets (e.g., 
ClickBench).
   
   ## Rationale for this change
   
   The `AggregateExec(Partial)` accumulates ALL input rows into a single hash 
table before emitting. For high-cardinality GROUP BY queries (e.g., `GROUP BY 
URL` with 15M distinct values), the hash table grows far beyond CPU cache 
capacity, causing poor cache performance during hash table lookups.
   
   ## What changes are included in this PR?
   
   **Two-generation early emission:** When the partial aggregate's hash table 
exceeds a configurable size threshold (default: 4MB), it uses a generational 
scheme:
   
   1. **Hot table fills up** → emit the **cold batch** (previous generation) 
downstream
   2. **Promote** current hot table state → cold batch  
   3. **Reset** hot hash table and continue reading input
   
   This gives recurring groups a second chance to be merged locally before 
being sent downstream. At end-of-input, the remaining hot state and cold batch 
are concatenated and emitted together.
   
   **Benefits:**
   - Hash table stays small (~4MB) → fits in L3 cache → fewer cache misses
   - Two generations means groups appearing across batch windows get merged 
locally
   - Low-cardinality GROUP BY: threshold never exceeded → no change in behavior
   - High-cardinality GROUP BY: hash table stays cache-friendly, still 
aggregates (unlike `skip_partial_aggregation` which stops aggregating entirely)
   
   **New config:** `datafusion.execution.partial_aggregation_max_table_size` 
(default: 4MB, 0 to disable)
   
   ## Are these changes tested?
   
   All existing aggregate SLT tests pass with the default threshold of 4MB 
enabled. The feature is designed to be transparent — correctness is maintained 
because `FinalPartitioned` correctly merges multiple partial emissions.
   
   ## Are there any user-facing changes?
   
   New configuration option 
`datafusion.execution.partial_aggregation_max_table_size`. The default (4MB) is 
enabled automatically. Set to 0 to disable.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to