Dandandan opened a new pull request, #22009:
URL: https://github.com/apache/datafusion/pull/22009

   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   ClickBench q17 aggregates over grouping keys and applies a LIMIT. Unless the limit is applied during partial aggregation, memory grows with every discovered group, even though only the smallest keys can survive the final global limit.
   
   ## What changes are included in this PR?
   
   - Extends the limited distinct aggregation optimizer rule to unordered 
grouped aggregates with aggregate expressions.
   - Adds local group-key top-k pruning in `GroupedHashAggregateStream` for 
aggregate limits.
   - Routes unordered aggregate limits through the hash aggregate path and 
keeps existing ordered top-k handling separate.
   - Updates optimizer and ClickBench sqllogictest coverage.
   - Makes the allocator `cfg`s in the dfbench and imdb benchmark binaries tolerate `--all-features` by preferring mimalloc when both allocator features are enabled.
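
To illustrate why local group-key pruning is safe, here is a minimal standalone sketch. It is not the code in `GroupedHashAggregateStream`: the `LimitedGroupCounts` type, the `final_merge` helper, the `u64` keys, and the use of a `BTreeMap` are all assumptions made for illustration. Each partition keeps at most `limit` groups with the smallest keys, and a key that is rejected or evicted can never re-enter, so the counts of retained keys stay complete for that partition.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy partial aggregation: COUNT(*) per key, retaining at
/// most `limit` groups (the ones with the smallest keys seen so far).
struct LimitedGroupCounts {
    limit: usize,
    groups: BTreeMap<u64, u64>,
}

impl LimitedGroupCounts {
    fn new(limit: usize) -> Self {
        Self { limit, groups: BTreeMap::new() }
    }

    /// Accumulate one input row with the given group key.
    fn update(&mut self, key: u64) {
        if self.limit == 0 {
            return;
        }
        // Existing group: just update its aggregate state.
        if let Some(count) = self.groups.get_mut(&key) {
            *count += 1;
            return;
        }
        if self.groups.len() == self.limit {
            // The retained set is full. The largest retained key only ever
            // decreases, so a rejected or evicted key never comes back.
            let largest = *self.groups.keys().next_back().unwrap();
            if key > largest {
                return; // cannot be among the `limit` smallest keys
            }
            self.groups.remove(&largest);
        }
        self.groups.insert(key, 1);
    }
}

/// Hypothetical final step: merge partial states, then apply the global limit.
fn final_merge(partials: Vec<LimitedGroupCounts>, limit: usize) -> Vec<(u64, u64)> {
    let mut merged: BTreeMap<u64, u64> = BTreeMap::new();
    for partial in partials {
        for (key, count) in partial.groups {
            *merged.entry(key).or_insert(0) += count;
        }
    }
    // BTreeMap iterates in key order, so this keeps the smallest keys.
    merged.into_iter().take(limit).collect()
}

fn main() {
    let mut a = LimitedGroupCounts::new(2);
    for key in [5u64, 3, 9, 3, 1, 5, 1] {
        a.update(key);
    }
    let mut b = LimitedGroupCounts::new(2);
    for key in [3u64, 7, 1, 2] {
        b.update(key);
    }
    println!("{:?}", final_merge(vec![a, b], 2));
}
```

In this sketch neither partition ever holds more than two groups, and the groups that survive the final limit carry their full counts. Groups that are only partially counted after the merge are exactly the ones the global limit drops.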
   
   ## Are these changes tested?
   
   Tested with:
   
   - `cargo fmt --all`
   - `cargo check -p datafusion-physical-plan -p datafusion-physical-optimizer`
   - `cargo test -p datafusion --test core_integration limited_distinct_aggregation -- --nocapture`
   - `cargo run -p datafusion-benchmarks --bin dfbench -- clickbench -q 17 -i 1 -n 4 --path datafusion/core/tests/data/clickbench_hits_10.parquet --debug`
   - `cargo test -p datafusion-sqllogictest --test sqllogictests clickbench.slt`
   
   Known pre-existing check issue left untouched per request:
   
   - `cargo clippy --all-targets --all-features -- -D warnings` fails in `benchmarks/benches/sql.rs` because that bench still enables both the snmalloc and mimalloc global allocators under `--all-features`.
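
For context, the conflict arises because a Rust binary may define only one `#[global_allocator]`. The mimalloc preference this PR applies to the dfbench and imdb binaries can be sketched roughly as follows. This is an illustrative sketch, not the PR's diff: the feature names `mimalloc` and `snmalloc` and the `active_allocator` helper are assumptions.

```rust
// Hypothetical sketch: the feature names `mimalloc` and `snmalloc` are
// assumed to match the benchmark crate's Cargo features.

// mimalloc wins whenever it is enabled, including under --all-features.
#[cfg(feature = "mimalloc")]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;

// snmalloc is used only when mimalloc is not also enabled, so the two
// `#[global_allocator]` definitions can never both be compiled in.
#[cfg(all(feature = "snmalloc", not(feature = "mimalloc")))]
#[global_allocator]
static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;

/// Illustrative helper: reports which allocator the cfgs above select.
fn active_allocator() -> &'static str {
    if cfg!(feature = "mimalloc") {
        "mimalloc"
    } else if cfg!(feature = "snmalloc") {
        "snmalloc"
    } else {
        "system"
    }
}

fn main() {
    println!("global allocator: {}", active_allocator());
}
```

With neither feature enabled, both statics are compiled out and the default system allocator is used.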
   
   ## Are there any user-facing changes?
   
   No API changes. Query plans for unordered grouped aggregate LIMITs can now 
push the limit into partial aggregation, reducing memory use for queries such 
as ClickBench q17.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

