alamb opened a new issue, #23081:
URL: https://github.com/apache/datafusion/issues/23081

   ### Is your feature request related to a problem or challenge?
   
   `GroupsAccumulator` currently has two related APIs:
   
   - `convert_to_state(...)`, which converts raw input rows directly into 
intermediate aggregate state
   - `supports_convert_to_state()`, which lets hash aggregate decide whether 
the skip-partial-aggregation optimization is available
   
   This optional capability adds branching and complexity in both hash 
aggregate implementations:
   
   - `datafusion/physical-plan/src/aggregates/row_hash.rs`
   - `datafusion/physical-plan/src/aggregates/aggregate_hash_table/*`
   
   It also means some `GroupsAccumulator` implementations may silently miss the 
high-cardinality fast path.
   
   ### Describe the solution you'd like
   
   Make `convert_to_state` a required/high-performance part of the 
`GroupsAccumulator` contract, remove `supports_convert_to_state`, and simplify 
hash aggregate so skip-partial aggregation no longer needs to check 
per-accumulator support.
   
   Suggested scope:
   
   - Audit all `GroupsAccumulator` implementations and ensure each has a 
correct `convert_to_state`
   - Remove the default `not_impl` implementation of `convert_to_state` from 
`datafusion/expr-common/src/groups_accumulator.rs`
   - Remove `GroupsAccumulator::supports_convert_to_state`
   - Remove skip-aggregation support checks from hash aggregate code
   - Update FFI groups accumulator support, currently mirrored via 
`FFI_GroupsAccumulator::supports_convert_to_state`
   - Update comments/docs that describe `convert_to_state` as optional
   - Update tests such as `aggregate_skip_partial.slt` so skip-partial coverage 
expects all grouped accumulators to support the path
   
   Acceptance criteria:
   
   - No `supports_convert_to_state` API remains
   - All in-tree `GroupsAccumulator` implementations compile with an explicit 
`convert_to_state`
   - Skip-partial aggregation no longer disables itself due to missing 
accumulator support
   - FFI API changes are documented and tested
   - Upgrading guide explains the required change for external 
`GroupsAccumulator` implementors
   
   ### Describe alternatives you've considered
   
   We could keep `supports_convert_to_state` and continue enabling skip-partial 
aggregation only when all accumulators opt in. That preserves compatibility, 
but it keeps the hash aggregate code more complex and allows some accumulators 
to miss the high-cardinality fast path indefinitely.
   
   Another option is to rely on `GroupsAccumulatorAdapter` as a generic 
fallback for all missing implementations. This is correct, but it is row-by-row 
and may not be high performance. The desired end state should be that 
native/vectorized group accumulators implement efficient conversions where 
possible, while any fallback remains explicit.
   
   ### Additional context
   
   This is likely a breaking public API change for custom aggregate authors and 
FFI users, so it should be marked as an API change and documented in the 
upgrading guide.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to