kosiew opened a new pull request, #21315:
URL: https://github.com/apache/datafusion/pull/21315

   ## Which issue does this PR close?
   
   * Closes #21150.
   
   ---
   
   ## Rationale for this change
   
   The current implementation of MIN/MAX aggregates does not properly support 
dictionary-encoded arrays. In many cases, dictionary values are implicitly 
unpacked or coerced into their underlying value types, which leads to loss of 
type fidelity and bypasses optimized execution paths.
   
   This change ensures that dictionary-encoded data is handled natively 
throughout both planning and execution. By preserving dictionary types and 
operating directly on dictionary scalars, we avoid unnecessary flattening and 
enable more efficient aggregation behavior.
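
A minimal sketch of what "operating directly on dictionary scalars" could look like. The `Scalar` enum, `unwrap_dictionary`, `compare`, and `min_scalar` are hypothetical names for illustration, not DataFusion's `ScalarValue` API, and re-wrapping the winner whenever either side was a dictionary is an assumption, not something this description confirms.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a scalar value (not DataFusion's `ScalarValue`).
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int(i64),
    Utf8(String),
    /// A dictionary scalar: a key-type label plus the wrapped inner value.
    Dictionary(&'static str, Box<Scalar>),
}

/// Strip any dictionary wrappers to reach the underlying value.
fn unwrap_dictionary(s: &Scalar) -> &Scalar {
    match s {
        Scalar::Dictionary(_, inner) => unwrap_dictionary(inner),
        other => other,
    }
}

/// Compare two scalars by their underlying values; `None` if the types differ.
fn compare(a: &Scalar, b: &Scalar) -> Option<Ordering> {
    match (unwrap_dictionary(a), unwrap_dictionary(b)) {
        (Scalar::Int(x), Scalar::Int(y)) => Some(x.cmp(y)),
        (Scalar::Utf8(x), Scalar::Utf8(y)) => Some(x.cmp(y)),
        _ => None,
    }
}

/// Return the smaller scalar. Assumption: the result is re-wrapped as a
/// dictionary scalar if either input carried dictionary encoding.
fn min_scalar(a: &Scalar, b: &Scalar) -> Option<Scalar> {
    let winner = match compare(a, b)? {
        Ordering::Greater => unwrap_dictionary(b).clone(),
        _ => unwrap_dictionary(a).clone(),
    };
    let key_type = match (a, b) {
        (Scalar::Dictionary(k, _), _) | (_, Scalar::Dictionary(k, _)) => Some(*k),
        _ => None,
    };
    Some(match key_type {
        Some(k) => Scalar::Dictionary(k, Box::new(winner)),
        None => winner,
    })
}
```

In this sketch a mixed comparison (dictionary against plain) succeeds because both sides are unwrapped first, and the dictionary wrapper is restored on the result so the type is not lost.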
   
   ---
   
   ## What changes are included in this PR?
   
   * Added dictionary-aware handling in MIN/MAX scalar comparison logic
   
     * Supports comparisons between dictionary scalars and mixed scalar types
     * Introduces helper utilities to wrap/unwrap dictionary scalar values
   
   * Implemented `dictionary_batch_extreme` for computing min/max directly on 
dictionary arrays
   
     * Avoids extracting underlying value arrays
     * Properly skips nulls and unreferenced dictionary values
   
   * Updated `min_batch` and `max_batch` to use dictionary-aware logic
   
   * Simplified generic min/max batch logic
   
     * Improved null handling
     * Avoids redundant comparisons
   
   * Modified type inference (`get_min_max_result_type`)
   
     * Preserves dictionary types instead of coercing to value types
   
   * Added extensive test coverage:
   
     * Dictionary min/max without coercion
     * Handling nulls and optional values
     * Ignoring unreferenced dictionary entries
     * Multi-batch aggregation scenarios
     * Physical plan validation to ensure correct execution path
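
The batch-level behavior listed above (skipping nulls, ignoring unreferenced dictionary entries, and combining results across batches) can be sketched roughly as follows. This is a simplified model using only the standard library: `DictArray`, `dictionary_extreme`, and `min_over_batches` are hypothetical names, and the PR's actual `dictionary_batch_extreme` operates on Arrow arrays and may differ in structure.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified dictionary array: `keys[i]` indexes into `values`.
/// A `None` key is a null row; a `None` value is a null dictionary entry.
struct DictArray<'a> {
    keys: Vec<Option<usize>>,
    values: Vec<Option<&'a str>>,
}

/// Find the extreme among dictionary entries that a non-null key references.
/// `wanted` is `Ordering::Less` for MIN and `Ordering::Greater` for MAX.
fn dictionary_extreme<'a>(array: &DictArray<'a>, wanted: Ordering) -> Option<&'a str> {
    let mut best: Option<&'a str> = None;
    // Iterating over keys (not values) means unreferenced entries are never seen.
    for key in array.keys.iter().flatten() {
        // Skip out-of-range keys and null dictionary entries.
        if let Some(candidate) = array.values.get(*key).copied().flatten() {
            best = match best {
                Some(current) if candidate.cmp(current) != wanted => Some(current),
                _ => Some(candidate),
            };
        }
    }
    best
}

/// Combine per-batch minima; batches with no valid rows contribute nothing.
fn min_over_batches<'a>(batches: &[DictArray<'a>]) -> Option<&'a str> {
    batches
        .iter()
        .filter_map(|batch| dictionary_extreme(batch, Ordering::Less))
        .min()
}
```

Walking the keys rather than the values is the design choice that keeps an unreferenced entry (for example, a small value left in the dictionary after filtering) from leaking into the result.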
   
   ---
   
   ## Are these changes tested?
   
   Yes.
   
   This PR includes comprehensive unit and integration tests covering:
   
   * Correct min/max results for dictionary-encoded arrays
   * Behavior with null and partially null inputs
   * Multi-batch aggregation correctness
   * Preservation of dictionary types in output schema
   * Validation that the query planner uses the intended aggregate execution 
path
   
   These tests ensure both correctness and regression protection for future 
changes.
   
   ---
   
   ## Are there any user-facing changes?
   
   Yes.
   
   * MIN/MAX now fully support dictionary-encoded columns without coercion
   * Output types preserve dictionary encoding instead of returning underlying 
value types
   
   This avoids unnecessary flattening during aggregation and keeps the output 
type consistent with the input type for users working with dictionary data.
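
To illustrate the change in output type, here is a small sketch contrasting the two behaviors. The reduced `DataType` enum and both function names are hypothetical; this is not the body of `get_min_max_result_type`, only a model of the difference this description states.

```rust
/// Hypothetical, reduced type model (not Arrow's `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    /// Dictionary(key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Behavior described for this PR: the result keeps the input type,
/// including any dictionary encoding.
fn result_type_preserving(input: &DataType) -> DataType {
    input.clone()
}

/// Earlier behavior as described: dictionaries are coerced to their value type.
fn result_type_coercing(input: &DataType) -> DataType {
    match input {
        DataType::Dictionary(_, value) => result_type_coercing(value),
        other => other.clone(),
    }
}
```

Under this model, a `Dictionary(Int32, Utf8)` input yields `Dictionary(Int32, Utf8)` with the preserving rule and `Utf8` with the coercing rule, while non-dictionary inputs are unaffected by either.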
   
   No breaking API changes are introduced.
   
   ---
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

