kosiew commented on code in PR #21151:
URL: https://github.com/apache/datafusion/pull/21151#discussion_r3025716158


##########
datafusion/functions-aggregate-common/src/min_max.rs:
##########
@@ -413,6 +413,31 @@ macro_rules! min_max {
                 min_max_generic!(lhs, rhs, $OP)
             }
 
+            // Dictionary scalars: compare the inner values and re-wrap.

Review Comment:
   The dictionary unwrap and rewrap logic is now spread across the `min_max!` 
macro and both `min_batch` and `max_batch`.
   
   It might be worth pulling this into a small helper, something like "compare 
dictionary scalars by inner value while preserving the key type". That would 
keep the behavior in one place and make future tweaks to null handling or type 
checks a bit easier.



##########
datafusion/functions-aggregate/src/min_max.rs:
##########
@@ -1270,4 +1270,104 @@ mod tests {
         assert_eq!(max_result, ScalarValue::Utf8(Some("🦀".to_string())));
         Ok(())
     }
+
+    fn dict_scalar(key_type: DataType, inner: ScalarValue) -> ScalarValue {
+        ScalarValue::Dictionary(Box::new(key_type), Box::new(inner))
+    }
+
+    #[test]
+    fn test_min_max_dictionary_without_coercion() -> Result<()> {
+        let values = StringArray::from(vec!["b", "c", "a", "d"]);
+        let keys = Int32Array::from(vec![Some(0), Some(1), Some(2), Some(3)]);
+        let dict_array =
+            DictionaryArray::try_new(keys, Arc::new(values) as 
ArrayRef).unwrap();
+        let dict_array_ref = Arc::new(dict_array) as ArrayRef;
+
+        // Pass the raw Dictionary type — no get_min_max_result_type() unwrap.

Review Comment:
   Nice addition of coverage here. One thing I noticed though is that the new 
tests only exercise `MinAccumulator::try_new(&dict_type)` and 
`MaxAccumulator::try_new(&dict_type)` directly.
   
   In the normal MIN/MAX UDAF path, we still go through 
`get_min_max_result_type`, which unwraps `Dictionary<K, V>` to `V` before 
building the return field. Because of that, the new dictionary scalar branches 
in functions-aggregate-common are still bypassed during planned SQL execution.
   
   So at the moment this feels a bit incomplete relative to the PR goal. Would 
it make sense to either align the UDAF and coercion path with the new 
dictionary state behavior, or add an end to end test that shows which public 
execution path now relies on this logic?



##########
datafusion/functions-aggregate/src/min_max.rs:
##########
@@ -1270,4 +1270,104 @@ mod tests {
         assert_eq!(max_result, ScalarValue::Utf8(Some("🦀".to_string())));
         Ok(())
     }
+
+    fn dict_scalar(key_type: DataType, inner: ScalarValue) -> ScalarValue {

Review Comment:
   The three new tests look good, but they repeat the same dictionary array 
setup pattern.
   
   A small helper that builds a `DictionaryArray` and `ArrayRef` pair could 
make these tests easier to read and keep future dictionary min/max cases 
cheaper to add.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to