Jefffrey commented on code in PR #22244:
URL: https://github.com/apache/datafusion/pull/22244#discussion_r3308734901


##########
datafusion/sqllogictest/test_files/information_schema.slt:
##########
@@ -862,15 +862,6 @@ datafusion public string_agg 1 IN expression String NULL false 1
 datafusion public string_agg 2 IN delimiter String NULL false 1
 datafusion public string_agg 1 OUT NULL String NULL false 1
 
-# test variable length arguments
-query TTTBI rowsort
-select specific_name, data_type, parameter_mode, is_variadic, rid from information_schema.parameters where specific_name = 'concat';

Review Comment:
   nit: can we keep this test case, but switch it to a different function 
that is still variadic?



##########
datafusion/functions/src/string/concat.rs:
##########
@@ -67,27 +68,19 @@ impl Default for ConcatFunc {
 
 impl ConcatFunc {
     pub fn new() -> Self {
-        use DataType::*;
         Self {
-            signature: Signature::variadic(
-                vec![Utf8View, Utf8, LargeUtf8, Binary],
-                Volatility::Immutable,
-            ),
+            // Use `Signature::UserDefined` to allow different argument types.
+            // `Variadic` requires every argument to be coerced to the same string type,
+            // so the UDF cannot distinguish between binary and string inputs.
+            signature: Signature::user_defined(Volatility::Immutable),
         }
     }
 }
 
-fn deduce_return_type(arg_types: &[DataType]) -> DataType {
-    use DataType::*;
-    if arg_types.contains(&Utf8View) {
-        Utf8View
-    } else if arg_types.contains(&LargeUtf8) {
-        LargeUtf8
-    } else {
-        Utf8
-    }
-}
-
+// Logic is matched with pipe operator in the following table.
+// Support only string + string concatenation,
+// or binary + binary concatenation.
+// Mixed string + binary concatenation is rejected,

Review Comment:
   Actually, this would still be a breaking change: before #20787, I think 
`concat` would still coerce binary to string. So this restriction would be new, 
even without #20787.
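
   To make the restriction under discussion concrete, here is a minimal, self-contained sketch of the rule described in the new code comment (string + string or binary + binary is accepted, mixed input is rejected). It does not use the DataFusion API: `ArgType` and `concat_result_type` are hypothetical names invented for illustration, the string-only branch mirrors the removed `deduce_return_type`, and the `Binary` result for all-binary input is an assumption rather than something this diff confirms.

```rust
/// Hypothetical stand-in for the Arrow data types that `concat` accepts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArgType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Binary,
}

/// Sketch of the proposed rule: reject mixed string/binary input,
/// otherwise pick a result type from the argument types.
pub fn concat_result_type(args: &[ArgType]) -> Result<ArgType, String> {
    let has_binary = args.contains(&ArgType::Binary);
    let has_string = args.iter().any(|t| *t != ArgType::Binary);

    match (has_string, has_binary) {
        // Mixed input: this is the newly introduced restriction.
        (true, true) => {
            Err("concat does not support mixed string and binary inputs".to_string())
        }
        // All-binary input: assumed to stay binary.
        (false, true) => Ok(ArgType::Binary),
        // String-only input (or no arguments): same precedence as the
        // removed `deduce_return_type` (Utf8View, then LargeUtf8, then Utf8).
        _ if args.contains(&ArgType::Utf8View) => Ok(ArgType::Utf8View),
        _ if args.contains(&ArgType::LargeUtf8) => Ok(ArgType::LargeUtf8),
        _ => Ok(ArgType::Utf8),
    }
}
```

   Under this sketch, `concat_result_type(&[ArgType::Utf8, ArgType::Binary])` returns an error, whereas the old `Variadic` signature would have coerced the binary argument to a string type.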



##########
datafusion/sqllogictest/test_files/spark/string/concat.slt:
##########
@@ -71,67 +72,22 @@ SELECT concat('a', arrow_cast('b', 'LargeUtf8'), arrow_cast('c', 'Utf8View')), a
 ----
 abc Utf8View
 
-# Test mixed types: Utf8 + Binary
-query TT
-SELECT concat(arrow_cast('hello', 'Utf8'), arrow_cast(' world', 'Binary')), arrow_typeof(concat(arrow_cast('hello', 'Utf8'), arrow_cast(' world', 'Binary')));
-----
-hello world Utf8
+# Coercion rules from Binary to Utf8 do no apply compared to generic `concat`,
+# so `concat` produces an explicit error
+query error Error during planning: concat does not support mixed string and binary inputs
+SELECT concat(arrow_cast('hello', 'Utf8'), arrow_cast(' world', 'Binary'));

Review Comment:
   I'm still a little confused here, since apparently this should not error?
   
   I think it might be better not to make any changes to Spark concat in this PR, 
to try to cut down on scope a little.


