This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 54ff4ea05135 [SPARK-55322][SQL][TESTS][FOLLOWUP] Fix `max_by and 
min_by with k` failure when ANSI mode is disabled
54ff4ea05135 is described below

commit 54ff4ea05135e2366f271b5aaeaac45b4eba6481
Author: yangjie01 <[email protected]>
AuthorDate: Wed Feb 25 09:19:10 2026 -0800

    [SPARK-55322][SQL][TESTS][FOLLOWUP] Fix `max_by and min_by with k` failure 
when ANSI mode is disabled
    
    ### What changes were proposed in this pull request?
    This PR updates a test case in `DataFrameAggregateSuite` covering the 
`max_by` and `min_by` functions. Specifically, it refines the assertion logic 
for an invalid `k` input (a non-numeric string) to account for the different 
behaviors depending on `spark.sql.ansi.enabled`.
    
    - **ANSI Enabled**: expects a `CAST_INVALID_INPUT` or "cannot be cast" 
error, since the string `'two'` cannot be cast to an integer.
    - **ANSI Disabled**: expects a `VALUE_OUT_OF_RANGE` error. In legacy mode, 
the invalid cast silently returns `0` (the default for an integer), which then 
triggers a validation error because `k` must be positive.
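    The legacy fallback described above can be illustrated outside of Spark. 
The sketch below is a minimal plain-Scala analogue (not Spark's actual cast 
implementation; `legacyCastToInt` is a hypothetical stand-in) showing why the 
non-ANSI path ends up with `k = 0` and therefore hits the range check instead 
of a cast error:

    ```scala
    // Hypothetical sketch of non-ANSI (legacy) string-to-int cast semantics:
    // a non-numeric string falls back to a default value instead of raising
    // an error, so the failure surfaces later, at k-validation time.
    object LegacyCastSketch {
      def legacyCastToInt(s: String): Int =
        s.toIntOption.getOrElse(0) // non-numeric input falls back to 0

      def main(args: Array[String]): Unit = {
        val k = legacyCastToInt("two")
        // k is 0, which fails Spark's "k must be between [1, 100000]" check,
        // producing DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE rather than a cast error.
        val valid = k >= 1 && k <= 100000
        println(s"k = $k, valid = $valid")
      }
    }
    ```

    Under ANSI mode, by contrast, the cast itself throws, so the error never 
reaches the range validation; this is why the test must branch on 
`conf.ansiEnabled`.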
    
    ### Why are the changes needed?
    Restores the daily test run in non-ANSI mode, which currently fails:
    
    - https://github.com/apache/spark/actions/runs/22247813526/job/64365502163
    
    ```
    [info] - max_by and min_by with k *** FAILED *** (1 second, 431 
milliseconds)
    [info]   "[DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] Cannot resolve "max_by(x, 
y, two)" due to data type mismatch: The `k` must be between [1, 100000] 
(current value = 0). SQLSTATE: 42K09; line 1 pos 7;
    [info]   'Aggregate [unresolvedalias(max_by(x#628078, y#628079, cast(two as 
int), false, 0, 0))]
    [info]   +- SubqueryAlias tab
    [info]      +- LocalRelation [x#628078, y#628079]
    [info]   " did not contain "CAST_INVALID_INPUT", and 
"[DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] Cannot resolve "max_by(x, y, two)" due 
to data type mismatch: The `k` must be between [1, 100000] (current value = 0). 
SQLSTATE: 42K09; line 1 pos 7;
    [info]   'Aggregate [unresolvedalias(max_by(x#628078, y#628079, cast(two as 
int), false, 0, 0))]
    [info]   +- SubqueryAlias tab
    [info]      +- LocalRelation [x#628078, y#628079]
    [info]   " did not contain "cannot be cast" 
(DataFrameAggregateSuite.scala:1386)
    ...
    [info] *** 4 TESTS FAILED ***
    [error] Failed: Total 4096, Failed 4, Errors 0, Passed 4092, Ignored 13
    [error] Failed tests:
    [error]         org.apache.spark.sql.SingleLevelAggregateHashMapSuite
    [error]         org.apache.spark.sql.DataFrameAggregateSuite
    [error]         org.apache.spark.sql.TwoLevelAggregateHashMapSuite
    [error]         
org.apache.spark.sql.TwoLevelAggregateHashMapWithVectorizedMapSuite
    [error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    Manually verified by running `SPARK_ANSI_SQL_MODE=false build/sbt 
"sql/testOnly org.apache.spark.sql.SingleLevelAggregateHashMapSuite 
org.apache.spark.sql.DataFrameAggregateSuite 
org.apache.spark.sql.TwoLevelAggregateHashMapSuite 
org.apache.spark.sql.TwoLevelAggregateHashMapWithVectorizedMapSuite"`; all 
tests pass.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No
    
    Closes #54484 from LuciferYang/SPARK-55322-FOLLOWUP.
    
    Authored-by: yangjie01 <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
index 64b33ccb89a2..f606c6746f3c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
@@ -1383,8 +1383,12 @@ class DataFrameAggregateSuite extends QueryTest
       val error = intercept[Exception] {
         sql(s"SELECT $fn(x, y, 'two') FROM VALUES (('a', 10)) AS tab(x, 
y)").collect()
       }
-      assert(error.getMessage.contains("CAST_INVALID_INPUT") ||
-        error.getMessage.contains("cannot be cast"))
+      if (conf.ansiEnabled) {
+        assert(error.getMessage.contains("CAST_INVALID_INPUT") ||
+          error.getMessage.contains("cannot be cast"))
+      } else {
+        assert(error.getMessage.contains("VALUE_OUT_OF_RANGE"))
+      }
     }
 
     // Error: k must be positive


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
