andygrove opened a new issue, #4049:
URL: https://github.com/apache/datafusion-comet/issues/4049

   ## Describe the bug
   
   With the `sql_hive-1` Spark SQL CI job re-enabled for Spark 4.0 (see #2946 / 
#4047), five tests in `org.apache.spark.sql.hive.HiveUDFDynamicLoadSuite` fail:
   
   - `Spark should be able to run Hive UDF using jar regardless of current thread context classloader (UDF)`
   - `Spark should be able to run Hive UDF using jar regardless of current thread context classloader (GENERIC_UDF)`
   - `Spark should be able to run Hive UDF using jar regardless of current thread context classloader (GENERIC_UDAF)`
   - `Spark should be able to run Hive UDF using jar regardless of current thread context classloader (UDAF)`
   - `Spark should be able to run Hive UDF using jar regardless of current thread context classloader (GENERIC_UDTF)`
   
   All five fail with:
   
   ```
   org.apache.spark.sql.AnalysisException: [CANNOT_LOAD_FUNCTION_CLASS] Cannot load class
   org.apache.hadoop.hive.contrib.udf.example.UDFExampleAdd2 when registering the function
   `spark_catalog`.`default`.`udf_add2`, please make sure it is on the classpath. SQLSTATE: 46103
   ```
   
   The same tests pass on Spark 3.4 and 3.5 under the same CI matrix; only the Spark 4.0 `sql_hive-1` job is affected.
   
   The tests dynamically register Hive UDFs that live in `hive-contrib`; on 
Spark 4.0 that jar (or the class inside it) isn't resolvable on the session 
classpath of the Hive test harness.
   
   ## Reproduction
   
   Run the `sql_hive-1` job for Spark 4.0.1 (equivalent to `build/sbt -Pspark-4.0 "hive/testOnly * -- -l org.apache.spark.tags.ExtendedHiveTest -l org.apache.spark.tags.SlowHiveTest"` from a prepared apache-spark checkout, as wired up in `.github/workflows/spark_sql_test.yml`).
   
   Observed on CI run 
https://github.com/apache/datafusion-comet/actions/runs/24837941861/job/72704298074
 (PR #4047).
   
   ## Expected behavior
   
   All five HiveUDFDynamicLoadSuite variants should pass, matching the behavior 
on Spark 3.4/3.5.
   
   ## Additional context
   
   These failures are unrelated to #2946 (a hang/timeout, fixed in #4048), but they remain a blocker for re-enabling the `sql_hive-1` job for Spark 4.0. Likely candidates: `hive-contrib` classpath/packaging differences between the Spark 3.x and 4.0 test harnesses, or a classloader isolation change in Spark 4.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
