andygrove opened a new issue, #4049:
URL: https://github.com/apache/datafusion-comet/issues/4049
## Describe the bug
With the `sql_hive-1` Spark SQL CI job re-enabled for Spark 4.0 (see #2946 /
#4047), five tests in `org.apache.spark.sql.hive.HiveUDFDynamicLoadSuite` fail:
- `Spark should be able to run Hive UDF using jar regardless of current
thread context classloader (UDF)`
- `Spark should be able to run Hive UDF using jar regardless of current
thread context classloader (GENERIC_UDF)`
- `Spark should be able to run Hive UDF using jar regardless of current
thread context classloader (GENERIC_UDAF)`
- `Spark should be able to run Hive UDF using jar regardless of current
thread context classloader (UDAF)`
- `Spark should be able to run Hive UDF using jar regardless of current
thread context classloader (GENERIC_UDTF)`
All five fail with:
```
org.apache.spark.sql.AnalysisException: [CANNOT_LOAD_FUNCTION_CLASS] Cannot load class org.apache.hadoop.hive.contrib.udf.example.UDFExampleAdd2 when registering the function `spark_catalog`.`default`.`udf_add2`, please make sure it is on the classpath. SQLSTATE: 46103
```
The same tests pass on Spark 3.4 and 3.5 under the same CI matrix; only the Spark 4.0 `sql_hive-1` job is affected.
The tests dynamically register Hive UDFs that live in `hive-contrib`; on Spark 4.0 that jar (or the class inside it) is not resolvable on the session classpath of the Hive test harness.
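For context, the registration the suite performs is roughly of this shape (a hedged sketch in Spark SQL from Scala; the jar path is hypothetical, since the suite builds and locates its own test jar at runtime):

```scala
// Hypothetical sketch of the suite's dynamic registration: create a Hive UDF
// from a jar via SQL. The failure above occurs when Spark 4.0 cannot resolve
// the class from that jar at registration time.
spark.sql(
  """CREATE FUNCTION `spark_catalog`.`default`.`udf_add2`
    |AS 'org.apache.hadoop.hive.contrib.udf.example.UDFExampleAdd2'
    |USING JAR '/path/to/hive-contrib-test.jar'""".stripMargin)
```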
## Reproduction
Run the `sql_hive-1` job for Spark 4.0.1, which is equivalent to running the following from a prepared apache-spark checkout (as wired up in `.github/workflows/spark_sql_test.yml`):
```
build/sbt -Pspark-4.0 hive/testOnly * -- -l org.apache.spark.tags.ExtendedHiveTest -l org.apache.spark.tags.SlowHiveTest
```
Observed on CI run
https://github.com/apache/datafusion-comet/actions/runs/24837941861/job/72704298074
(PR #4047).
## Expected behavior
All five HiveUDFDynamicLoadSuite variants should pass, matching the behavior
on Spark 3.4/3.5.
## Additional context
These failures are unrelated to #2946 (a hang/timeout, fixed in #4048) and remain a blocker to re-enabling the `sql_hive-1` job for Spark 4.0. Likely candidates: `hive-contrib` classpath/packaging differences between the Spark 3.x and 4.0 test harnesses, or a classloader isolation change in Spark 4.0.
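One quick way to probe the classloader hypothesis (a hedged sketch, not something from the CI run) is to check from a REPL on the failing matrix whether the class resolves via the current thread's context classloader:

```scala
// Hypothetical diagnostic: does the hive-contrib example UDF class resolve
// on the current thread's context classloader? If not, the jar never made it
// onto the classpath visible to function registration.
val cl = Thread.currentThread().getContextClassLoader
try {
  cl.loadClass("org.apache.hadoop.hive.contrib.udf.example.UDFExampleAdd2")
  println("class resolves")
} catch {
  case _: ClassNotFoundException =>
    println("class not found on context classloader")
}
```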
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]