This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new efd066e5394a [SPARK-54002][DEPLOY] Support integrating BeeLine with Connect JDBC driver
efd066e5394a is described below
commit efd066e5394a49be2a9dd3ec83123b2aec26763e
Author: Cheng Pan <[email protected]>
AuthorDate: Wed Oct 29 12:57:28 2025 -0700
[SPARK-54002][DEPLOY] Support integrating BeeLine with Connect JDBC driver
### What changes were proposed in this pull request?
This PR modifies the classpath for `bin/beeline` - excluding `spark-sql-core_*.jar`, `spark-connect_*.jar`, etc., and adding `jars/connect-repl/*.jar` - making it the same as `bin/spark-connect-shell`. The modified classpath looks like:
```
jars/*.jar - except for spark-sql-core_*.jar, spark-connect_*.jar, etc.
jars/connect-repl/*.jar - including spark-connect-client-jdbc_*.jar
```
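To inspect the classpath the launcher actually computes, one option (a sketch, relying on the launcher's standard `SPARK_PRINT_LAUNCH_COMMAND` switch, which makes it print the full `java` command, including the `-cp` entries, before running BeeLine) is:
```
# prints the resolved launch command (and thus the classpath) to stderr
$ SPARK_PRINT_LAUNCH_COMMAND=1 SPARK_CONNECT_BEELINE=1 bin/beeline --help
```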
Note: BeeLine itself only requires the Hive jars and a few third-party utility jars to run, so excluding some `spark-*.jar`s does not break BeeLine's existing ability to connect to the Thrift Server.
To ensure no change in classic Spark behavior, for the Spark classic (default) distribution the above changes only take effect when `SPARK_CONNECT_BEELINE=1` is set explicitly. For convenience, this is enabled by default for the Spark connect distribution.
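Concretely, as the `dev/make-distribution.sh` change below shows, the connect distribution injects a defaulted export right after the shebang of `bin/beeline` (and a `set` equivalent into `bin/beeline.cmd` for Windows), so users can still opt out by setting the variable themselves:
```
# line injected into the connect distribution's bin/beeline
export SPARK_CONNECT_BEELINE=${SPARK_CONNECT_BEELINE:-1}
```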
### Why are the changes needed?
This is a new feature: it allows users to use BeeLine as a SQL CLI to connect to the Spark Connect server.
### Does this PR introduce _any_ user-facing change?
No. For the classic (default) Spark distribution, this feature must be enabled explicitly by setting `SPARK_CONNECT_BEELINE=1`.
### How was this patch tested?
Launch a Connect Server first; in my case, the Connect Server (v4.1.0-preview2) runs at `sc://localhost:15002`. To ensure the changes won't break the Thrift Server use case, also launch a Thrift Server at `thrift://localhost:10000`.
#### Testing for dev mode
Building
```
$ build/sbt -Phive,hive-thriftserver clean package
```
Without setting `SPARK_CONNECT_BEELINE=1`, it fails as expected with `No known driver to handle "jdbc:sc://localhost:15002"`:
```
$ SPARK_PREPEND_CLASSES=true bin/beeline -u jdbc:sc://localhost:15002
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
scan complete in 0ms
scan complete in 4ms
No known driver to handle "jdbc:sc://localhost:15002"
Beeline version 2.3.10 by Apache Hive
beeline>
```
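This failure is expected: without the flag, BeeLine's classpath does not include `jars/connect-repl`, which is where the Connect JDBC driver jar lives. A quick way to confirm the driver jar's location in a dev build (a sketch; the exact path and Scala version suffix are assumptions about the local build layout):
```
# the driver ships under connect-repl/, not the main jars/ directory
$ ls assembly/target/scala-2.13/jars/connect-repl/spark-connect-client-jdbc_*.jar
```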
With `SPARK_CONNECT_BEELINE=1` set, it works as expected:
```
$ SPARK_PREPEND_CLASSES=true SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:sc://localhost:15002
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:sc://localhost:15002
Connected to: Apache Spark Connect Server (version 4.1.0-preview2)
Driver: Apache Spark Connect JDBC Driver (version 4.1.0-SNAPSHOT)
Error: Requested transaction isolation level REPEATABLE_READ is not supported (state=,code=0)
Beeline version 2.3.10 by Apache Hive
0: jdbc:sc://localhost:15002> select 'Hello, Spark Connect!', version() as server_version;
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | server_version |
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | 4.1.0 c5ff48cc2b2c5a1526789ae414ff4c63b053d3ec |
+------------------------+-------------------------------------------------+
1 row selected (0.476 seconds)
0: jdbc:sc://localhost:15002>
```
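The `REPEATABLE_READ` error above is non-fatal: BeeLine requests `TRANSACTION_REPEATABLE_READ` by default, the Connect JDBC driver rejects it, and the session continues anyway. If the message is bothersome, BeeLine's standard `--isolation` option can request a different level (a sketch; whether the Connect JDBC driver accepts `TRANSACTION_NONE` is an assumption, not verified in the runs here):
```
$ SPARK_PREPEND_CLASSES=true SPARK_CONNECT_BEELINE=1 bin/beeline \
    --isolation=TRANSACTION_NONE -u jdbc:sc://localhost:15002
```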
Also, test with the Thrift Server to ensure no impact on existing functionality. It works as expected both with and without `SPARK_CONNECT_BEELINE=1`:
```
$ SPARK_PREPEND_CLASSES=true [SPARK_CONNECT_BEELINE=1] bin/beeline -u jdbc:hive2://localhost:10000
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 4.1.0-preview2)
Driver: Hive JDBC (version 2.3.10)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.10 by Apache Hive
0: jdbc:hive2://localhost:10000> select 'Hello, Spark Connect!', version() as server_version;
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | server_version |
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | 4.1.0 c5ff48cc2b2c5a1526789ae414ff4c63b053d3ec |
+------------------------+-------------------------------------------------+
1 row selected (0.973 seconds)
0: jdbc:hive2://localhost:10000>
```
#### Testing for Spark distribution
```
$ dev/make-distribution.sh --tgz --connect --name SPARK-54002 -Pyarn -Pkubernetes -Phadoop-3 -Phive -Phive-thriftserver
```
##### Spark classic distribution
```
$ tar -xzf spark-4.1.0-SNAPSHOT-bin-SPARK-54002.tgz
$ cd spark-4.1.0-SNAPSHOT-bin-SPARK-54002
$ bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (negative result, fails with 'No known driver to handle "jdbc:sc://localhost:15002"')
$ SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
$ SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:hive2://localhost:10000 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
```
##### Spark connect distribution
```
$ tar -xzf spark-4.1.0-SNAPSHOT-bin-SPARK-54002-connect.tgz
$ cd spark-4.1.0-SNAPSHOT-bin-SPARK-54002-connect
$ bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
```
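Since the injected line only defaults the variable, it should also be possible to opt back into the classic BeeLine classpath in the connect distribution (a sketch, not exercised in the runs above; the launcher treats only the literal value `1` as enabled, so any other value disables it):
```
# opt out of the Connect JDBC classpath in the connect distribution
$ SPARK_CONNECT_BEELINE=0 bin/beeline -u jdbc:hive2://localhost:10000
```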
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52706 from pan3793/SPARK-54002.
Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
dev/make-distribution.sh | 2 ++
.../apache/spark/launcher/AbstractCommandBuilder.java | 18 ++++++++++++------
.../spark/launcher/SparkClassCommandBuilder.java | 3 +++
3 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/dev/make-distribution.sh b/dev/make-distribution.sh
index b54ea496c633..16598bda8733 100755
--- a/dev/make-distribution.sh
+++ b/dev/make-distribution.sh
@@ -322,9 +322,11 @@ if [ "$MAKE_TGZ" == "true" ]; then
   rm -rf "$TARDIR"
   cp -r "$DISTDIR" "$TARDIR"
   # Set the Spark Connect system variable in these scripts to enable it by default.
+  awk 'NR==1{print; print "export SPARK_CONNECT_BEELINE=${SPARK_CONNECT_BEELINE:-1}"; next} {print}' "$TARDIR/bin/beeline" > tmp && cat tmp > "$TARDIR/bin/beeline"
   awk 'NR==1{print; print "export SPARK_CONNECT_MODE=${SPARK_CONNECT_MODE:-1}"; next} {print}' "$TARDIR/bin/pyspark" > tmp && cat tmp > "$TARDIR/bin/pyspark"
   awk 'NR==1{print; print "export SPARK_CONNECT_MODE=${SPARK_CONNECT_MODE:-1}"; next} {print}' "$TARDIR/bin/spark-shell" > tmp && cat tmp > "$TARDIR/bin/spark-shell"
   awk 'NR==1{print; print "export SPARK_CONNECT_MODE=${SPARK_CONNECT_MODE:-1}"; next} {print}' "$TARDIR/bin/spark-submit" > tmp && cat tmp > "$TARDIR/bin/spark-submit"
+  awk 'NR==1{print; print "if [%SPARK_CONNECT_BEELINE%] == [] set SPARK_CONNECT_BEELINE=1"; next} {print}' "$TARDIR/bin/beeline.cmd" > tmp && cat tmp > "$TARDIR/bin/beeline.cmd"
   awk 'NR==1{print; print "if [%SPARK_CONNECT_MODE%] == [] set SPARK_CONNECT_MODE=1"; next} {print}' "$TARDIR/bin/pyspark2.cmd" > tmp && cat tmp > "$TARDIR/bin/pyspark2.cmd"
   awk 'NR==1{print; print "if [%SPARK_CONNECT_MODE%] == [] set SPARK_CONNECT_MODE=1"; next} {print}' "$TARDIR/bin/spark-shell2.cmd" > tmp && cat tmp > "$TARDIR/bin/spark-shell2.cmd"
   awk 'NR==1{print; print "if [%SPARK_CONNECT_MODE%] == [] set SPARK_CONNECT_MODE=1"; next} {print}' "$TARDIR/bin/spark-submit2.cmd" > tmp && cat tmp > "$TARDIR/bin/spark-submit2.cmd"
diff --git a/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java b/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
index caa2d2b5854e..0214b1102381 100644
--- a/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
+++ b/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
@@ -66,6 +66,8 @@ abstract class AbstractCommandBuilder {
    */
   protected boolean isRemote = System.getenv().containsKey("SPARK_REMOTE");
 
+  protected boolean isBeeLine = false;
+
   AbstractCommandBuilder() {
     this.appArgs = new ArrayList<>();
     this.childEnv = new HashMap<>();
@@ -195,6 +197,10 @@ abstract class AbstractCommandBuilder {
         if (isRemote && "1".equals(getenv("SPARK_SCALA_SHELL")) && project.equals("sql/core")) {
           continue;
         }
+        if (isBeeLine && "1".equals(getenv("SPARK_CONNECT_BEELINE")) &&
+            project.equals("sql/core")) {
+          continue;
+        }
         // SPARK-49534: The assumption here is that if `spark-hive_xxx.jar` is not in the
         // classpath, then the `-Phive` profile was not used during package, and therefore
         // the Hive-related jars should also not be in the classpath. To avoid failure in
@@ -241,13 +247,13 @@ abstract class AbstractCommandBuilder {
       }
     }
 
-    if (isRemote) {
+    if (isRemote || (isBeeLine && "1".equals(getenv("SPARK_CONNECT_BEELINE")))) {
       for (File f: new File(jarsDir).listFiles()) {
-        // Exclude Spark Classic SQL and Spark Connect server jars
-        // if we're in Spark Connect Shell. Also exclude Spark SQL API and
-        // Spark Connect Common which Spark Connect client shades.
-        // Then, we add the Spark Connect shell and its dependencies in connect-repl
-        // See also SPARK-48936.
+        // Exclude Spark Classic SQL and Spark Connect server jars if we're in
+        // Spark Connect Shell or BeeLine with Connect JDBC driver. Also exclude
+        // Spark SQL API and Spark Connect Common which Spark Connect client shades.
+        // Then, we add the Spark Connect shell and its dependencies in connect-repl.
+        // See also SPARK-48936, SPARK-54002.
         if (f.isDirectory() && f.getName().equals("connect-repl")) {
           addToClassPath(cp, join(File.separator, f.toString(), "*"));
         } else if (
diff --git a/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java b/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java
index d7d10d486e3f..2dd0bb13dfd3 100644
--- a/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java
+++ b/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java
@@ -38,6 +38,9 @@ class SparkClassCommandBuilder extends AbstractCommandBuilder {
   SparkClassCommandBuilder(String className, List<String> classArgs) {
     this.className = className;
     this.classArgs = classArgs;
+    if ("org.apache.hive.beeline.BeeLine".equals(className)) {
+      this.isBeeLine = true;
+    }
   }
 
   @Override
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]