This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new efd066e5394a [SPARK-54002][DEPLOY] Support integrating BeeLine with Connect JDBC driver
efd066e5394a is described below
commit efd066e5394a49be2a9dd3ec83123b2aec26763e
Author: Cheng Pan <[email protected]>
AuthorDate: Wed Oct 29 12:57:28 2025 -0700
[SPARK-54002][DEPLOY] Support integrating BeeLine with Connect JDBC driver
### What changes were proposed in this pull request?
This PR modifies the classpath for `bin/beeline` - excluding `spark-sql-core_*.jar`, `spark-connect_*.jar`, etc., and adding `jars/connect-repl/*.jar` - making it the same as `bin/spark-connect-shell`. The modified classpath looks like:
```
jars/*.jar - except for spark-sql-core_*.jar, spark-connect_*.jar, etc.
jars/connect-repl/*.jar - including spark-connect-client-jdbc_*.jar
```
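To inspect the classpath the launcher actually computes, one option (a sketch, relying on the launcher's standard `SPARK_PRINT_LAUNCH_COMMAND` switch, which makes it print the full `java` command, including the `-cp` entries, before running BeeLine) is:
```
# prints the resolved launch command (and thus the classpath) to stderr
$ SPARK_PRINT_LAUNCH_COMMAND=1 SPARK_CONNECT_BEELINE=1 bin/beeline --help
```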
Note: BeeLine itself only requires the Hive jars and a few third-party utility jars to run, so excluding some `spark-*.jar`s does not break BeeLine's existing ability to connect to the Thrift Server.
To ensure no change in classic Spark behavior, for the Spark classic (default) distribution the above changes only take effect when `SPARK_CONNECT_BEELINE=1` is set explicitly. For convenience, this is enabled by default for the Spark connect distribution.
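Concretely, as the `dev/make-distribution.sh` change below shows, the connect distribution injects a defaulted export right after the shebang of `bin/beeline` (and a `set` equivalent into `bin/beeline.cmd` for Windows), so users can still opt out by setting the variable themselves:
```
# line injected into the connect distribution's bin/beeline
export SPARK_CONNECT_BEELINE=${SPARK_CONNECT_BEELINE:-1}
```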
### Why are the changes needed?
This is a new feature: it allows users to use BeeLine as a SQL CLI to connect to the Spark Connect server.
### Does this PR introduce _any_ user-facing change?
No. For the classic (default) Spark distribution, this feature must be enabled explicitly by setting `SPARK_CONNECT_BEELINE=1`.
### How was this patch tested?
Launch a Connect Server first; in my case, the Connect Server (v4.1.0-preview2) runs at `sc://localhost:15002`. To ensure the changes won't break the Thrift Server use case, also launch a Thrift Server at `thrift://localhost:10000`.
#### Testing for dev mode
Building
```
$ build/sbt -Phive,hive-thriftserver clean package
```
Without setting `SPARK_CONNECT_BEELINE=1`, it fails as expected with `No known driver to handle "jdbc:sc://localhost:15002"`:
```
$ SPARK_PREPEND_CLASSES=true bin/beeline -u jdbc:sc://localhost:15002
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
scan complete in 0ms
scan complete in 4ms
No known driver to handle "jdbc:sc://localhost:15002"
Beeline version 2.3.10 by Apache Hive
beeline>
```
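This failure is expected: without the flag, BeeLine's classpath does not include `jars/connect-repl`, which is where the Connect JDBC driver jar lives. A quick way to confirm the driver jar's location in a dev build (a sketch; the exact path and Scala version suffix are assumptions about the local build layout):
```
# the driver ships under connect-repl/, not the main jars/ directory
$ ls assembly/target/scala-2.13/jars/connect-repl/spark-connect-client-jdbc_*.jar
```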
With `SPARK_CONNECT_BEELINE=1` set, it works as expected:
```
$ SPARK_PREPEND_CLASSES=true SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:sc://localhost:15002
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:sc://localhost:15002
Connected to: Apache Spark Connect Server (version 4.1.0-preview2)
Driver: Apache Spark Connect JDBC Driver (version 4.1.0-SNAPSHOT)
Error: Requested transaction isolation level REPEATABLE_READ is not supported (state=,code=0)
Beeline version 2.3.10 by Apache Hive
0: jdbc:sc://localhost:15002> select 'Hello, Spark Connect!', version() as server_version;
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | server_version |
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | 4.1.0 c5ff48cc2b2c5a1526789ae414ff4c63b053d3ec |
+------------------------+-------------------------------------------------+
1 row selected (0.476 seconds)
0: jdbc:sc://localhost:15002>
```
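The `REPEATABLE_READ` error above is non-fatal: BeeLine requests `TRANSACTION_REPEATABLE_READ` by default, the Connect JDBC driver rejects it, and the session continues anyway. If the message is bothersome, BeeLine's standard `--isolation` option can request a different level (a sketch; whether the Connect JDBC driver accepts `TRANSACTION_NONE` is an assumption, not verified in the runs here):
```
$ SPARK_PREPEND_CLASSES=true SPARK_CONNECT_BEELINE=1 bin/beeline \
    --isolation=TRANSACTION_NONE -u jdbc:sc://localhost:15002
```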
Also, test with the Thrift Server to ensure no impact on existing functionality. It works as expected both with and without `SPARK_CONNECT_BEELINE=1`:
```
$ SPARK_PREPEND_CLASSES=true [SPARK_CONNECT_BEELINE=1] bin/beeline -u jdbc:hive2://localhost:10000
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 4.1.0-preview2)
Driver: Hive JDBC (version 2.3.10)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.10 by Apache Hive
0: jdbc:hive2://localhost:10000> select 'Hello, Spark Connect!', version() as server_version;
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | server_version |
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | 4.1.0 c5ff48cc2b2c5a1526789ae414ff4c63b053d3ec |
+------------------------+-------------------------------------------------+
1 row selected (0.973 seconds)
0: jdbc:hive2://localhost:10000>
```
#### Testing for Spark distribution
```
$ dev/make-distribution.sh --tgz --connect --name SPARK-54002 -Pyarn -Pkubernetes -Phadoop-3 -Phive -Phive-thriftserver
```
##### Spark classic distribution
```
$ tar -xzf spark-4.1.0-SNAPSHOT-bin-SPARK-54002.tgz
$ cd spark-4.1.0-SNAPSHOT-bin-SPARK-54002
$ bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (negative result, fails with 'No known driver to handle "jdbc:sc://localhost:15002"')
$ SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
$ SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:hive2://localhost:10000 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
```
##### Spark connect distribution
```
$ tar -xzf spark-4.1.0-SNAPSHOT-bin-SPARK-54002-connect.tgz
$ cd spark-4.1.0-SNAPSHOT-bin-SPARK-54002-connect
$ bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
```
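Since the injected line only defaults the variable, it should also be possible to opt back into the classic BeeLine classpath in the connect distribution (a sketch, not exercised in the runs above; the launcher treats only the literal value `1` as enabled, so any other value disables it):
```
# opt out of the Connect JDBC classpath in the connect distribution
$ SPARK_CONNECT_BEELINE=0 bin/beeline -u jdbc:hive2://localhost:10000
```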
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52706 from pan3793/SPARK-54002.
Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
dev/make-distribution.sh | 2 ++
.../apache/spark/launcher/AbstractCommandBuilder.java | 18 ++++++++++++------
.../spark/launcher/SparkClassCommandBuilder.java | 3 +++
3 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/dev/make-distribution.sh b/dev/make-distribution.sh
index b54ea496c633..16598bda8733 100755
--- a/dev/make-distribution.sh
+++ b/dev/make-distribution.sh
@@ -322,9 +322,11 @@ if [ "$MAKE_TGZ" == "true" ]; then
   rm -rf "$TARDIR"
   cp -r "$DISTDIR" "$TARDIR"
   # Set the Spark Connect system variable in these scripts to enable it by default.
+  awk 'NR==1{print; print "export SPARK_CONNECT_BEELINE=${SPARK_CONNECT_BEELINE:-1}"; next} {print}' "$TARDIR/bin/beeline" > tmp && cat tmp > "$TARDIR/bin/beeline"
   awk 'NR==1{print; print "export SPARK_CONNECT_MODE=${SPARK_CONNECT_MODE:-1}"; next} {print}' "$TARDIR/bin/pyspark" > tmp && cat tmp > "$TARDIR/bin/pyspark"
   awk 'NR==1{print; print "export SPARK_CONNECT_MODE=${SPARK_CONNECT_MODE:-1}"; next} {print}' "$TARDIR/bin/spark-shell" > tmp && cat tmp > "$TARDIR/bin/spark-shell"
   awk 'NR==1{print; print "export SPARK_CONNECT_MODE=${SPARK_CONNECT_MODE:-1}"; next} {print}' "$TARDIR/bin/spark-submit" > tmp && cat tmp > "$TARDIR/bin/spark-submit"
+  awk 'NR==1{print; print "if [%SPARK_CONNECT_BEELINE%] == [] set SPARK_CONNECT_BEELINE=1"; next} {print}' "$TARDIR/bin/beeline.cmd" > tmp && cat tmp > "$TARDIR/bin/beeline.cmd"
   awk 'NR==1{print; print "if [%SPARK_CONNECT_MODE%] == [] set SPARK_CONNECT_MODE=1"; next} {print}' "$TARDIR/bin/pyspark2.cmd" > tmp && cat tmp > "$TARDIR/bin/pyspark2.cmd"
   awk 'NR==1{print; print "if [%SPARK_CONNECT_MODE%] == [] set SPARK_CONNECT_MODE=1"; next} {print}' "$TARDIR/bin/spark-shell2.cmd" > tmp && cat tmp > "$TARDIR/bin/spark-shell2.cmd"
   awk 'NR==1{print; print "if [%SPARK_CONNECT_MODE%] == [] set SPARK_CONNECT_MODE=1"; next} {print}' "$TARDIR/bin/spark-submit2.cmd" > tmp && cat tmp > "$TARDIR/bin/spark-submit2.cmd"
diff --git a/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java b/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
index caa2d2b5854e..0214b1102381 100644
--- a/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
+++ b/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
@@ -66,6 +66,8 @@ abstract class AbstractCommandBuilder {
    */
   protected boolean isRemote = System.getenv().containsKey("SPARK_REMOTE");
 
+  protected boolean isBeeLine = false;
+
   AbstractCommandBuilder() {
     this.appArgs = new ArrayList<>();
     this.childEnv = new HashMap<>();
@@ -195,6 +197,10 @@ abstract class AbstractCommandBuilder {
         if (isRemote && "1".equals(getenv("SPARK_SCALA_SHELL")) && project.equals("sql/core")) {
           continue;
         }
+        if (isBeeLine && "1".equals(getenv("SPARK_CONNECT_BEELINE")) &&
+            project.equals("sql/core")) {
+          continue;
+        }
         // SPARK-49534: The assumption here is that if `spark-hive_xxx.jar` is not in the
         // classpath, then the `-Phive` profile was not used during package, and therefore
         // the Hive-related jars should also not be in the classpath. To avoid failure in
@@ -241,13 +247,13 @@ abstract class AbstractCommandBuilder {
       }
     }
 
-    if (isRemote) {
+    if (isRemote || (isBeeLine && "1".equals(getenv("SPARK_CONNECT_BEELINE")))) {
       for (File f: new File(jarsDir).listFiles()) {
-        // Exclude Spark Classic SQL and Spark Connect server jars
-        // if we're in Spark Connect Shell. Also exclude Spark SQL API and
-        // Spark Connect Common which Spark Connect client shades.
-        // Then, we add the Spark Connect shell and its dependencies in connect-repl
-        // See also SPARK-48936.
+        // Exclude Spark Classic SQL and Spark Connect server jars if we're in
+        // Spark Connect Shell or BeeLine with Connect JDBC driver. Also exclude
+        // Spark SQL API and Spark Connect Common which Spark Connect client shades.
+        // Then, we add the Spark Connect shell and its dependencies in connect-repl.
+        // See also SPARK-48936, SPARK-54002.
         if (f.isDirectory() && f.getName().equals("connect-repl")) {
           addToClassPath(cp, join(File.separator, f.toString(), "*"));
         } else if (
diff --git a/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java b/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java
index d7d10d486e3f..2dd0bb13dfd3 100644
--- a/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java
+++ b/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java
@@ -38,6 +38,9 @@ class SparkClassCommandBuilder extends AbstractCommandBuilder {
   SparkClassCommandBuilder(String className, List<String> classArgs) {
     this.className = className;
     this.classArgs = classArgs;
+    if ("org.apache.hive.beeline.BeeLine".equals(className)) {
+      this.isBeeLine = true;
+    }
   }
 
   @Override
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]