(spark) branch branch-4.x updated: [SPARK-57262][SQL][WEBUI] Job description derived from a query should respect `spark.sql.redaction.string.regex`

sarutak Sun, 07 Jun 2026 22:29:37 -0700

This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-4.x by this push:
     new 68af2952b4c3 [SPARK-57262][SQL][WEBUI] Job description derived from a 
query should respect `spark.sql.redaction.string.regex`
68af2952b4c3 is described below

commit 68af2952b4c37eac8e1f07825b6f04cc9b27b188
Author: Kousuke Saruta <[email protected]>
AuthorDate: Mon Jun 8 14:28:55 2026 +0900

    [SPARK-57262][SQL][WEBUI] Job description derived from a query should 
respect `spark.sql.redaction.string.regex`
    
    ### What changes were proposed in this pull request?
    This PR changes `SparkSQLDriver.scala` to redact the query text before passing it to `setJobDescription`.
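    
    As a rough illustration of what redacting a query string means here, the following standalone Scala sketch replaces every match of a redaction regex with a placeholder. It is not the Spark implementation: `RedactSketch`, its `redact` helper, and the `Replacement` value are hypothetical stand-ins for `Utils.redact` and `REDACTION_REPLACEMENT_TEXT`, and the query string is only an example.
    
    ```scala
    import scala.util.matching.Regex

    object RedactSketch {
      // Assumed placeholder text; Spark's real constant is Utils.REDACTION_REPLACEMENT_TEXT.
      val Replacement: String = "*********(redacted)"

      // Hypothetical helper: replace every regex match in `text` with the placeholder.
      // When no pattern is configured, or the text is null or empty, return it unchanged.
      def redact(pattern: Option[Regex], text: String): String =
        pattern match {
          case Some(r) if text != null && text.nonEmpty =>
            // quoteReplacement keeps '$' and '\' in the placeholder from being
            // interpreted as group references.
            r.replaceAllIn(text, Regex.quoteReplacement(Replacement))
          case _ => text
        }

      def main(args: Array[String]): Unit = {
        val pattern = Some("secret.*=.*".r)
        // The redacted string is what would be passed on as the job description.
        println(redact(pattern, "SELECT * FROM test1 WHERE secret=1"))
      }
    }
    ```
    
    With the pattern `secret.*=.*`, everything from `secret` to the end of the line is replaced, so the literal value does not appear in the resulting string.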
    
    ### Why are the changes needed?
    In the current implementation, when a query is executed through `SparkSQLDriver`, redaction is done in `SQLExecution.scala`, so the description in the table at the top of the `/SQL/execution` page is redacted.
    <img width="1083" height="349" alt="sql-execution-page-top-table" src="https://github.com/user-attachments/assets/b06fb255-2b46-473d-9046-1b2d578e3bda" />
    
    But the description in the table on the `/jobs` page and the one in the table at the bottom of the `/SQL/execution` page are not redacted.
    <img width="525" height="692" alt="jobs-page-before" src="https://github.com/user-attachments/assets/31c88b98-779b-4305-bf71-58f19a1d7117" />
    <img width="515" height="274" alt="sql-execution-page-before" src="https://github.com/user-attachments/assets/012be251-f642-4ded-8f77-32f811b05cac" />
    
    NOTE:
    Even after this PR is merged, when a job description is set manually using `sc.setJobDescription`, the descriptions displayed on the `/jobs` page and at the bottom of the `/SQL/execution` page are not redacted, though the one at the top of the `/SQL/execution` page is redacted.
    
    ```
    $ bin/spark-shell -c spark.sql.redaction.string.regex="secret.*=.*"
    scala> val s = "SELECT * FROM (SELECT 'secret=1')"
    scala> sc.setJobDescription(s)
    scala> sql(s).show()
    +--------+
    |secret=1|
    +--------+
    |secret=1|
    +--------+
    ```
    
    **description in `/jobs` page**
    <img width="555" height="226" alt="jobs-page-not-redacted" src="https://github.com/user-attachments/assets/b4e084ad-b648-4ba6-b049-ef42f570398d" />
    **description in `/SQL/execution` (top)**
    <img width="913" height="203" alt="sql-execution-page-redacted" src="https://github.com/user-attachments/assets/91e745f0-aa7f-4618-98e9-5b4b117415da" />
    **description in `/SQL/execution` (bottom)**
    <img width="536" height="292" alt="sql-execution-page-not-redacted" src="https://github.com/user-attachments/assets/761aad76-0d1b-49af-9e03-58510cd474d1" />
    
    This is consistent with the previous behavior and is not a regression. There is no simple way to redact them, and doing so is out of scope for this PR.
    
    ### Does this PR introduce _any_ user-facing change?
    Yes.
    
    ### How was this patch tested?
    Added a new test and confirmed that the test `SQL execution description should respect spark.sql.redaction.string.regex`, added in #56358, passes.
    Also confirmed that descriptions are redacted in the UI.
    ```
    $ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
    spark-sql (default)> CREATE TABLE test1(secret string);
    spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
    ```
    <img width="607" height="213" alt="jobs-page-after-2" src="https://github.com/user-attachments/assets/62646cfc-67c3-46b5-a9f9-695b1f874462" />
    <img width="589" height="274" alt="sql-execution-page-after-2" src="https://github.com/user-attachments/assets/597db0da-58fb-4275-b6aa-7e8b301f15d0" />
    
    ### Was this patch authored or co-authored using generative AI tooling?
    Kiro CLI / Claude
    
    Closes #56361 from sarutak/fix-redact-sql-description-v2.
    
    Authored-by: Kousuke Saruta <[email protected]>
    Signed-off-by: Kousuke Saruta <[email protected]>
    (cherry picked from commit 96b255f16c6240cc2cdf7a8f747e9f00983f537c)
    Signed-off-by: Kousuke Saruta <[email protected]>
---
 .../sql/hive/thriftserver/SparkSQLDriver.scala     |  4 +-
 .../hive/thriftserver/SparkSQLDriverSuite.scala    | 51 ++++++++++++++++++++++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala
index 2040f8f565a2..f6f88cf8a012 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala
@@ -66,7 +66,9 @@ private[hive] class SparkSQLDriver(val sparkSession: 
SparkSession = SparkSQLEnv.
       val substitutorCommand = 
SQLConf.withExistingConf(sparkSession.sessionState.conf) {
         new VariableSubstitution().substitute(command)
       }
-      sparkSession.sparkContext.setJobDescription(substitutorCommand)
+      val redactedCommand =
+        Utils.redact(sparkSession.sessionState.conf.stringRedactionPattern, 
substitutorCommand)
+      sparkSession.sparkContext.setJobDescription(redactedCommand)
 
       // Parse with an empty parameter context to enable pre-parsing phase 
that scans for
       // parameter markers. If any parameter markers (:name or ?) are found in 
the SQL,
diff --git 
a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriverSuite.scala
 
b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriverSuite.scala
new file mode 100644
index 000000000000..f42970ede669
--- /dev/null
+++ 
b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriverSuite.scala
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver
+
+import org.apache.spark.SparkContext
+import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.util.Utils.REDACTION_REPLACEMENT_TEXT
+
+class SparkSQLDriverSuite extends SharedSparkSession {
+
+  test("job description should be redacted by 
spark.sql.redaction.string.regex") {
+    withSQLConf(SQLConf.SQL_STRING_REDACTION_PATTERN.key -> 
"password=([^\\s]+)") {
+      var jobDescription: String = null
+      spark.sparkContext.addSparkListener(new SparkListener {
+        override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
+          jobDescription =
+            jobStart.properties.getProperty(SparkContext.SPARK_JOB_DESCRIPTION)
+        }
+      })
+
+      val driver = new SparkSQLDriver(spark)
+      try {
+        driver.run("SELECT 'password=secret123'")
+      } finally {
+        driver.close()
+      }
+
+      spark.sparkContext.listenerBus.waitUntilEmpty()
+      assert(jobDescription != null)
+      assert(!jobDescription.contains("secret123"))
+      assert(jobDescription.contains(REDACTION_REPLACEMENT_TEXT))
+    }
+  }
+}


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
