This is an automated email from the ASF dual-hosted git repository.
sarutak pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new 68af2952b4c3 [SPARK-57262][SQL][WEBUI] Job description derived from a
query should respect `spark.sql.redaction.string.regex`
68af2952b4c3 is described below
commit 68af2952b4c37eac8e1f07825b6f04cc9b27b188
Author: Kousuke Saruta <[email protected]>
AuthorDate: Mon Jun 8 14:28:55 2026 +0900
[SPARK-57262][SQL][WEBUI] Job description derived from a query should
respect `spark.sql.redaction.string.regex`
### What changes were proposed in this pull request?
This PR changes `SparkSQLDriver.scala` to redact the query text before passing
it to `setJobDescription`.
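As a rough illustration of the redaction step, here is a minimal, Spark-free sketch of regex-based redaction. The names `RedactSketch` and `ReplacementText` are hypothetical, and the placeholder string is an assumption standing in for Spark's `REDACTION_REPLACEMENT_TEXT`; the actual change calls `Utils.redact` with `stringRedactionPattern`, as shown in the diff below.

```scala
import scala.util.matching.Regex

object RedactSketch {
  // Assumed placeholder text; stands in for Spark's REDACTION_REPLACEMENT_TEXT.
  val ReplacementText = "*********(redacted)"

  // Replace every match of the optional pattern with the placeholder.
  // If no pattern is configured, or the text is null/empty, return the text unchanged.
  def redact(pattern: Option[Regex], text: String): String =
    pattern match {
      case Some(r) if text != null && text.nonEmpty =>
        // quoteReplacement keeps '$' and '\' in the placeholder from being
        // interpreted as group references.
        r.replaceAllIn(text, Regex.quoteReplacement(ReplacementText))
      case _ => text
    }

  def main(args: Array[String]): Unit = {
    // Same pattern and query as the new test in this commit.
    val pattern = Some("password=([^\\s]+)".r)
    println(redact(pattern, "SELECT 'password=secret123'"))
  }
}
```

Because `[^\s]+` matches up to the next whitespace character, the closing quote of the literal is redacted along with the value in this example.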
### Why are the changes needed?
In the current implementation, when a query is executed through
`SparkSQLDriver`, redaction is done only in `SQLExecution.scala`, so only the
description in the table at the top of the `/SQL/execution` page is redacted.
<img width="1083" height="349" alt="sql-execution-page-top-table"
src="https://github.com/user-attachments/assets/b06fb255-2b46-473d-9046-1b2d578e3bda"
/>
But the descriptions in the table on the `/jobs` page and in the table at the
bottom of the `/SQL/execution` page are not redacted.
<img width="525" height="692" alt="jobs-page-before"
src="https://github.com/user-attachments/assets/31c88b98-779b-4305-bf71-58f19a1d7117"
/>
<img width="515" height="274" alt="sql-execution-page-before"
src="https://github.com/user-attachments/assets/012be251-f642-4ded-8f77-32f811b05cac"
/>
NOTE:
Even after this PR is merged, when a job description is set manually using
`sc.setJobDescription`, the descriptions displayed on the `/jobs` page and at
the bottom of the `/SQL/execution` page are not redacted, though the one at
the top of the `/SQL/execution` page is.
```
$ bin/spark-shell -c spark.sql.redaction.string.regex="secret.*=.*"
scala> val s = "SELECT * FROM (SELECT 'secret=1')"
scala> sc.setJobDescription(s)
scala> sql(s).show()
+--------+
|secret=1|
+--------+
|secret=1|
+--------+
```
**description in `/jobs` page**
<img width="555" height="226" alt="jobs-page-not-redacted"
src="https://github.com/user-attachments/assets/b4e084ad-b648-4ba6-b049-ef42f570398d"
/>
**description in `/SQL/execution` (top)**
<img width="913" height="203" alt="sql-execution-page-redacted"
src="https://github.com/user-attachments/assets/91e745f0-aa7f-4618-98e9-5b4b117415da"
/>
**description in `/SQL/execution` (bottom)**
<img width="536" height="292" alt="sql-execution-page-not-redacted"
src="https://github.com/user-attachments/assets/761aad76-0d1b-49af-9e03-58510cd474d1"
/>
This is consistent with the previous behavior and is not a regression. There
is no simple way to redact them, and doing so is out of scope for this PR.
### Does this PR introduce _any_ user-facing change?
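Yes. For queries run through `SparkSQLDriver`, the job descriptions shown on
the `/jobs` page and at the bottom of the `/SQL/execution` page are now
redacted.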
### How was this patch tested?
Added a new test and confirmed that the test `SQL execution description should
respect spark.sql.redaction.string.regex` added in #56358 passes.
Also confirmed that descriptions are redacted in the UI.
```
$ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
spark-sql (default)> CREATE TABLE test1(secret string);
spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
```
<img width="607" height="213" alt="jobs-page-after-2"
src="https://github.com/user-attachments/assets/62646cfc-67c3-46b5-a9f9-695b1f874462"
/>
<img width="589" height="274" alt="sql-execution-page-after-2"
src="https://github.com/user-attachments/assets/597db0da-58fb-4275-b6aa-7e8b301f15d0"
/>
### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Claude
Closes #56361 from sarutak/fix-redact-sql-description-v2.
Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Kousuke Saruta <[email protected]>
(cherry picked from commit 96b255f16c6240cc2cdf7a8f747e9f00983f537c)
Signed-off-by: Kousuke Saruta <[email protected]>
---
.../sql/hive/thriftserver/SparkSQLDriver.scala | 4 +-
.../hive/thriftserver/SparkSQLDriverSuite.scala | 51 ++++++++++++++++++++++
2 files changed, 54 insertions(+), 1 deletion(-)
diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala
index 2040f8f565a2..f6f88cf8a012 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala
@@ -66,7 +66,9 @@ private[hive] class SparkSQLDriver(val sparkSession: SparkSession = SparkSQLEnv.
val substitutorCommand =
SQLConf.withExistingConf(sparkSession.sessionState.conf) {
new VariableSubstitution().substitute(command)
}
- sparkSession.sparkContext.setJobDescription(substitutorCommand)
+ val redactedCommand =
+ Utils.redact(sparkSession.sessionState.conf.stringRedactionPattern, substitutorCommand)
+ sparkSession.sparkContext.setJobDescription(redactedCommand)
// Parse with an empty parameter context to enable pre-parsing phase that scans for
// parameter markers. If any parameter markers (:name or ?) are found in the SQL,
diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriverSuite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriverSuite.scala
new file mode 100644
index 000000000000..f42970ede669
--- /dev/null
+++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriverSuite.scala
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver
+
+import org.apache.spark.SparkContext
+import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.util.Utils.REDACTION_REPLACEMENT_TEXT
+
+class SparkSQLDriverSuite extends SharedSparkSession {
+
+ test("job description should be redacted by spark.sql.redaction.string.regex") {
+ withSQLConf(SQLConf.SQL_STRING_REDACTION_PATTERN.key -> "password=([^\\s]+)") {
+ var jobDescription: String = null
+ spark.sparkContext.addSparkListener(new SparkListener {
+ override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
+ jobDescription =
+ jobStart.properties.getProperty(SparkContext.SPARK_JOB_DESCRIPTION)
+ }
+ })
+
+ val driver = new SparkSQLDriver(spark)
+ try {
+ driver.run("SELECT 'password=secret123'")
+ } finally {
+ driver.close()
+ }
+
+ spark.sparkContext.listenerBus.waitUntilEmpty()
+ assert(jobDescription != null)
+ assert(!jobDescription.contains("secret123"))
+ assert(jobDescription.contains(REDACTION_REPLACEMENT_TEXT))
+ }
+ }
+}