This is an automated email from the ASF dual-hosted git repository.
yaooqinn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 706b6a39b187 [SPARK-56948][SQL][TESTS] Make TPCDSQueryBenchmark heap/broadcast configurable
706b6a39b187 is described below
commit 706b6a39b1876cc888040e65fa192de64616bab0
Author: Kent Yao <[email protected]>
AuthorDate: Wed May 20 12:29:02 2026 +0800
[SPARK-56948][SQL][TESTS] Make TPCDSQueryBenchmark heap/broadcast configurable
### What changes were proposed in this pull request?
Switch hardcoded `.set(...)` to `.setIfMissing(...)` for three SparkConf
keys in `TPCDSQueryBenchmark`:
- `spark.driver.memory`
- `spark.executor.memory`
- `spark.sql.autoBroadcastJoinThreshold`
Also unify `spark.sql.shuffle.partitions` to use `setIfMissing` for
consistency (functionally equivalent to the existing
`System.getProperty` form).
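The equivalence claimed above can be checked with a small sketch. It assumes `spark-core` is on the classpath and relies on `new SparkConf()` loading `spark.*` JVM system properties by default. The object and method names (`ShufflePartitionsEquivalence`, `oldForm`, `newForm`) are hypothetical and are not part of the patch.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper, not part of the patch: compares the two ways of
// defaulting spark.sql.shuffle.partitions.
object ShufflePartitionsEquivalence {
  private val Key = "spark.sql.shuffle.partitions"

  // Old form: read the JVM property explicitly, then overwrite the conf entry.
  def oldForm(): String =
    new SparkConf().set(Key, System.getProperty(Key, "4")).get(Key)

  // New form: new SparkConf() has already loaded spark.* JVM properties,
  // so setIfMissing only applies the default when none was supplied.
  def newForm(): String =
    new SparkConf().setIfMissing(Key, "4").get(Key)

  def main(args: Array[String]): Unit = {
    System.clearProperty(Key)
    println(s"unset: old=${oldForm()} new=${newForm()}") // unset: old=4 new=4

    System.setProperty(Key, "512") // stands in for -Dspark.sql.shuffle.partitions=512
    println(s"set: old=${oldForm()} new=${newForm()}") // set: old=512 new=512
  }
}
```

Both forms resolve to the same value whether or not the property is supplied, which is why the switch is a consistency change only.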
### Why are the changes needed?
`.set(...)` overwrites any value supplied as a `-Dspark.*` JVM property, so
users cannot tune the heap size or broadcast threshold without editing the
source. At SF10 / SF100 the hardcoded 3g heap runs out of memory.
`spark.sql.shuffle.partitions` already supported an override in the same
file; this change extends the same pattern to the remaining three keys.
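A minimal sketch of the difference, assuming `spark-core` is on the classpath. `System.setProperty` stands in for passing `-Dspark.driver.memory=72g` on the command line, and the object name `SetVsSetIfMissing` and its `resolve` helper are hypothetical, not code from the benchmark.

```scala
import org.apache.spark.SparkConf

// Hypothetical illustration, not part of the patch.
object SetVsSetIfMissing {
  private val Key = "spark.driver.memory"

  // Builds a fresh conf (which loads spark.* JVM properties by default)
  // and applies the 3g default with either set or setIfMissing.
  def resolve(useSetIfMissing: Boolean): String = {
    val conf = new SparkConf()
    val configured =
      if (useSetIfMissing) conf.setIfMissing(Key, "3g")
      else conf.set(Key, "3g")
    configured.get(Key)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for launching the JVM with -Dspark.driver.memory=72g.
    System.setProperty(Key, "72g")

    println(resolve(useSetIfMissing = false)) // 3g: set discards the user's value
    println(resolve(useSetIfMissing = true))  // 72g: setIfMissing keeps it
  }
}
```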
### Does this PR introduce _any_ user-facing change?
No. Defaults unchanged.
### How was this patch tested?
Verified locally that `-Dspark.driver.memory=72g` (etc.) flow through
to the SparkConf when launched via:
```
build/sbt -Dspark.driver.memory=72g \
-Dspark.executor.memory=72g \
-Dspark.sql.autoBroadcastJoinThreshold=10485760 \
-Dspark.sql.shuffle.partitions=512 \
"sql/Test/runMain ...TPCDSQueryBenchmark --data-location ..."
```
Without these flags, defaults remain `3g / 3g / 20MB / 4`.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7
Closes #55988 from yaooqinn/SPARK-56948.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
.../spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
index c79f9f26d60d..c1ff0eb8458d 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
@@ -51,10 +51,10 @@ object TPCDSQueryBenchmark extends SqlBasedBenchmark with Logging {
     val conf = new SparkConf()
       .setMaster(System.getProperty("spark.sql.test.master", "local[1]"))
       .setAppName("test-sql-context")
-      .set("spark.sql.shuffle.partitions", System.getProperty("spark.sql.shuffle.partitions", "4"))
-      .set("spark.driver.memory", "3g")
-      .set("spark.executor.memory", "3g")
-      .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
+      .setIfMissing("spark.sql.shuffle.partitions", "4")
+      .setIfMissing("spark.driver.memory", "3g")
+      .setIfMissing("spark.executor.memory", "3g")
+      .setIfMissing("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
       .set("spark.sql.crossJoin.enabled", "true")
       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       .set("spark.kryo.registrationRequired", "true")