This is an automated email from the ASF dual-hosted git repository.
mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new dd4f112a3f3 [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM
dd4f112a3f3 is described below
commit dd4f112a3f375ccc762a9bb7fa674f32a2adb6e1
Author: yangjie01 <[email protected]>
AuthorDate: Sat Jan 21 22:03:04 2023 -0600
[SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM
### What changes were proposed in this pull request?
The env `SPARK_USE_CONC_INCR_GC` is used to enable CMS GC for the Yarn AM, but users can achieve the same goal by directly configuring `spark.driver.extraJavaOptions` (cluster mode) or `spark.yarn.am.extraJavaOptions` (client mode), and this env is not exposed to users in any user documentation.
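As an illustration only (not part of this patch), a minimal Scala sketch of the equivalent settings expressed through these public configs; the CMS flag shown is only recognized on JDKs older than 14, and in cluster mode the driver option has to be supplied at submit time (for example in `spark-defaults.conf` or via `--conf`) rather than from application code:
```scala
import org.apache.spark.SparkConf

// Rough equivalent of the removed SPARK_USE_CONC_INCR_GC env, expressed through
// the public configs. -XX:+UseConcMarkSweepGC is only recognized on JDK < 14.
val cmsOpt = "-XX:+UseConcMarkSweepGC"

val conf = new SparkConf()
  // client mode: the YARN AM runs in its own JVM, separate from the driver
  .set("spark.yarn.am.extraJavaOptions", cmsOpt)
  // cluster mode: the AM hosts the driver, so the driver options apply; this key
  // must be set at submit time, since setting it from application code is too late
  .set("spark.driver.extraJavaOptions", cmsOpt)
```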
At the same time, Java 14 removed support for CMS ([JEP-363](https://openjdk.org/jeps/363)). So when we use Java 17 to run Spark and set `SPARK_USE_CONC_INCR_GC=true`, CMS does not take effect (no error is reported because `-XX:+IgnoreUnrecognizedVMOptions` is configured by default).
This PR aims to remove this env, along with similar commented-out code in `yarn.ExecutorRunnable`, for code cleanup.
### Why are the changes needed?
Clean up the env `SPARK_USE_CONC_INCR_GC`, which is not exposed to users in any document.
### Does this PR introduce _any_ user-facing change?
Yes, the env `SPARK_USE_CONC_INCR_GC` is no longer effective, but users can enable CMS GC (when the Java version is less than 14) by directly configuring `spark.driver.extraJavaOptions` (cluster mode) or `spark.yarn.am.extraJavaOptions` (client mode).
### How was this patch tested?
Pass GitHub Actions
Closes #39674 from LuciferYang/remove-SPARK_USE_CONC_INCR_GC.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
---
.../org/apache/spark/deploy/yarn/Client.scala | 20 -----------------
.../spark/deploy/yarn/ExecutorRunnable.scala | 26 ----------------------
2 files changed, 46 deletions(-)
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 2daf9cd1df4..d8e9cd8b47d 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -1005,26 +1005,6 @@ private[spark] class Client(
val tmpDir = new Path(Environment.PWD.$$(), YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR)
javaOpts += "-Djava.io.tmpdir=" + tmpDir
- // TODO: Remove once cpuset version is pushed out.
- // The context is, default gc for server class machines ends up using all cores to do gc -
- // hence if there are multiple containers in same node, Spark GC affects all other containers'
- // performance (which can be that of other Spark containers)
- // Instead of using this, rely on cpusets by YARN to enforce "proper" Spark behavior in
- // multi-tenant environments. Not sure how default Java GC behaves if it is limited to subset
- // of cores on a node.
- val useConcurrentAndIncrementalGC = launchEnv.get("SPARK_USE_CONC_INCR_GC").exists(_.toBoolean)
- if (useConcurrentAndIncrementalGC) {
- // In our expts, using (default) throughput collector has severe perf ramifications in
- // multi-tenant machines
- javaOpts += "-XX:+UseConcMarkSweepGC"
- javaOpts += "-XX:MaxTenuringThreshold=31"
- javaOpts += "-XX:SurvivorRatio=8"
- javaOpts += "-XX:+CMSIncrementalMode"
- javaOpts += "-XX:+CMSIncrementalPacing"
- javaOpts += "-XX:CMSIncrementalDutyCycleMin=0"
- javaOpts += "-XX:CMSIncrementalDutyCycle=10"
- }
-
// Include driver-specific java options if we are launching a driver
if (isClusterMode) {
sparkConf.get(DRIVER_JAVA_OPTIONS).foreach { opts =>
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
index dbf4a0a8052..0148b6f3c95 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
@@ -160,32 +160,6 @@ private[yarn] class ExecutorRunnable(
.filter { case (k, v) => SparkConf.isExecutorStartupConf(k) }
.foreach { case (k, v) => javaOpts += YarnSparkHadoopUtil.escapeForShell(s"-D$k=$v") }
- // Commenting it out for now - so that people can refer to the properties if required. Remove
- // it once cpuset version is pushed out.
- // The context is, default gc for server class machines end up using all cores to do gc - hence
- // if there are multiple containers in same node, spark gc effects all other containers
- // performance (which can also be other spark containers)
- // Instead of using this, rely on cpusets by YARN to enforce spark behaves 'properly' in
- // multi-tenant environments. Not sure how default java gc behaves if it is limited to subset
- // of cores on a node.
- /*
- else {
- // If no java_opts specified, default to using -XX:+CMSIncrementalMode
- // It might be possible that other modes/config is being done in
- // spark.executor.extraJavaOptions, so we don't want to mess with it.
- // In our expts, using (default) throughput collector has severe perf ramifications in
- // multi-tenant machines
- // The options are based on
- // http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html#0.0.0.%20When%20to%20Use
- // %20the%20Concurrent%20Low%20Pause%20Collector|outline
- javaOpts += "-XX:+UseConcMarkSweepGC"
- javaOpts += "-XX:+CMSIncrementalMode"
- javaOpts += "-XX:+CMSIncrementalPacing"
- javaOpts += "-XX:CMSIncrementalDutyCycleMin=0"
- javaOpts += "-XX:CMSIncrementalDutyCycle=10"
- }
- */
-
// For log4j configuration to reference
javaOpts += ("-Dspark.yarn.app.container.log.dir=" +
ApplicationConstants.LOG_DIR_EXPANSION_VAR)