This is an automated email from the ASF dual-hosted git repository.
tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new e926d41 [SPARK-30322][DOCS] Add stage level scheduling docs
e926d41 is described below
commit e926d419d305c9400f6f2426ca3e8d04a9180005
Author: Thomas Graves <[email protected]>
AuthorDate: Wed Jul 29 13:46:28 2020 -0500
[SPARK-30322][DOCS] Add stage level scheduling docs
### What changes were proposed in this pull request?
Document the stage level scheduling feature.
### Why are the changes needed?
Document the stage level scheduling feature.
### Does this PR introduce _any_ user-facing change?
Documentation.
### How was this patch tested?
n/a docs only
Closes #29292 from tgravescs/SPARK-30322.
Authored-by: Thomas Graves <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
---
docs/configuration.md | 7 +++++++
docs/running-on-yarn.md | 4 ++++
2 files changed, 11 insertions(+)
diff --git a/docs/configuration.md b/docs/configuration.md
index abf7610..62799db 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -3028,3 +3028,10 @@ There are configurations available to request resources
for the driver: <code>sp
Spark will use the configurations specified to first request containers with
the corresponding resources from the cluster manager. Once it gets the
container, Spark launches an Executor in that container which will discover
what resources the container has and the addresses associated with each
resource. The Executor will register with the Driver and report back the
resources available to that Executor. The Spark scheduler can then schedule
tasks to each Executor and assign specific reso [...]
See your cluster manager specific page for requirements and details on each of
- [YARN](running-on-yarn.html#resource-allocation-and-configuration-overview),
[Kubernetes](running-on-kubernetes.html#resource-allocation-and-configuration-overview)
and [Standalone
Mode](spark-standalone.html#resource-allocation-and-configuration-overview). It
is currently not available with Mesos or local mode. Also note that
local-cluster mode with multiple workers is not supported (see Standalon [...]
+
+# Stage Level Scheduling Overview
+
+The stage level scheduling feature allows users to specify task and executor
resource requirements at the stage level. This allows different stages to
run with executors that have different resources. A prime example is an ETL
stage that runs on executors with only CPUs, followed by an ML stage that
needs GPUs. Stage level scheduling allows the user to request different
executors that have GPUs when the ML stage runs rather than having to acquire
executors with GPUs at th [...]
+This is only available for the RDD API in Scala, Java, and Python and requires
dynamic allocation to be enabled. It is only available on YARN at this time.
See the [YARN](running-on-yarn.html#stage-level-scheduling-overview) page for
more implementation details.
+
+See the `RDD.withResources` and `ResourceProfileBuilder` APIs for using this
feature. The current implementation acquires new executors for each
`ResourceProfile` created and currently requires an exact match: Spark does
not try to fit tasks into an executor that was created with a different
`ResourceProfile` than the one the tasks require. Executors that are not in
use will idle timeout with the dynamic allocation logic. The default
configuration for this feature is to only allow one Resour [...]
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 36d8f0b..6f7aaf2b 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -641,6 +641,10 @@ If the user has a user defined YARN resource, lets call it
`acceleratorX` then t
YARN does not tell Spark the addresses of the resources allocated to each
container. For that reason, the user must specify a discovery script that gets
run by the executor on startup to discover what resources are available to that
executor. You can find an example script in
`examples/src/main/scripts/getGpusResources.sh`. The script must have execute
permissions set and the user should set up permissions to not allow malicious
users to modify it. The script should write to STDOUT a JSO [...]
+# Stage Level Scheduling Overview
+
+Stage level scheduling is supported on YARN when dynamic allocation is
enabled. One YARN-specific thing to note is that each ResourceProfile
requires a different container priority on YARN. The mapping is simple: the
ResourceProfile id becomes the priority, and on YARN lower numbers are higher
priority. This means that profiles created earlier will have a higher priority
in YARN. Normally this won't matter, as Spark finishes one stage before starting
another one; the only case this mig [...]
+
# Important notes
- Whether core requests are honored in scheduling decisions depends on which
scheduler is in use and how it is configured.
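
Editorial illustration (not part of the commit): the YARN note in the running-on-yarn.md hunk above says that the ResourceProfile id becomes the container priority and that lower numbers are higher priority. The sketch below models only that mapping in plain Python. `Profile`, `ProfileRegistry`, `yarn_priority` and `by_yarn_priority` are hypothetical names, not Spark or YARN APIs, and the assumption that ids are handed out in increasing creation order starting at 0 is inferred from the text rather than stated in it.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Profile:
    """Hypothetical stand-in for a Spark ResourceProfile: an id plus a label."""
    id: int
    label: str


class ProfileRegistry:
    """Hands out ids in creation order (assumption: increasing, starting at 0)."""

    def __init__(self):
        self._ids = count(0)

    def create(self, label):
        return Profile(next(self._ids), label)


def yarn_priority(profile):
    # Per the doc text: the ResourceProfile id is used as the YARN priority.
    return profile.id


def by_yarn_priority(profiles):
    # Lower numbers are higher priority on YARN, so sort ascending.
    return sorted(profiles, key=yarn_priority)


registry = ProfileRegistry()
etl = registry.create("cpu-only ETL")  # created first, so lower id
ml = registry.create("GPU ML")         # created second, so higher id

ordered = by_yarn_priority([ml, etl])
print([p.label for p in ordered])  # ['cpu-only ETL', 'GPU ML']
```

Under these assumptions the profile created first always sorts ahead of later ones, which matches the statement that profiles created earlier have a higher priority in YARN.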