[
https://issues.apache.org/jira/browse/SPARK-29762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972631#comment-16972631
]
Imran Rashid commented on SPARK-29762:
--------------------------------------
I don't really understand the complication. I know there would be some special
casing for GPUs in the config parsing code (e.g. in
{{org.apache.spark.resource.ResourceUtils#parseResourceRequirements}}), but it
doesn't seem too bad.
I did think about this more, and realized it gets a bit confusing when you add
in task-level resource constraints: you won't schedule optimally for the tasks
that don't need GPUs, and you won't have GPUs left over for the tasks that do
need them. E.g., say you had each executor set up with 4 cores and 2 GPUs. If
one task set came in which only needed CPU, you would only run 2 copies at a
time. And then if another task set came in which did need the GPUs, you
wouldn't be able to schedule it.
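To make the arithmetic concrete, here is a small sketch (not Spark's actual
scheduler code; the function name and signature are hypothetical) of how the
number of concurrent task slots on one executor is limited by whichever
resource runs out first when every task implicitly requests 1 GPU:

```python
def task_slots(executor_cores, executor_gpus, task_cpus=1, task_gpus=1):
    """Concurrent tasks one executor can run, limited by the scarcest resource."""
    slots_by_cpu = executor_cores // task_cpus
    # A task set that needs no GPUs (task_gpus=0) is only CPU-limited.
    slots_by_gpu = executor_gpus // task_gpus if task_gpus > 0 else slots_by_cpu
    return min(slots_by_cpu, slots_by_gpu)

# Executor with 4 cores and 2 GPUs, defaulting each task to 1 GPU:
print(task_slots(4, 2))               # 2 -- GPU-limited, even for CPU-only work
print(task_slots(4, 2, task_gpus=0))  # 4 -- CPU-limited, GPUs left free
```

So with the default of 1, a CPU-only task set runs at half parallelism and
still pins both GPUs, which is exactly the bad case described above.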
You can't end up in that situation until you have task-specific resource
constraints. But does it get too messy to have sensible defaults in that
situation? Maybe the user specifies GPUs as an executor resource up front, for
the whole cluster, because they have them available and they know some
significant fraction of the workloads need them. They might think that the
regular tasks will just ignore the GPUs, and that the tasks that do need GPUs
will just specify them as task-level constraints.
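For reference, the kind of setup being described would look roughly like this
in spark-defaults.conf (using the Spark 3.x resource config names; the exact
values are illustrative):

```
# Every executor is provisioned with GPUs up front, cluster-wide.
spark.executor.resource.gpu.amount   2
spark.executor.cores                 4

# Under discussion: defaulting this to 1 means even CPU-only task sets
# consume a GPU per task unless the user explicitly overrides it.
spark.task.resource.gpu.amount       1
```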
I guess this might have been a bad suggestion after all, sorry.
> GPU Scheduling - default task resource amount to 1
> --------------------------------------------------
>
> Key: SPARK-29762
> URL: https://issues.apache.org/jira/browse/SPARK-29762
> Project: Spark
> Issue Type: Story
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Thomas Graves
> Priority: Major
>
> Default the task-level resource configs (for gpu/fpga, etc.) to 1. So if the
> user specifies the executor resource, then to make it more user-friendly let's
> have the task resource config default to 1. This is OK right now since we
> require resources to have an address. It also matches what we do for the
> spark.task.cpus config.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)