So the node definition is separate from the partition definition.
You would need to define all the GPUs as part of the node. Partitions do
not have physical characteristics, but they do have QOS capabilities
that you may be able to use. You could also use a job_submit Lua script
to reject jobs that request resources you do not want used in a
particular queue.
Both would take some research to find the best approach, but I think
those are the two options available that may do what you are looking for.
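
A minimal sketch of the first option, assuming typed GRES names (large,
medium, small) and QOS names that are placeholders rather than anything
from your site. The node carries every GPU, and each partition points at
the same node with its own QOS. The limits themselves would be set on
each QOS in the accounting database and are not shown here. With MIG
devices, the actual type names and the matching gres.conf entries depend
on how the devices are detected, so treat these values as illustrative.

```
# slurm.conf fragment (illustrative; names and counts are assumptions)
GresTypes=gpu

# One node definition that lists all three GPU classes as typed GRES.
# CPUs, memory, and other node attributes are omitted here.
NodeName=gpuComputer Gres=gpu:large:4,gpu:medium:4,gpu:small:16

# Three partitions over the same node; the QOS names are hypothetical
# and would have to exist in the accounting database first.
PartitionName=large  Nodes=gpuComputer QOS=part_large
PartitionName=medium Nodes=gpuComputer QOS=part_medium
PartitionName=small  Nodes=gpuComputer QOS=part_small
```

Whether a QOS limit alone is enough to keep a job in "small" away from
the larger GPUs is something I have not tested; the job_submit script
is the fallback if it is not.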
Brian Andrus
On 3/31/2021 8:21 AM, Cristóbal Navarro wrote:
Hi Community,
I was checking the documentation but could not find clear information on
what I am trying to do.
Here at the university we have a large compute node with 3 classes of
GPUs. Let's say the node's hostname is "gpuComputer"; it is composed of:
* 4x large GPUs
* 4x medium GPUs (MIG devices)
* 16x small GPUs (MIG devices)
Our plan is to have one partition for each class of GPUs.
So if a user chooses the "small" partition, their job will only see up to
16x small GPUs and will not interfere with other jobs running on the
"medium" or "large" partitions.
Can I create three partitions and specify the corresponding subset of
GPUs for each one?
If not, would NodeName and NodeHostname serve as an alternative?
That is, specify the node three times with different NodeName values, all
using the same NodeHostname=gpuComputer, each with the corresponding
subset of "Gres" resources, and then have each partition select the
corresponding NodeName.
Any feedback or advice on the best way to accomplish this would be
much appreciated.
Best regards,
--
Cristóbal A. Navarro