Rather than specifying the processor types as GRES, I would recommend
defining them as features of the nodes and letting users specify those
features as constraints on their jobs. Since the newer processors are
backwards compatible with the older ones, list the older processors as
features of the newer nodes, too.
For example, say you have some nodes that support AVX512 and some that
only support AVX2. node01 is older and supports only AVX2. node02 is
newer and supports AVX512, but is backwards compatible and also supports
AVX2. I would have something like this in my slurm.conf file:
NodeName=node01 Feature=avx2 ...
NodeName=node02 Feature=avx512,avx2 ...
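Users then request the capability they need with --constraint (or -C)
and Slurm will place the job on any node advertising that feature. For
example (the options other than the constraint are up to you):

srun --constraint=avx512 --pty /bin/bash
srun --constraint=avx2 --pty /bin/bash

The first command can only land on node02; the second can land on
either node, since both of them list avx2.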
I have a very heterogeneous cluster with several different generations of
AMD and Intel processors, and we use this method quite effectively.
If you want to continue down the road you've already started on, can you
provide more information, such as the partition definitions and the GRES
definitions? In general, Slurm should support submitting to multiple
partitions.
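For example, srun and sbatch accept a comma-separated list of partitions
and will use whichever one can start the job earliest (this just combines
the partitions from your mail with your original command):

srun --exclusive -p cpu_e5_2650_v1,cpu_e5_2650_v2 --gres=cpu_type:e5_2650_v1 --pty /bin/bash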
Prentice
On 3/8/21 11:29 AM, Bas van der Vlies wrote:
Hi,
On this cluster I have version 20.02.6 installed. We have different
partitions per CPU type and GPU type. We want to make it easy for users
who do not care where their job runs, while experienced users can
specify the GRES type: cpu_type or gpu.
I have defined 2 cpu partitions:
* cpu_e5_2650_v1
* cpu_e5_2650_v2
and 2 gres cpu_type:
* e5_2650_v1
* e5_2650_v2
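Roughly, the configuration is along these lines (sketch only; node names
other than r16n18, the node lists, and the counts are placeholders):

# slurm.conf
GresTypes=cpu_type
NodeName=r16n18 Gres=cpu_type:e5_2650_v1:1 ...
PartitionName=cpu_e5_2650_v1 Nodes=r16n18 ...
PartitionName=cpu_e5_2650_v2 Nodes=... ...

# gres.conf on the nodes
NodeName=r16n18 Name=cpu_type Type=e5_2650_v1 Count=1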
When no partition is specified, the job is submitted to both partitions:
* srun --exclusive --gres=cpu_type:e5_2650_v1 --pty /bin/bash -->
r16n18, which has this gres defined and is in partition cpu_e5_2650_v1
Now I submit another job at the same time:
* srun --exclusive --gres=cpu_type:e5_2650_v1 --pty /bin/bash
This fails with: `srun: error: Unable to allocate resources: Requested
node configuration is not available`
I would expect it gets queued in the partition `cpu_e5_2650_v1`.
When I specify the partition on the command line:
* srun --exclusive -p cpu_e5_2650_v1_shared
--gres=cpu_type:e5_2650_v1 --pty /bin/bash
srun: job 1856 queued and waiting for resources
So the question is: can Slurm handle submitting to multiple partitions
when we specify GRES attributes?
Regards