Hello List,

we're running a heterogeneous cluster (x86_64 only, but a lot of different node types with 8 to 64 HW threads and 1 to 4 GPUs). Our processing power (for our main application, at least) comes exclusively from the GPUs, so cons_tres looks quite promising: depending on the size of the job, request an appropriate number of GPUs. Of course, some CPUs have to be requested as well -- ideally distributed evenly among the GPUs (e.g. 10 per GPU on a 20-core, 2-GPU node; 16 per GPU on a 64-core, 4-GPU node). One could create separate partitions for the different node types and submit individual jobs with CPU requests tailored to one such partition, but I'd prefer a more flexible approach where a given job can run on any node that is large enough.
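To make that concrete, here is roughly what the partition-based approach would look like (the partition names gpu2/gpu4 and the script name job.sh are just placeholders, and the CPU counts have to be hard-coded per node type):

    # 20-core, 2-GPU nodes
    sbatch -p gpu2 --gres=gpu:2 --cpus-per-gpu=10 job.sh
    # 64-core, 4-GPU nodes
    sbatch -p gpu4 --gres=gpu:4 --cpus-per-gpu=16 job.sh

(One could also set DefCpuPerGPU on each partition in slurm.conf instead of passing --cpus-per-gpu explicitly.) What I'm after is a single submission along the lines of "N GPUs plus an even share of whatever CPUs the node has", independent of which node type the job ends up on.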
Is there anyone with a similar setup? Any config options I've missed, or do you have a work-around?

Thanks,
A.

--
Ansgar Esztermann
Sysadmin Dep. Theoretical and Computational Biophysics
http://www.mpibpc.mpg.de/grubmueller/esztermann