Re: [slurm-users] Job flexibility with cons_tres

2021-02-12 Thread Ansgar Esztermann-Kirchner
On Fri, Feb 12, 2021 at 09:47:56AM +0100, Ole Holm Nielsen wrote:
> Could you kindly say where you have found documentation of the
> DefaultCpusPerGpu (or DefCpusPerGpu?) parameter.

Humph, I shouldn't have written the message from memory. It's actually DefCpuPerGPU (singular).

> I'm unable to …
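For reference, a minimal slurm.conf sketch of where DefCpuPerGPU fits; the value of 8 and the partition line are invented for illustration, not taken from the thread:

```
# slurm.conf (fragment) -- requires the cons_tres selection plugin
SelectType=select/cons_tres
# Give each job 8 CPUs per allocated GPU unless it requests CPUs explicitly
DefCpuPerGPU=8
# The same option can also be set per partition:
# PartitionName=gpu Nodes=gpunode[01-16] DefCpuPerGPU=8
```

With such a setting, `sbatch --gres=gpu:2 job.sh` would be allocated 16 CPUs without any explicit `-c`/`--cpus-per-task` option.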

Re: [slurm-users] Job flexibility with cons_tres

2021-02-12 Thread Ole Holm Nielsen
On 2/12/21 9:24 AM, Ansgar Esztermann-Kirchner wrote:
> After scouring the docs once more, I've noticed DefaultCpusPerGpu, which
> seems to be exactly what I was looking for: jobs request a number of GPUs,
> but no CPUs; and Slurm will assign an appropriate number of CPUs. The only
> disadvantage is the …

Re: [slurm-users] Job flexibility with cons_tres

2021-02-12 Thread Ansgar Esztermann-Kirchner
On Mon, Feb 08, 2021 at 12:36:06PM +0100, Ansgar Esztermann-Kirchner wrote:
> Of course, one could use different partitions for different nodes, and
> then submit individual jobs with CPU requests tailored to one such
> partition, but I'd prefer a more flexible approach where a given job
> could …

Re: [slurm-users] Job flexibility with cons_tres

2021-02-10 Thread Aaron Jackson
Similar problem in the cluster I look after. I have a job_submit script which adds certain nodes to the job's excluded-nodes list, based on each node's number of CPUs per GPU. This basically solved the fragmentation problem entirely. The problem is that cons_tres seems to think (for example) that …
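The node-exclusion logic described above can be sketched as follows; the real implementation would live in a job_submit plugin (typically written in Lua), and the node inventory and threshold below are invented for illustration:

```python
# Sketch: choose nodes to exclude based on their CPU-per-GPU ratio.
# In a real job_submit plugin this list would populate the job's
# excluded-nodes field instead of being printed.

def excluded_nodes(nodes, min_cpus_per_gpu):
    """Return names of nodes whose CPU:GPU ratio is below the threshold."""
    excluded = []
    for name, (cpus, gpus) in nodes.items():
        if gpus == 0 or cpus // gpus < min_cpus_per_gpu:
            excluded.append(name)
    return excluded

# Invented inventory: (CPUs, GPUs) per node
nodes = {
    "node01": (8, 1),    # 8 CPUs per GPU
    "node02": (16, 4),   # 4 CPUs per GPU
    "node03": (64, 4),   # 16 CPUs per GPU
}

print(excluded_nodes(nodes, min_cpus_per_gpu=8))  # node02 has too few CPUs per GPU
```

A job wanting 8 CPUs per GPU would thus avoid only the densely packed node; raising the threshold excludes more of the inventory.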

Re: [slurm-users] Job flexibility with cons_tres

2021-02-10 Thread Ansgar Esztermann-Kirchner
Hi Yair,

thank you very much for your reply. I'll keep the points you make in mind while we're evolving our configuration toward something that can be called production-ready.

A.

--
Ansgar Esztermann
Sysadmin, Dep. Theoretical and Computational Biophysics
http://www.mpibpc.mpg.de/grubmueller/esz

Re: [slurm-users] Job flexibility with cons_tres

2021-02-09 Thread Yair Yarom
Hi,

We have a similar configuration: a very heterogeneous cluster and cons_tres. Users need to specify the CPU/memory/GPU/time, and it will schedule their job somewhere. Indeed, there's currently no guarantee that you won't be left with a node with unusable GPUs because no CPUs or memory are available …
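In practice, "specify the CPU/memory/GPU/time" looks something like the batch-script fragment below; the resource values and application name are placeholders, not taken from the thread:

```
#!/bin/bash
#SBATCH --gres=gpu:1          # number of GPUs
#SBATCH --cpus-per-task=8     # CPUs
#SBATCH --mem=16G             # memory
#SBATCH --time=04:00:00       # walltime
srun ./my_app                 # placeholder application
```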

[slurm-users] Job flexibility with cons_tres

2021-02-08 Thread Ansgar Esztermann-Kirchner
Hello List,

we're running a heterogeneous cluster (just x86_64, but a lot of different node types, from 8 to 64 HW threads and 1 to 4 GPUs). Our processing power (for our main application, at least) is exclusively provided by the GPUs, so cons_tres looks quite promising: depending on the size of the …
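For context, a heterogeneous cons_tres setup along these lines might look like the fragment below; node names, counts, and device paths are invented for illustration:

```
# slurm.conf (fragment)
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
GresTypes=gpu
NodeName=small[01-10] CPUs=8  Gres=gpu:1
NodeName=big[01-04]   CPUs=64 Gres=gpu:4

# gres.conf (fragment)
NodeName=small[01-10] Name=gpu File=/dev/nvidia0
NodeName=big[01-04]   Name=gpu File=/dev/nvidia[0-3]
```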