Re: [slurm-users] disable-bindings disables counting of gres resources

2019-04-15 Thread Peter Steinbach
Hi Chris, thanks for the detailed feedback. This is slurm 18.08.5, see also https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/Dockerfile#L9 Best, Peter

Re: [slurm-users] disable-bindings disables counting of gres resources

2019-04-15 Thread Peter Steinbach
Hi Chris, thanks for following up on this thread. "First of all, you will want to use cgroups to ensure that processes that do not request GPUs cannot access them." We had a feeling that cgroups might be more optimal. Could you point us to documentation that suggests cgroups to be a requireme

Re: [slurm-users] disable-bindings disables counting of gres resources

2019-03-29 Thread Peter Steinbach
Just to follow up, I filed a medium bug report with schedmd on this: https://bugs.schedmd.com/show_bug.cgi?id=6763 Best, Peter On 3/25/19 10:30 AM, Peter Steinbach wrote: Dear all, Using these config files, https://github.com/psteinb/docker-centos7-slurm/blob

[slurm-users] disable-bindings disables counting of gres resources

2019-03-25 Thread Peter Steinbach
Dear all, Using these config files, https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/gres.conf https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/slurm.conf I observed a weird behavior of the '--gres-flags=

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-21 Thread Peter Steinbach
After more tests, the situation clears up a bit. If "COREs=0,1" (etc.) is present in the `gres.conf` file, then one can inject gres jobs on a single core only by using `--gres-flags=disable-binding` if a non-gres job is running on the same node. If "COREs=0,1" is NOT present in `gres.conf`, then any

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Peter Steinbach
Interestingly enough, if I add Cores=0-1 and Cores=2-3 to the gres.conf file, everything stops working again. :/ Should I send around scontrol outputs? And yes, I took care to set the --mem flag for the job submission this time. Best, Peter
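For orientation, a gres.conf with such core bindings might look roughly like the sketch below. Only the `Cores=0-1` and `Cores=2-3` values come from the message; the GPU name and the device paths are assumptions for illustration and are not taken from the linked repository.

```
# gres.conf (illustrative sketch, not the file from the thread)
# Device paths are assumed; adjust to the actual hardware.
Name=gpu File=/dev/nvidia0 Cores=0-1
Name=gpu File=/dev/nvidia1 Cores=2-3
```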

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Peter Steinbach
Hi Philippe, thanks for spotting this. This indeed appears to solve this first issue. Now I can try to make the GPUs available and play with pinning etc. Superb - if you happen to be at ISC, let me know. I'd buy you a coffee/beer! ;) Peter

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Peter Steinbach
Hi Chris, I changed the initial state a bit (the number of cores per node was misconfigured): https://raw.githubusercontent.com/psteinb/docker-centos7-slurm/18.08.5-with-gres/slurm.conf But that doesn't change things. Initially, I see this: # sinfo -N -l Wed Mar 20 09:03:26 2019 NODELIST NO

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-19 Thread Peter Steinbach
Hi Benson, As you can perhaps see from our slurm.conf, we have task affinity and similar settings switched off. Along the same lines, I also removed the core binding of the GPUs. That is why I am quite surprised that slurm doesn’t allow new jobs in. I am aware of the PCIe bandwidth implications of a GP

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-19 Thread Peter Steinbach
I've read through the parameters. I am not sure if any of those would help in our situation. What suggestions would you make? Note, it's not the scheduler policy that appears to hinder us. It's about how slurm keeps track of the generic resource and (potentially) binds it to available cores. Th

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-19 Thread Peter Steinbach
Dear Eli, thanks for your reply. The slurm.conf file I suggested lists this parameter. We use SelectType=select/cons_res SelectTypeParameters=CR_Core_Memory See also: https://github.com/psteinb/docker-centos7-slurm/blob/18.08.5-with-gres/slurm.conf#L60 I'll check if that makes a difference.
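For readers without the linked file at hand, the relevant part of such a slurm.conf might look roughly like the sketch below. Only the `SelectType` and `SelectTypeParameters` lines are quoted from the message; `GresTypes`, the node name, and the core and memory figures are hypothetical placeholders, not values from the linked repository.

```
# slurm.conf (illustrative excerpt)
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# Assumed for illustration: a GPU gres type and one node with 2 GPUs.
GresTypes=gpu
NodeName=gpunode1 CPUs=4 RealMemory=8000 Gres=gpu:2
```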

[slurm-users] Sharing a node with non-gres and gres jobs

2019-03-19 Thread Peter Steinbach
Hi, we are struggling with a slurm 18.08.5 installation of ours. We are in a situation where our GPU nodes have a considerable number of cores but "only" 2 GPUs inside. While people run jobs using the GPUs, non-GPU jobs can enter alright. However, we found out the hard way that the inverse