Re: [slurm-users] Only 2 jobs will start per GPU node despite 4 GPU's being present

2020-08-13 Thread Jodie H. Sprouse
Hello Tina, Thank you for the suggestions and responses!!! As of right now, it seems to be working after taking the "CPUs=" off altogether from gres.conf. The original thought process was to have 4 set aside to always go to the GPU; not so sure that is necessary as long as the CPU partition can
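
For reference, a minimal sketch of what the resulting gres.conf might look like for a 4-GPU node once the CPUs= field is dropped (the device paths are an assumption; c0005 is the node name used elsewhere in the thread). With no core binding listed, Slurm will not restrict GPU allocations to any particular cores:

    # gres.conf (sketch): no CPUs=/Cores= binding, so any core can be paired with any GPU
    NodeName=c0005 Name=gpu File=/dev/nvidia[0-3]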

Re: [slurm-users] Only 2 jobs will start per GPU node despite 4 GPU's being present

2020-08-10 Thread Tina Friedrich
Hello, yes, that would probably work; or simply taking the "CPUs=" off, really. However, I think what Jodie's trying to do is force all GPU jobs onto one of the CPUs, not allowing all GPU jobs to spread over all processors regardless of affinity. Jodie - can you try if NodeName=c0005 Name=g
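
(The suggested line is cut off above; as a hedged guess at its shape, a gres.conf entry that pins all four GPUs to the cores of a single socket might look like the following, where the core range is purely illustrative and would have to match the socket the GPUs actually hang off:)

    # gres.conf (illustrative only): map all four GPUs to one socket's cores
    NodeName=c0005 Name=gpu File=/dev/nvidia[0-3] Cores=0-19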

Re: [slurm-users] Only 2 jobs will start per GPU node despite 4 GPU's being present

2020-08-07 Thread Renfro, Michael
I’ve only got 2 GPUs in my nodes, but I’ve always used non-overlapping CPUs= or COREs= settings. Currently, they’re: NodeName=gpunode00[1-4] Name=gpu Type=k80 File=/dev/nvidia[0-1] COREs=0-7,9-15 and I’ve got 2 jobs currently running on each node that’s available. So maybe: NodeName=c0005
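
Extending that pattern to a four-GPU node would give one gres.conf line per GPU with disjoint core ranges. A sketch along those lines (the specific core ranges are assumptions for illustration):

    # gres.conf (sketch): one line per GPU, non-overlapping Cores= ranges
    NodeName=c0005 Name=gpu File=/dev/nvidia0 Cores=0-9
    NodeName=c0005 Name=gpu File=/dev/nvidia1 Cores=10-19
    NodeName=c0005 Name=gpu File=/dev/nvidia2 Cores=20-29
    NodeName=c0005 Name=gpu File=/dev/nvidia3 Cores=30-39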

Re: [slurm-users] Only 2 jobs will start per GPU node despite 4 GPU's being present

2020-08-07 Thread Jodie H. Sprouse
Hi Tina, Thank you so much for looking at this. Slurm 18.08.8. Output of nvidia-smi topo -m:

          GPU0  GPU1  GPU2  GPU3  mlx5_0  CPU Affinity
    GPU0  X     NV2   NV2   NV2   NODE    0-0,2-2,4-4,6-6,8-8,10-10,12-12,14-14,16-16,18-18,20-20,22-22,24-24,26-26,28-28,30-30,32-32,34-34,36-36,38-

Re: [slurm-users] Only 2 jobs will start per GPU node despite 4 GPU's being present

2020-08-07 Thread Tina Friedrich
Hi Jodie, what version of SLURM are you using? I'm pretty sure newer versions pick the topology up automatically (although I'm on 18.08 so I can't verify that). Is what you're wanting to do, basically, to forcefully feed a 'wrong' gres.conf to make SLURM assume all GPUs are on one CPU? (I don

Re: [slurm-users] Only 2 jobs will start per GPU node despite 4 GPU's being present

2020-08-07 Thread Jodie H. Sprouse
Tina, Thank you. Yes, jobs will run on all 4 GPUs if I submit with: --gres-flags=disable-binding Yet my goal is to have the GPUs bind to a CPU in order to allow a CPU-only job to never run on that particular CPU (having it bound to the GPU and always free for a GPU job) and give the CPU job t
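
For reference, the workaround mentioned above is passed at submission time and tells Slurm to ignore the CPU-to-GPU binding from gres.conf when picking cores; a minimal example (the script name and GPU count are illustrative):

    sbatch --gres=gpu:1 --gres-flags=disable-binding my_gpu_job.sh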

Re: [slurm-users] Only 2 jobs will start per GPU node despite 4 GPU's being present

2020-08-07 Thread Tina Friedrich
Hello, This is something I've seen once on our systems & it took me a while to figure out what was going on. The cause was that the system topology was such that all GPUs were connected to one CPU. There were no free cores on that particular CPU, so SLURM did not schedule any more jobs to
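
(A quick way to check whether a node is wired like this is the CPU-affinity column in the GPU topology report, e.g.:)

    # run on the GPU node; the "CPU Affinity" column shows which cores each GPU is attached to
    nvidia-smi topo -m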

Re: [slurm-users] Only 2 jobs will start per GPU node despite 4 GPU's being present

2020-08-07 Thread Jodie H. Sprouse
Good morning. I am having the same experience here. Wondering if you had a resolution? Thank you. Jodie On Jun 11, 2020, at 3:27 PM, Rhian Resnick <rresn...@fau.edu> wrote: We have several users submitting single GPU jobs to our cluster. We expected the jobs to fill each node and fu

[slurm-users] Only 2 jobs will start per GPU node despite 4 GPU's being present

2020-06-11 Thread Rhian Resnick
We have several users submitting single-GPU jobs to our cluster. We expected the jobs to fill each node and fully utilize the available GPUs, but we instead find that only 2 out of the 4 GPUs in each node get allocated. If we request 2 GPUs in the job and start two jobs, both jobs will start
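
For context, a minimal sketch of the kind of single-GPU submission being described (the partition name, CPU count, and script are assumptions):

    #!/bin/bash
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:1
    #SBATCH --cpus-per-task=4
    srun ./my_gpu_app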