Hello Tina,
Thank you for the suggestions and responses!!!
As of right now, it seems to be working after taking the "CPUs=" setting out
of gres.conf altogether. The original thought process was to have 4 CPUs set
aside to always go to the GPUs; not so sure that is necessary as long as the CPU
partition can
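For reference, a gres.conf line with the CPUs= constraint dropped entirely might look like the sketch below (the node name and device paths are assumptions based on later messages in the thread):

NodeName=c0005 Name=gpu File=/dev/nvidia[0-3]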
Hello,
yes, that would probably work; or simply taking the "CPUs=" off, really.
However, I think what Jodie's trying to do is force all GPU jobs onto
one of the CPUs, not allowing them to spread over all
processors, regardless of affinity.
Jodie - can you try if
NodeName=c0005 Name=g
I’ve only got 2 GPUs in my nodes, but I’ve always used non-overlapping CPUs= or
COREs= settings. Currently, they’re:
NodeName=gpunode00[1-4] Name=gpu Type=k80 File=/dev/nvidia[0-1] COREs=0-7,9-15
and I’ve got 2 jobs currently running on each node that’s available.
So maybe:
NodeName=c0005
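A sketch of what that might look like for Jodie's 4-GPU node, with purely illustrative core ranges that would have to be matched against the real nvidia-smi topo -m output:

NodeName=c0005 Name=gpu File=/dev/nvidia0 COREs=0-9
NodeName=c0005 Name=gpu File=/dev/nvidia1 COREs=10-19
NodeName=c0005 Name=gpu File=/dev/nvidia2 COREs=20-29
NodeName=c0005 Name=gpu File=/dev/nvidia3 COREs=30-39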
Hi Tina,
Thank you so much for looking at this.
slurm 18.08.8
nvidia-smi topo -m
        GPU0   GPU1   GPU2   GPU3   mlx5_0   CPU Affinity
GPU0     X     NV2    NV2    NV2    NODE     0-0,2-2,4-4,6-6,8-8,10-10,12-12,14-14,16-16,18-18,20-20,22-22,24-24,26-26,28-28,30-30,32-32,34-34,36-36,38-38
(remaining rows of the matrix were truncated)
Hi Jodie,
what version of SLURM are you using? I'm pretty sure newer versions pick
the topology up automatically (although I'm on 18.08 so I can't verify
that).
Is what you're wanting to do - basically - forcefully feed a 'wrong'
gres.conf to make SLURM assume all GPUs are on one CPU? (I don
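For what it's worth, the automatic pickup mentioned above is, as far as I know, the AutoDetect mechanism added around Slurm 19.05; it needs slurmd built against NVML, and gres.conf then shrinks to roughly:

AutoDetect=nvml

with the GPU count still declared via Gres=gpu:4 on the node definition in slurm.conf.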
Tina,
Thank you. Yes, jobs will run on all 4 GPUs if I submit with:
--gres-flags=disable-binding
Yet my goal is to have the GPUs bound to a CPU so that a CPU-only job
never runs on that particular CPU (keeping it bound to the GPU and always
free for a GPU job) and give the CPU job t
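For completeness, the kind of submission that does spread over all 4 GPUs would presumably be something like the following (the job script name is a placeholder):

sbatch --gres=gpu:1 --gres-flags=disable-binding job.sh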
Hello,
This is something I've seen once on our systems & it took me a while to
figure out what was going on.
The explanation was that the system topology was such that all GPUs were
connected to one CPU. There were no free cores left on that particular CPU,
so SLURM did not schedule any more jobs to
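A quick way to check for that situation is to compare the node's allocated cores with the cores the GPUs are attached to, e.g. (node name assumed from earlier in the thread):

scontrol show node c0005 | grep CPUAlloc
nvidia-smi topo -m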
Good morning.
I am having the same experience here. Wondering if you found a resolution?
Thank you.
Jodie
On Jun 11, 2020, at 3:27 PM, Rhian Resnick <rresn...@fau.edu> wrote:
We have several users submitting single GPU jobs to our cluster. We expected
the jobs to fill each node and fu
We have several users submitting single GPU jobs to our cluster. We expected
the jobs to fill each node and fully utilize the available GPUs, but we instead
find that only 2 out of the 4 GPUs in each node get allocated.
If we request 2 GPUs in the job and start two jobs, both jobs will start
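The submissions in question are presumably single-GPU batch jobs along these lines (the script name and task count are placeholders, not from the thread):

sbatch --gres=gpu:1 -n 1 job.sh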