Hi Hermann,
Good idea, but we are already using `SelectType=select/cons_tres`. After
setting everything up again (in case I made an unnoticed mistake), I saw
that the node got marked STATE=inval.
To be honest, I thought I could just claim that a node has a GPU even if
it doesn't have one - just for testing purposes. Could this be the issue?
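A node is typically flagged as invalid when it registers with fewer resources than slurm.conf declares for it, so a configured GPU that slurmd cannot find is a plausible cause. A minimal sketch for checking this, assuming the node is named `test` as in the original question and that `slurmd -G` is available in the installed Slurm version:

```
# On the controller or a login node: show the node record, including State and Reason
scontrol show node test

# List the recorded reasons for nodes that are down, drained, or failing
sinfo -R

# On the compute node itself: print the GRES configuration that slurmd sees
slurmd -G
```

If the Reason field mentions a GRES count mismatch, the declared `Gres=gpu:1` is not backed by anything slurmd can detect on that node.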
Best regards,
Xaver Stiensmeier
On 17.07.23 14:11, Hermann Schwärzler wrote:
Hi Xaver,
what kind of SelectType are you using in your slurm.conf?
Per https://slurm.schedmd.com/gres.html you have to consider:
"As for the --gpu* option, these options are only supported by Slurm's
select/cons_tres plugin."
So you can use "--gpus ..." only when you state
SelectType = select/cons_tres
in your slurm.conf.
But "--gres=gpu:1" should always work.
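To make the two request forms concrete, here is a minimal sketch that reuses the node name `test` and the settings from the original question. The layout as a single excerpt is only for illustration, and it assumes the node really provides the GPU it declares:

```
# slurm.conf (excerpt)
SelectType=select/cons_tres
GresTypes=gpu
NodeName=test Gres=gpu:1

# Job requests (run from a shell, not part of slurm.conf)
# Generic GRES syntax:
srun --gres=gpu:1 hostname
# GPU-specific option, only supported with select/cons_tres:
srun --gpus 1 hostname
```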
Regards
Hermann
On 7/17/23 13:43, Xaver Stiensmeier wrote:
Hey,
I am currently trying to understand how I can schedule a job that
needs a GPU.
I read about GRES https://slurm.schedmd.com/gres.html and tried to use:
GresTypes=gpu
NodeName=test Gres=gpu:1
But calling the following, after a 'sudo scontrol reconfigure':
srun --gpus 1 hostname
didn't work:
srun: error: Unable to allocate resources: Invalid generic resource (gres) specification
So I read more at https://slurm.schedmd.com/gres.conf.html, but that
didn't really help me.
I am rather confused. GRES claims to be generic resources but then it
comes with three defined resources (GPU, MPS, MIG) and using one of
those didn't work in my case.
Obviously, I am misunderstanding something, but I am unsure where to
look.
Best regards,
Xaver Stiensmeier