Hi Hermann,

Good idea, but we are already using `SelectType=select/cons_tres`. After
setting everything up again (in case I had made a mistake I hadn't noticed),
I saw that the node got marked STATE=inval.

To be honest, I thought I could simply declare that a node has a GPU even
if it doesn't have one, just for testing purposes. Could this be the issue?
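
If it is, my guess is that slurmd finds no GPU on the node, reports a
count of 0, and the node is then marked invalid because slurm.conf
promises 1. A minimal, untested sketch of what I would try next, assuming
that a gres.conf entry pointing the fake GPU at an existing device file is
enough for testing (the File=/dev/null value is my assumption, not
something I found in the docs):

# slurm.conf (as before)
SelectType=select/cons_tres
GresTypes=gpu
NodeName=test Gres=gpu:1

# gres.conf on the node (assumption: any existing device file can stand in)
NodeName=test Name=gpu File=/dev/null

After restarting slurmd, 'scontrol show node test' should show the node
state and, if the node is still invalid, a Reason= field.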

Best regards,
Xaver Stiensmeier

On 17.07.23 14:11, Hermann Schwärzler wrote:
Hi Xaver,

what kind of SelectType are you using in your slurm.conf?

Per https://slurm.schedmd.com/gres.html you have to consider:
"As for the --gpu* option, these options are only supported by Slurm's
select/cons_tres plugin."

So you can use "--gpus ..." only when you state
SelectType              = select/cons_tres
in your slurm.conf.

But "--gres=gpu:1" should always work.

Regards
Hermann


On 7/17/23 13:43, Xaver Stiensmeier wrote:
Hey,

I am currently trying to understand how I can schedule a job that
needs a GPU.

I read about GRES (https://slurm.schedmd.com/gres.html) and tried to use:

GresTypes=gpu
NodeName=test Gres=gpu:1

But calling the following, after a 'sudo scontrol reconfigure':

srun --gpus 1 hostname

didn't work:

srun: error: Unable to allocate resources: Invalid generic resource (gres) specification

So I read more at https://slurm.schedmd.com/gres.conf.html, but that
didn't really help me.


I am rather confused. GRES claims to be generic resources but then it
comes with three defined resources (GPU, MPS, MIG) and using one of
those didn't work in my case.

Obviously, I am misunderstanding something, but I am unsure where to
look.


Best regards,
Xaver Stiensmeier


