That would certainly do it. If you look at the slurmctld log when it comes up,
it will say that it's marking that node as invalid because it has fewer (0) gres
resources than you say it should have. That's because slurmd on that node will
come up and say "What gres resources??"

For testing purposes, you can just create a dummy file on the node, then in
gres.conf, point to that file as the "graphics file" interface (the File=
entry). As long as you don't try to actually use it as a graphics file, that
should be enough for that node to think it has gres/gpu resources. That's what
I do in my vagrant slurm cluster.
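
A minimal sketch of the dummy-file setup described above, not a tested
configuration. The slurm.conf lines are the ones quoted further down in this
thread; the path /tmp/fake_gpu0 is a hypothetical placeholder for an empty
file you create yourself on the node, and the exact gres.conf line used in the
vagrant cluster is not shown in this thread, so treat it as an assumption.

```
# slurm.conf (lines taken from the original question in this thread)
GresTypes=gpu
NodeName=test Gres=gpu:1

# gres.conf on node "test"
# ASSUMPTION: /tmp/fake_gpu0 is a hypothetical empty file created beforehand
# (for example with `touch /tmp/fake_gpu0`); it is not a real GPU device.
NodeName=test Name=gpu File=/tmp/fake_gpu0
```

With one File= entry, slurmd should report one gpu, which matches the
Gres=gpu:1 count in slurm.conf, so the node should no longer be marked invalid
for having fewer gres than configured.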

Rob

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Xaver 
Stiensmeier <xaverstiensme...@gmx.de>
Sent: Monday, July 17, 2023 9:43 AM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] GRES and GPUs

Hi Hermann,

Good idea, but we are already using `SelectType=select/cons_tres`. After
setting everything up again (in case I made an unnoticed mistake), I saw
that the node got marked STATE=inval.

To be honest, I thought I could just claim that a node has a GPU even if
it doesn't have one - just for testing purposes. Could this be the issue?

Best regards,
Xaver Stiensmeier

On 17.07.23 14:11, Hermann Schwärzler wrote:
> Hi Xaver,
>
> what kind of SelectType are you using in your slurm.conf?
>
> Per https://slurm.schedmd.com/gres.html you have to consider:
> "As for the --gpu* option, these options are only supported by Slurm's
> select/cons_tres plugin."
>
> So you can use "--gpus ..." only when you state
> SelectType              = select/cons_tres
> in your slurm.conf.
>
> But "--gres=gpu:1" should always work.
>
> Regards
> Hermann
>
>
> On 7/17/23 13:43, Xaver Stiensmeier wrote:
>> Hey,
>>
>> I am currently trying to understand how I can schedule a job that
>> needs a GPU.
>>
>> I read about GRES (https://slurm.schedmd.com/gres.html) and tried to use:
>>
>> GresTypes=gpu
>> NodeName=test Gres=gpu:1
>>
>> But calling - after a 'sudo scontrol reconfigure':
>>
>> srun --gpus 1 hostname
>>
>> didn't work:
>>
>> srun: error: Unable to allocate resources: Invalid generic resource
>> (gres) specification
>>
>> so I read more (https://slurm.schedmd.com/gres.conf.html), but that
>> didn't really help me.
>>
>>
>> I am rather confused. GRES claims to be generic resources but then it
>> comes with three defined resources (GPU, MPS, MIG) and using one of
>> those didn't work in my case.
>>
>> Obviously, I am misunderstanding something, but I am unsure where to
>> look.
>>
>>
>> Best regards,
>> Xaver Stiensmeier
>>
>
