The problem appears to be the use of AutoDetect=nvml in the gres.conf file. When we
remove that and fully specify everything (with help from the
https://gitlab.com/nvidia/hpc/slurm-mig-discovery tool), I am able to
submit jobs allocating all of the MIG GPUs at once, or submit X jobs asking for
No, I can't submit more than 7 individual jobs and have them all run; the jobs
after the first 7 go to pending until the first 7 finish.
And it's not a limit (at least, not of "7"), because here's the same problem
but with a node configured for 2x3g.20gb per card (2 cards, so 4 total MIG devices).
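For anyone finding this thread later, here is a minimal sketch of what the explicit configuration can look like. The device paths, cap minor numbers, and node name below are illustrative only; slurm-mig-discovery prints the correct values for your hardware.

# gres.conf -- one line per MIG instance, no AutoDetect line
Name=gpu Type=3g.20gb MultipleFiles=/dev/nvidia0,/dev/nvidia-caps/nvidia-cap21,/dev/nvidia-caps/nvidia-cap22
Name=gpu Type=3g.20gb MultipleFiles=/dev/nvidia0,/dev/nvidia-caps/nvidia-cap30,/dev/nvidia-caps/nvidia-cap31
# ...repeat for the MIG instances on /dev/nvidia1...

# slurm.conf -- the node definition has to advertise matching counts
NodeName=gpunode01 Gres=gpu:3g.20gb:4 ...

A job then requests a single instance with something like:

$ sbatch --gres=gpu:3g.20gb:1 job.sh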
How did you configure Slurm? Check this in your slurm.conf file:
$ scontrol show config | grep StateSaveLocation
StateSaveLocation = /var/spool/slurmctld
You seem to have defined this incorrectly: /var/spool/slurm/ctld
/Ole
On 16-11-2022 23:13, 김종록 wrote:
When I started slurmctld for
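In case it helps, the fix Ole is pointing at is roughly the following sketch (it assumes SlurmUser is "slurm" and that the directory still has to be created; adjust to your site):

# slurm.conf on the controller
StateSaveLocation=/var/spool/slurmctld

$ mkdir -p /var/spool/slurmctld
$ chown slurm:slurm /var/spool/slurmctld
$ systemctl restart slurmctld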
Can you request more than 7 single-GPU jobs on the same node?
It could be that you've hit another limit (e.g. memory or CPU), or a limit set on the
account, partition, or QOS (a few checks are sketched after this message).
On our setup we limit jobs to 1 GPU per job (via a partition QOS); however, we can
use up all the
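For the pending-after-7 question, a few quick checks that usually show which limit is biting (the job id, partition name, and output format are placeholders, not taken from the poster's site):

$ squeue -u $USER -t PENDING -o "%i %j %r"    # the reason column says why each job waits
$ scontrol show job 12345 | grep -i reason
$ scontrol show partition gpu | grep -Ei 'qos|tres'
$ sacctmgr show qos format=Name,MaxTRES,MaxTRESPU,MaxJobsPU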
Hi Nicolas and Urban,
Thank you for your replies!
Kind regards,
Hans
From: slurm-users on behalf of Urban Borštnik
Sent: 16 November 2022 16:45
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] SLURM in K8s, any advice?
Hi Hans,
We run Sl
Thank you. The problem is solved.
Hi,
we had a similar issue in the past. We had to set the config options AuthAltTypes
and AuthAltParameters in slurmdbd.conf as well, e.g.
AuthAltTypes=auth/jwt
AuthAltParameters=jwt_key=/opt/slurm/etc/jwt_hs256.key
That did the trick for db API queries.
Cheers,
Georgios
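For anyone setting this up from scratch, the key referenced above can be created roughly like this (the path follows Georgios' example, and the slurm:slurm ownership assumes SlurmUser is "slurm"); the same two AuthAlt* lines also need to be present in slurm.conf:

$ dd if=/dev/random of=/opt/slurm/etc/jwt_hs256.key bs=32 count=1
$ chown slurm:slurm /opt/slurm/etc/jwt_hs256.key
$ chmod 0600 /opt/slurm/etc/jwt_hs256.key

After restarting slurmctld and slurmdbd, a test token can be issued with:

$ scontrol token lifespan=3600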