Re: [slurm-users] NVIDIA MIG question

2022-11-17 Thread Groner, Rob
The problem appears to be using AutoDetect=nvml in the gres.conf file. When we remove that and fully specify everything (with help from the https://gitlab.com/nvidia/hpc/slurm-mig-discovery tool), I am able to submit jobs allocating all of the MIG GPUs at once, or submit X jobs asking for
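For anyone following the same route, here is a minimal Python sketch of the gres.conf edit described above. It assumes AutoDetect=nvml sits on its own line, and that the explicit entries come from the slurm-mig-discovery tool. The helper name `disable_nvml_autodetect` is hypothetical, and it does not generate or validate the MIG entries itself.

```python
import re
from typing import Iterable

# Matches a standalone "AutoDetect=nvml" line (case-insensitive, tolerant of spaces).
AUTODETECT_NVML = re.compile(r"^\s*AutoDetect\s*=\s*nvml\b", re.IGNORECASE)


def disable_nvml_autodetect(gres_conf: str, explicit_lines: Iterable[str] = ()) -> str:
    """Comment out AutoDetect=nvml and append explicitly specified GRES lines.

    explicit_lines is assumed to be output from a tool such as
    slurm-mig-discovery; this hypothetical helper only copies it in.
    """
    out = []
    for line in gres_conf.splitlines():
        # Keep the old setting visible in the file, but inactive.
        out.append("#" + line if AUTODETECT_NVML.match(line) else line)
    # Append the fully specified entries after the existing content.
    out.extend(explicit_lines)
    return "\n".join(out) + "\n"
```

Commenting the line out instead of deleting it keeps the previous setting visible if you need to revert. Other AutoDetect values, and AutoDetect set on a NodeName= line, are left untouched.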

Re: [slurm-users] NVIDIA MIG question

2022-11-17 Thread Groner, Rob
No, I can't submit more than 7 individual jobs and have them all run, the jobs after the first 7 will go to pending until the first 7 finish. And it's not a limit (at least, not of "7"), because here's the same problem but with a node configured for 2x3g.20gb per card (2 cards, so, 4 total MIG

Re: [slurm-users] Ignore state file recover error at first starting of slurmctld

2022-11-17 Thread Ole Holm Nielsen
How did you configure Slurm? Check this in your slurm.conf file:
$ scontrol show config | grep StateSaveLocation
StateSaveLocation = /var/spool/slurmctld
You seem to have defined this incorrectly: /var/spool/slurm/ctld
/Ole
On 16-11-2022 23:13, 김종록 wrote:
When I started slurmctld for
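The check Ole suggests can also be scripted. The following is a rough Python sketch, assuming you pass it the text printed by `scontrol show config`. The function names are hypothetical. The writability test uses `os.access`, which checks the user running the script, so run it as the configured SlurmUser for a meaningful answer.

```python
import os
import re
from typing import List, Optional


def state_save_location(show_config: str) -> Optional[str]:
    """Return the StateSaveLocation value from `scontrol show config` text."""
    for line in show_config.splitlines():
        m = re.match(r"\s*StateSaveLocation\s*=\s*(\S+)", line)
        if m:
            return m.group(1)
    return None  # key not present in the given text


def state_dir_problems(path: str) -> List[str]:
    """List obvious problems with the state directory (existence, write access)."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
    elif not os.access(path, os.W_OK):
        # os.access checks the *current* user, not necessarily SlurmUser.
        problems.append(f"{path} is not writable by the current user")
    return problems
```

Comparing the parsed value with the path named in the slurmctld error shows quickly whether the two disagree, as with /var/spool/slurm/ctld above.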

Re: [slurm-users] NVIDIA MIG question

2022-11-17 Thread Yair Yarom
Can you request more than 7 single-GPU jobs on the same node? It could be that you've hit another limit (e.g. memory or CPU), or some other limit (in the account, partition, or QOS). On our setup we limit jobs to 1 GPU per job (via partition QOS), however we can use up all the

Re: [slurm-users] SLURM in K8s, any advice?

2022-11-17 Thread Viessmann Hans-Nikolai (PSI)
Hi Nicolas and Urban,
Thank you for your replies!
Kind regards, Hans
From: slurm-users on behalf of Urban Borštnik
Sent: 16 November 2022 16:45
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] SLURM in K8s, any advice?
Hi Hans, We run Sl

Re: [slurm-users] [ext] REST API Error Slurmdb

2022-11-17 Thread 김종록
Thank you. The problem is solved.

Re: [slurm-users] [ext] REST API Error Slurmdb

2022-11-17 Thread Nikolis, Georgios
Hi, we had a similar issue in the past. We had to set the config options AuthAltTypes and AuthAltParameters also in slurmdbd.conf, e.g.
AuthAltTypes=auth/jwt
AuthAltParameters=jwt_key=/opt/slurm/etc/jwt_hs256.key
That did the trick for db API queries.
Cheers, Georgios
From: slurm-users o
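If several slurmdbd.conf files need the same fix, a small script can apply it. This is a rough Python sketch, assuming the file uses simple `Key=Value` lines and treating key names case-insensitively. The helper name `ensure_jwt_auth` is hypothetical, and the key path is the example one from this message. Restarting slurmdbd afterwards is presumably needed for the change to take effect.

```python
def ensure_jwt_auth(conf_text: str, key_path: str) -> str:
    """Set AuthAltTypes/AuthAltParameters for JWT in slurmdbd.conf text.

    Existing entries (matched case-insensitively) are overwritten in place;
    missing ones are appended at the end.
    """
    wanted = {
        "AuthAltTypes": "auth/jwt",
        "AuthAltParameters": f"jwt_key={key_path}",
    }
    by_lower = {k.lower(): k for k in wanted}
    seen = set()
    out = []
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip().lower()
        if key in by_lower:
            canonical = by_lower[key]
            out.append(f"{canonical}={wanted[canonical]}")
            seen.add(canonical)
        else:
            out.append(line)  # unrelated or commented-out lines are kept as-is
    for canonical, value in wanted.items():
        if canonical not in seen:
            out.append(f"{canonical}={value}")
    return "\n".join(out) + "\n"
```

For example, `ensure_jwt_auth(open("slurmdbd.conf").read(), "/opt/slurm/etc/jwt_hs256.key")` returns the updated text without writing it back, so you can diff it before replacing the file.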