[slurm-users] pam_slurm_adopt working on only some nodes

2022-01-28 Thread Wayne Hendricks
Any idea why pam_slurm_adopt would work on some nodes but not others? Here is an excerpt from one of the nodes:
Jan 28 15:38:54 dgx1-1 sshd[1027640]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.10.10.1 user=test.user
Jan 28 15:38:54 dgx1-1 pam_slurm_

slurm-users@lists.schedmd.com

2022-01-15 Thread Wayne Hendricks
configurations.
On Sat, Jan 15, 2022 at 10:32 AM Wayne Hendricks wrote:
>
> The only thing that jumps out on the ctl logs is:
> error: step_layout_create: no usable CPUs
> The node logs were unremarkable.
>
> It doesn't make much sense to me that the same job with srun or an

slurm-users@lists.schedmd.com

2022-01-15 Thread Wayne Hendricks
at, Jan 15, 2022 at 12:56 AM Sean Crosby wrote:
>
> Any error in slurmd.log on the node or slurmctld.log on the ctl?
>
> Sean
>
> From: slurm-users on behalf of Wayne Hendricks
> Sent: Saturday, 15 January 2022 16:04
> To: slurm-us...

slurm-users@lists.schedmd.com

2022-01-14 Thread Wayne Hendricks
Running test job with srun works:
wayneh@login:~$ srun -G16 -p v100 /home/wayne.hendricks/job.sh
179851 Linux dgx1-1 5.4.0-94-generic #106-Ubuntu SMP Thu Jan 6 23:58:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
179851 Linux dgx1-2 5.4.0-94-generic #106-Ubuntu SMP Thu Jan 6 23:58:14 UTC 2022 x86_64 x8

[slurm-users] Building with latest pmix-4.0.0 error

2022-01-04 Thread Wayne Hendricks
./configure --prefix=/admin/slurm/slurm-21.08.5 --with-pmix=/admin/slurm/pmix-4.0.0
configure: WARNING: unable to locate pmix installation
configure: error: unable to locate pmix installation
configure:17261: checking for pmix installation
configure:17299: gcc -o conftest -DNUMA_VERSION1_COMPATIB

[slurm-users] DefMemPerGPU bug?

2020-03-26 Thread Wayne Hendricks
When using 20.02/cons_tres and defining DefMemPerGPU, jobs that request GPUs without defining --mem will not run more than one job per node. I can see where it is allocating the correct amount of memory for the job per GPU requested, but no other jobs will run on the node. If a value
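
To make the expectation in the report above concrete, here is a minimal sketch of the arithmetic, assuming the memory charged to a job that omits --mem is simply DefMemPerGPU multiplied by the number of GPUs requested. The function names and all node and job sizes are hypothetical and chosen only for illustration; this is not Slurm's scheduler logic.

```python
def job_memory_mb(def_mem_per_gpu_mb: int, gpus_requested: int) -> int:
    """Memory a job would be charged if it is DefMemPerGPU * GPUs (assumption)."""
    return def_mem_per_gpu_mb * gpus_requested


def jobs_expected_per_node(node_mem_mb: int, node_gpus: int,
                           def_mem_per_gpu_mb: int, gpus_per_job: int) -> int:
    """How many identical jobs one would expect to fit on a node,
    limited by whichever of memory or GPUs runs out first."""
    mem_per_job = job_memory_mb(def_mem_per_gpu_mb, gpus_per_job)
    return min(node_mem_mb // mem_per_job, node_gpus // gpus_per_job)


# Hypothetical node: 8 GPUs, 512000 MB of memory, DefMemPerGPU=32000.
# Each job requests 2 GPUs and does not pass --mem.
print(job_memory_mb(32000, 2))
print(jobs_expected_per_node(512000, 8, 32000, 2))
```

Under these assumed numbers each job would be charged 64000 MB and four such jobs would be expected to share the node, which is the behaviour the report says does not happen.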