[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-02 Thread Chris Samuel via slurm-users
On 2/2/25 4:18 pm, Steven Jones via slurm-users wrote:
Isn't it slurmd on the compute nodes?
It is, but as this check is (I think) happening on the compute node, I wanted to check who slurmctld was running as. The only other thought I have is what is in the compute nodes' slurm.conf as

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-02 Thread Steven Jones via slurm-users
slurm server,
[root@xxxunidrslurmd2 slurm]# scontrol show config | fgrep -i slurmuser
SlurmUser = slurm(12002)
[root@xxxunidrslurmd2 slurm]# id slurm
uid=12002(slurm) gid=12002(slurm) groups=12002(slurm)
[root@xxxunidrslurmd2 slurm]#
[root@xxxunidrslurmd2 slurm]# ps auxwww | fgrep
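
The comparison being made by eye in the output above (the uid that `scontrol show config` reports for SlurmUser against the uid that `id slurm` reports) could be scripted and repeated on each node. This is only a sketch: the helper names `parse_slurm_user` and `parse_id_uid` are hypothetical, and the regular expressions assume the output looks exactly like the two lines quoted in this message.

```python
import re
from typing import Tuple


def parse_slurm_user(line: str) -> Tuple[str, int]:
    """Parse a line such as 'SlurmUser = slurm(12002)' into (name, uid)."""
    match = re.search(r"SlurmUser\s*=\s*(\S+?)\((\d+)\)", line)
    if match is None:
        raise ValueError("not a SlurmUser line: %r" % line)
    return match.group(1), int(match.group(2))


def parse_id_uid(line: str) -> Tuple[str, int]:
    """Parse the 'uid=12002(slurm)' field of `id` output into (name, uid)."""
    match = re.search(r"uid=(\d+)\(([^)]+)\)", line)
    if match is None:
        raise ValueError("no uid field in: %r" % line)
    return match.group(2), int(match.group(1))


# The two lines below are copied from the session quoted in this message.
configured = parse_slurm_user("SlurmUser = slurm(12002)")
local = parse_id_uid("uid=12002(slurm) gid=12002(slurm) groups=12002(slurm)")

# On a consistent node both tuples are identical.
print(configured == local)
```

Running the same comparison on a compute node, with that node's own command output pasted in, would show whether the node resolves the account to the same uid as the server.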

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-02 Thread Chris Samuel via slurm-users
On 2/2/25 3:46 pm, Steven Jones wrote:
I have never done an HPC before, it is all new to me so I may be making "newbie errors". The old HPC has been dumped on us so I am trying to build it "professionally" shall we say, i.e. documented, stable and I will train people to build it (all this with no

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-02 Thread Steven Jones via slurm-users
Hi, I have never done an HPC before, it is all new to me so I may be making "newbie errors". The old HPC has been dumped on us so I am trying to build it "professionally" shall we say, i.e. documented, stable and I will train people to build it (all this with no money at all). My understanding is

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-02 Thread Chris Samuel via slurm-users
On 2/2/25 2:46 pm, Steven Jones via slurm-users wrote:
[2025-01-30T19:45:29.024] error: Security violation, ping RPC from uid 12002
Looking at the source, that seems to come from this code:
if (!_slurm_authorized_user(msg->auth_uid)) {
        error("Security violation, batch lau
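
To make the shape of that check concrete, here is a minimal sketch in Python. It is not Slurm's source: it assumes the check accepts an RPC only when the authenticated sender is root or the uid that the receiving node has configured as SlurmUser, and the name `is_authorized_uid` is hypothetical.

```python
ROOT_UID = 0


def is_authorized_uid(auth_uid: int, slurm_user_uid: int) -> bool:
    """Assumed rule: accept only root or the node's configured SlurmUser uid.

    auth_uid       -- uid of the RPC sender, as authenticated
    slurm_user_uid -- uid the receiving node resolves for SlurmUser
    """
    return auth_uid == ROOT_UID or auth_uid == slurm_user_uid


# 12002 is the sender uid from the log line quoted above.
# Node agrees that SlurmUser is uid 12002: the RPC is accepted.
print(is_authorized_uid(12002, 12002))  # True

# Node resolves SlurmUser to some other uid (0 is used here purely as an
# illustration): the same sender is rejected.
print(is_authorized_uid(12002, 0))  # False
```

Under that assumption, a "Security violation" for uid 12002 would point at the node holding a different idea of SlurmUser than the controller, which is why the node's own slurm.conf is worth checking.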

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-02 Thread Steven Jones via slurm-users
Hi,
[2025-01-29T00:33:32.123] CPU frequency setting not configured for this node
[2025-01-29T00:33:32.124] slurmd version 20.11.9 started
[2025-01-29T00:33:32.125] slurmd started on Wed, 29 Jan 2025 00:33:32 +
[2025-01-29T00:33:32.125] CPUs=20 Boards=1 Sockets=20 Cores=1 Threads=1 Memory=48269

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-02 Thread Chris Samuel via slurm-users
On 2/2/25 1:54 pm, Steven Jones via slurm-users wrote: Thanks for the reply.  I already went through this 🙁.  I checked all nodes, id works as does a ssh login. What is in your slurmd logs on that node?

[slurm-users] Fw: Re: RHEL8.10 V slurmctld

2025-02-02 Thread Steven Jones via slurm-users
Hi, Thanks for the reply. I already went through this 🙁. I checked all nodes, id works as does a ssh login.
[root@node4 ~]# id xxxjone...@xxx.ac.nz
uid=1204805830(xxxjone...@xxx.ac.nz) gid=1204805830(xxxjone...@xxx.ac.nz)
8><---
Connection to node1 closed.
[root@xxxunicobuildt1 warewulf]# ssh

[slurm-users] Re: RHEL8.10 V slurmctld

2025-02-02 Thread Chris Samuel via slurm-users
On 29/1/25 10:44 am, Steven Jones via slurm-users wrote:
"[2025-01-28T21:48:50.271] sched: Allocate JobId=4 NodeList=node4 #CPUs=1 Partition=debug
[2025-01-28T21:48:50.280] Killing non-startable batch JobId=4: Invalid user id"
Looking at the source code, it looks like that second error is repor