On 2/2/25 4:18 pm, Steven Jones via slurm-users wrote:
isn't it slurmd on the compute nodes?
It is, but as this check is (I think) happening on the compute node, I
wanted to check who slurmctld was running as.
The only other thought I have is what is in the compute nodes' slurm.conf
as
slurm server,
[root@xxxunidrslurmd2 slurm]# scontrol show config | fgrep -i slurmuser
SlurmUser = slurm(12002)
[root@xxxunidrslurmd2 slurm]# id slurm
uid=12002(slurm) gid=12002(slurm) groups=12002(slurm)
[root@xxxunidrslurmd2 slurm]#
[root@xxxunidrslurmd2 slurm]# ps auxwww | fgrep
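For anyone following along, the comparison being made here can be sketched in a few lines of Python. This is only an illustration under assumptions: it assumes the `scontrol show config` output contains a line of the form `SlurmUser = name(uid)` as shown above, and that the slurmd log line reads `Security violation, ... from uid N` as quoted elsewhere in this thread. The helper names `parse_slurm_user` and `uid_from_violation` are hypothetical and not part of Slurm.

```python
import re


def parse_slurm_user(config_text):
    """Return (name, uid) from a 'SlurmUser = name(uid)' line, or None if absent."""
    match = re.search(r"^\s*SlurmUser\s*=\s*(\S+?)\((\d+)\)", config_text, re.MULTILINE)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def uid_from_violation(log_line):
    """Return the uid named in a 'Security violation, ... from uid N' log line, or None."""
    match = re.search(r"Security violation, .* from uid (\d+)", log_line)
    return int(match.group(1)) if match else None


# Text copied from this thread: the controller's config line and the slurmd error.
controller_config = "SlurmUser = slurm(12002)"
slurmd_log = "[2025-01-30T19:45:29.024] error: Security violation, ping RPC from uid 12002"

name, uid = parse_slurm_user(controller_config)
rejected_uid = uid_from_violation(slurmd_log)
print(f"controller SlurmUser: {name} (uid {uid}); rejected RPC uid: {rejected_uid}")
```

If the two uids match, as they appear to here, the rejected RPC was sent by the controller's SlurmUser, so the next thing to compare would be the SlurmUser line reported on the compute node itself.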
On 2/2/25 3:46 pm, Steven Jones wrote:
I have never done an HPC before; it is all new to me, so I may be making
"newbie errors". The old HPC has been dumped on us, so I am trying to
build it "professionally", shall we say, i.e. documented and stable, and I
will train people to build it (all this with no
Hi,
I have never done an HPC before; it is all new to me, so I may be making "newbie
errors". The old HPC has been dumped on us, so I am trying to build it
"professionally", shall we say, i.e. documented and stable, and I will train
people to build it (all this with no money at all).
My understanding is
On 2/2/25 2:46 pm, Steven Jones via slurm-users wrote:
[2025-01-30T19:45:29.024] error: Security violation, ping RPC from uid 12002
Looking at the source, that seems to come from this code:
if (!_slurm_authorized_user(msg->auth_uid)) {
error("Security violation, batch lau
Hi,
[2025-01-29T00:33:32.123] CPU frequency setting not configured for this node
[2025-01-29T00:33:32.124] slurmd version 20.11.9 started
[2025-01-29T00:33:32.125] slurmd started on Wed, 29 Jan 2025 00:33:32 +
[2025-01-29T00:33:32.125] CPUs=20 Boards=1 Sockets=20 Cores=1 Threads=1
Memory=48269
On 2/2/25 1:54 pm, Steven Jones via slurm-users wrote:
Thanks for the reply. I already went through this 🙁. I checked all
nodes; id works, as does an ssh login.
What is in your slurmd logs on that node?
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email
Hi,
Thanks for the reply. I already went through this 🙁. I checked all nodes; id
works, as does an ssh login.
[root@node4 ~]# id xxxjone...@xxx.ac.nz
uid=1204805830(xxxjone...@xxx.ac.nz) gid=1204805830(xxxjone...@xxx.ac.nz)
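The same lookup that `id` performs can be scripted, which may help when checking many nodes. The following is a minimal sketch using Python's standard `pwd` module; the function name `resolve_user` is hypothetical, and it only shows whether the node's name service can resolve an account, which is an assumption about what matters for the "Invalid user id" error rather than something confirmed in this thread.

```python
import pwd


def resolve_user(name_or_uid):
    """Return (uid, gid) if the account resolves on this node, else None."""
    try:
        if isinstance(name_or_uid, int):
            entry = pwd.getpwuid(name_or_uid)  # look up by numeric uid
        else:
            entry = pwd.getpwnam(name_or_uid)  # look up by login name
    except KeyError:
        # The name service (files, LDAP, sssd, ...) does not know this account.
        return None
    return entry.pw_uid, entry.pw_gid


# uid 0 is used here only because it exists on essentially every Linux system.
print(resolve_user(0))
```

Running it on each compute node with the submitting user's name or uid in place of 0 would show whether every node resolves the account to the same uid and gid.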
8><---
Connection to node1 closed.
[root@xxxunicobuildt1 warewulf]# ssh
On 29/1/25 10:44 am, Steven Jones via slurm-users wrote:
"[2025-01-28T21:48:50.271] sched: Allocate JobId=4 NodeList=node4 #CPUs=1
Partition=debug
[2025-01-28T21:48:50.280] Killing non-startable batch JobId=4: Invalid
user id"
Looking at the source code it looks like that second error is repor