Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Chris Samuel
On 4/15/20 10:57 am, Dean Schulze wrote:  error: Munge decode failed: Invalid credential  ENCODED: Wed Dec 31 17:00:00 1969  DECODED: Wed Dec 31 17:00:00 1969  error: authentication: Invalid authentication credential That's really interesting, I had one of these last week when on call, fo

Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Ole Holm Nielsen
You might want to check the Munge section in my Slurm Wiki page: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#munge-authentication-service /Ole On 15-04-2020 19:57, Dean Schulze wrote: I've installed two new nodes onto my slurm cluster.  One node works, but the other one complains abou

Re: [slurm-users] How to request for the allocation of scratch .

2020-04-15 Thread navin srivastava
Thanks Erik. Last night I made the changes. I defined the following in slurm.conf on all the nodes as well as on the Slurm server: TmpFS=/lscratch NodeName=node[01-10] CPUs=44 RealMemory=257380 Sockets=2 CoresPerSocket=22 ThreadsPerCore=1 TmpDisk=160 State=UNKNOWN Feature=P4000 Gres=gpu:2 These nodes

Re: [slurm-users] [EXTERNAL] Re: Munge decode failing on new node

2020-04-15 Thread Sean Crosby
Who owns the munge directory and key? Is it the right uid/gid? Is the munge daemon running? -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Thu, 16 Apr 2020 at 04:57, Dean Schulz

[slurm-users] Need help configuring 3 tier priority/multifactor preemption in cluster

2020-04-15 Thread Joshua Sonstroem
Hi Slurm-Users, Hope this post finds all of you healthy and safe amidst the ongoing COVID19 craziness. We've got a strange error state that occurs when we enable preemption and we need help diagnosing what is wrong. I'm not sure if we are missing a default value or other necessary configuration, b

Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Dean Schulze
/etc/munge is 700 /etc/munge/munge.key is 400 On Wed, Apr 15, 2020 at 12:11 PM Riebs, Andy wrote: > Two trivial things to check: > > 1. Permissions on /etc/munge and /etc/munge.key > > 2. Is munged running on the problem node? > > > > Andy > > > > *From:* slurm-users [mailto:slurm-
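
The permission and ownership checks discussed in this thread can be scripted. The following is a minimal sketch, assuming the modes reported above (700 for the directory, 400 for the key); the function names are hypothetical, and the expected uid/gid are left for the reader to compare against a working node.

```python
import os
import stat


def check_mode(path, expected_mode):
    """Return (ok, actual_mode) for the permission bits of path."""
    actual = stat.S_IMODE(os.stat(path).st_mode)
    return actual == expected_mode, actual


def report(munge_dir="/etc/munge", key_name="munge.key"):
    """Collect mode and ownership for the munge directory and key.

    The expected modes (0o700, 0o400) are the values quoted in the
    thread, used here as an assumption rather than a requirement.
    """
    checks = [
        (munge_dir, 0o700),
        (os.path.join(munge_dir, key_name), 0o400),
    ]
    results = []
    for path, expected in checks:
        ok, actual = check_mode(path, expected)
        st = os.stat(path)
        results.append({
            "path": path,
            "ok": ok,
            "mode": oct(actual),
            "uid": st.st_uid,  # compare with a node that works
            "gid": st.st_gid,
        })
    return results
```

Running `report()` on both the working and the failing node (with enough privilege to stat the key) and comparing the output would show a mode or ownership mismatch, if there is one.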

Re: [slurm-users] How to request for the allocation of scratch .

2020-04-15 Thread Ellestad, Erik
The default value for TmpDisk is 0, so if you want local scratch available on a node, the amount of TmpDisk space must be defined in the node configuration in slurm.conf. example: NodeName=TestNode01 CPUs=8 Boards=1 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=24099 TmpDisk=1
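
As a rough illustration of the point about the default, the sketch below reads TmpDisk from a NodeName line and falls back to 0 when it is absent. It is not Slurm's own parser (it ignores line continuations and case-insensitive keys), the helper names are hypothetical, and the sample line is the one posted elsewhere in this thread. Slurm interprets TmpDisk in megabytes.

```python
def parse_node_line(line):
    """Split a slurm.conf-style line into a dict of Key=Value fields."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not Key=Value pairs
            fields[key] = value
    return fields


def tmp_disk_mb(line):
    """Return TmpDisk in megabytes, or 0 (the default) when not set."""
    return int(parse_node_line(line).get("TmpDisk", 0))


node_line = (
    "NodeName=node[01-10] CPUs=44 RealMemory=257380 Sockets=2 "
    "CoresPerSocket=22 ThreadsPerCore=1 TmpDisk=160 State=UNKNOWN "
    "Feature=P4000 Gres=gpu:2"
)
print(tmp_disk_mb(node_line))                     # 160
print(tmp_disk_mb("NodeName=TestNode01 CPUs=8"))  # 0
```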

Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Carlos Fenoy
I’d check NTP, as your encoding time seems odd to me. On Wed, 15 Apr 2020 at 19:59, Dean Schulze wrote: > I've installed two new nodes onto my slurm cluster. One node works, but > the other one complains about an invalid credential for munge. I've > verified that the munge.key is the same as on
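
One way to read the odd timestamp: "Wed Dec 31 17:00:00 1969" is what Unix time 0 looks like in a zone seven hours behind UTC. The sketch below shows that conversion. The UTC-7 offset is an assumption (the poster's time zone is not stated in the thread), and `epoch_as_local` is a hypothetical helper. If the assumption holds, the ENCODED and DECODED fields may simply be unset rather than reflecting a real clock reading.

```python
from datetime import datetime, timedelta, timezone


def epoch_as_local(seconds, utc_offset_hours):
    """Format a Unix timestamp in a fixed-offset zone, ctime-style."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(seconds, tz).strftime("%a %b %d %H:%M:%S %Y")


# Unix time 0 viewed from UTC-7 (assumed offset)
print(epoch_as_local(0, -7))  # Wed Dec 31 17:00:00 1969
```

Checking clock synchronization on the new node is still worthwhile, since munge credentials are time-limited.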

Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Riebs, Andy
Two trivial things to check: 1. Permissions on /etc/munge and /etc/munge.key 2. Is munged running on the problem node? Andy From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Dean Schulze Sent: Wednesday, April 15, 2020 1:57 PM To: Slurm User Community Li

[slurm-users] Munge decode failing on new node

2020-04-15 Thread Dean Schulze
I've installed two new nodes onto my slurm cluster. One node works, but the other one complains about an invalid credential for munge. I've verified that the munge.key is the same as on all other nodes with sudo cksum /etc/munge/munge.key I recopied a munge.key from a node that works. I've ver

Re: [slurm-users] [External] Another question about partition and node allocation

2020-04-15 Thread Michael Robbert
The more flexible way to do this is with QoS. (PreemptType=preempt/qos) You'll need to have Accounting enabled and you'll probably want qos listed in AccountingStorageEnforce. Once you do that, you create a "shared" QoS for the scavenger jobs and a QoS for each group that buys into resources. Assign the

[slurm-users] Reduce trigger execution time

2020-04-15 Thread Nicolò Parmiggiani
Dear all, the official Slurm documentation says: "Trigger events are not processed instantly, but a check is performed for trigger events on a periodic basis (currently every 15 seconds)." https://slurm.schedmd.com/strigger.html Is it possible to reduce this interval? Thank you.