Re: [slurm-users] Munge decode failing on new node

2020-05-14 Thread dean.w.schulze
, April 17, 2020 1:58 PM To: Slurm User Community List Subject: Re: [slurm-users] Munge decode failing on new node
A couple of quick checks to see if the problem is munge:
1. On the problem node, try
   $ echo foo | munge | unmunge
2. If (1) works, try this from the node running slurmctld

Re: [slurm-users] Munge decode failing on new node

2020-04-23 Thread Dean Schulze
I went through the exercise of making the other user the same on the slurmctld as on the slurmd nodes, but that had no effect. I still have 3 nodes that have connectivity and one node where slurmd cannot contact slurmctld. That node has ssh connectivity to and from slurmctld node, but no slurm co

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread Gennaro Oliva
Hi Dean, On Wed, Apr 22, 2020 at 07:28:15PM -0600, dean.w.schu...@gmail.com wrote: > Even for users other than slurm and munge? It seems strange that 3 of > 4 worker nodes work with the same UIDs/GIDs as the non-working nodes. As in: https://slurm.schedmd.com/quickstart_admin.html Super Quick

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread dean.w.schulze
Subject: Re: [slurm-users] Munge decode failing on new node On 4/22/20 12:56 PM, dean.w.schu...@gmail.com wrote: > There is a third user account on all machines in the cluster that is > the user account for using the cluster. That account has uid 1000 on > all four worker nodes, b

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread Christopher Samuel
On 4/22/20 12:56 PM, dean.w.schu...@gmail.com wrote: There is a third user account on all machines in the cluster that is the user account for using the cluster. That account has uid 1000 on all four worker nodes, but on the controller it is 1001. So that is probably why the question marks.
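The mismatch described here (uid 1000 on the workers, 1001 on the controller) is the kind of thing a short script can surface across a cluster. A minimal sketch, assuming each host's name-to-UID mapping has already been collected by some other means; the host names, account names, the munge UID value, and the `find_uid_mismatches` helper are all hypothetical:

```python
from collections import defaultdict


def find_uid_mismatches(uids_by_host):
    """Return {user: {uid: [hosts]}} for users whose UID differs between hosts.

    uids_by_host maps a host name to a {username: uid} dict.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for host, users in uids_by_host.items():
        for user, uid in users.items():
            seen[user][uid].append(host)
    # Keep only users that appear with more than one distinct UID.
    return {
        user: {uid: sorted(hosts) for uid, hosts in by_uid.items()}
        for user, by_uid in seen.items()
        if len(by_uid) > 1
    }


# Hypothetical inventory mirroring the thread: the cluster user is 1000 on
# the four workers and 1001 on the controller; munge is the same everywhere.
inventory = {
    "worker1": {"clusteruser": 1000, "munge": 64030},
    "worker2": {"clusteruser": 1000, "munge": 64030},
    "worker3": {"clusteruser": 1000, "munge": 64030},
    "worker4": {"clusteruser": 1000, "munge": 64030},
    "controller": {"clusteruser": 1001, "munge": 64030},
}
mismatches = find_uid_mismatches(inventory)
print(mismatches)
```

Only accounts with conflicting UIDs are reported, so the munge account, which is consistent in this made-up inventory, is left out of the result.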

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread dean.w.schulze
work have the same uid mismatch for that user (nor the slurm or munge user). -Original Message- From: slurm-users On Behalf Of Chris Samuel Sent: Monday, April 20, 2020 12:03 AM To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] Munge decode failing on new node On Friday, 17 April

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread dean.w.schulze
this one is going nowhere. From: slurm-users On Behalf Of Brian Andrus Sent: Sunday, April 19, 2020 9:30 AM To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] Munge decode failing on new node I see potentially 2 things you should likely do: 1. Run ntpd on your nodes. You can

Re: [slurm-users] Munge decode failing on new node

2020-04-19 Thread Chris Samuel
On Friday, 17 April 2020 2:22:00 PM PDT Dean Schulze wrote:
> Both work. The only discrepancy is that the slurm controller output had
> these two lines:
>
> UID: ??? (1000)
> GID: ??? (1000)
>
> Like the controller doesn't know the username for UID 1000.
What does thi

Re: [slurm-users] Munge decode failing on new node

2020-04-19 Thread Brian Andrus
, 2020 3:40 PM *To:* Slurm User Community List <slurm-users@lists.schedmd.com> *Subject:* Re: [slurm-users] Munge decode failing on new node There is no ntp service running on any of my nodes, and all but this one is working. I haven't heard that ntp is a require

Re: [slurm-users] Munge decode failing on new node

2020-04-17 Thread Dean Schulze
2020 3:40 PM > *To:* Slurm User Community List > *Subject:* Re: [slurm-users] Munge decode failing on new node > > > > There is no ntp service running on any of my nodes, and all but this one > is working. I haven't heard that ntp is a requirement for slurm, just that >

Re: [slurm-users] Munge decode failing on new node

2020-04-17 Thread Dean Schulze
e > > > > *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On > Behalf Of *Dean Schulze > *Sent:* Friday, April 17, 2020 3:40 PM > *To:* Slurm User Community List > *Subject:* Re: [slurm-users] Munge decode failing on new node > > > > There is no ntp serv

Re: [slurm-users] Munge decode failing on new node

2020-04-17 Thread Riebs, Andy
...@lists.schedmd.com] On Behalf Of Dean Schulze Sent: Friday, April 17, 2020 3:40 PM To: Slurm User Community List Subject: Re: [slurm-users] Munge decode failing on new node There is no ntp service running on any of my nodes, and all but this one is working. I haven't heard that ntp

Re: [slurm-users] Munge decode failing on new node

2020-04-17 Thread Dean Schulze
There is no ntp service running on any of my nodes, and all but this one is working. I haven't heard that ntp is a requirement for slurm, just that the time be synchronized across the cluster. And it is. On Wed, Apr 15, 2020 at 12:17 PM Carlos Fenoy wrote: > I’d check ntp as your encoding time

Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Chris Samuel
On 4/15/20 10:57 am, Dean Schulze wrote:
> error: Munge decode failed: Invalid credential
> ENCODED: Wed Dec 31 17:00:00 1969
> DECODED: Wed Dec 31 17:00:00 1969
> error: authentication: Invalid authentication credential
That's really interesting, I had one of these last week when on call, fo
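The ENCODED and DECODED values in the quoted log deserve a second look: Wed Dec 31 17:00:00 1969 is what Unix time 0 looks like when printed in a zone seven hours behind UTC. That suggests the timestamps were never filled in, rather than that the node's clock is decades off, though the thread does not confirm this. A quick check, assuming a fixed UTC-7 offset (an assumption; the node's actual time zone is not stated, and `format_epoch` is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone

# Assumption: the node prints local time in a zone 7 hours behind UTC.
ASSUMED_LOCAL_ZONE = timezone(timedelta(hours=-7))
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_epoch(seconds, tz=ASSUMED_LOCAL_ZONE):
    """Format a Unix timestamp in the same layout as the quoted log lines."""
    moment = (UNIX_EPOCH + timedelta(seconds=seconds)).astimezone(tz)
    return moment.strftime("%a %b %d %H:%M:%S %Y")


zero_time = format_epoch(0)
print(zero_time)
```

If the printed string matches the log, the timestamps in the error are most likely zero placeholders, so they say little by themselves about whether the clocks on the two nodes agree.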

Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Ole Holm Nielsen
You might want to check the Munge section in my Slurm Wiki page: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#munge-authentication-service
/Ole
On 15-04-2020 19:57, Dean Schulze wrote: I've installed two new nodes onto my slurm cluster. One node works, but the other one complains abou

Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Dean Schulze
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On > Behalf Of *Dean Schulze > *Sent:* Wednesday, April 15, 2020 1:57 PM > *To:* Slurm User Community List > *Subject:* [slurm-users] Munge decode failing on new node > > > > I've installed two n

Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Carlos Fenoy
I’d check ntp, as your encoding time seems odd to me.
On Wed, 15 Apr 2020 at 19:59, Dean Schulze wrote: > I've installed two new nodes onto my slurm cluster. One node works, but > the other one complains about an invalid credential for munge. I've > verified that the munge.key is the same as on

Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Riebs, Andy
List Subject: [slurm-users] Munge decode failing on new node I've installed two new nodes onto my slurm cluster. One node works, but the other one complains about an invalid credential for munge. I've verified that the munge.key is the same as on all other nodes with sudo cksum

[slurm-users] Munge decode failing on new node

2020-04-15 Thread Dean Schulze
I've installed two new nodes onto my slurm cluster. One node works, but the other one complains about an invalid credential for munge. I've verified that the munge.key is the same as on all other nodes with
   sudo cksum /etc/munge/munge.key
I recopied a munge.key from a node that works. I've ver
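The key comparison described in this message can also be scripted. A minimal sketch, assuming a copy of each node's key file has already been placed somewhere readable; the file names and the `key_digest` and `keys_identical` helpers are hypothetical, and SHA-256 is used here in place of the CRC that cksum prints, which is a substitution and not what the poster ran:

```python
import hashlib
import tempfile
from pathlib import Path


def key_digest(path):
    """Return the SHA-256 hex digest of the file at path."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def keys_identical(paths):
    """True only if at least one path is given and all files share one digest."""
    digests = {key_digest(p) for p in paths}
    return len(digests) == 1


# Demo with throwaway files standing in for copies of each node's munge.key.
with tempfile.TemporaryDirectory() as tmp:
    good_a = Path(tmp, "node1.key")
    good_b = Path(tmp, "node2.key")
    bad = Path(tmp, "node3.key")
    good_a.write_bytes(b"example-key-material")
    good_b.write_bytes(b"example-key-material")
    bad.write_bytes(b"different-key-material")

    all_match = keys_identical([good_a, good_b])
    any_differs = not keys_identical([good_a, good_b, bad])

print(all_match, any_differs)
```

In this thread the checksums already matched, so a True result from a check like this only rules out one possible cause of the decode failure.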