Steve, you've exhausted my best ideas... hope someone else can jump in! Andy
On Fri, Nov 27, 2020, 11:19 AM Steve Bland <sbl...@rossvideo.com> wrote:

> Andy
>
> I appreciate you making me check again; things do get missed.
>
> SELinux is off, firewalld is disabled:
>
> [root@SRVGRIDSLURM01 ~]# sestatus
> SELinux status: disabled
> [root@SRVGRIDSLURM01 ~]# systemctl status firewalld
> ● firewalld.service - firewalld - dynamic firewall daemon
>    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
>    Active: inactive (dead)
>      Docs: man:firewalld(1)
>
> The one thing I can think of is that the system running slurmctld has two
> network interfaces. It serves as a gateway, so it has two network addresses.
> Two of the test slurmds are on the other side of that gateway box; one is
> on the same box. The two on the other side of the gateway have a
> different IP address range and possibly a different mask.
>
> This is from slurm.conf for the three nodes. I know they are talking; I
> can see it in the logs when set to a debug logging level.
> The nodename info comes from slurmd -C, so that is correct.
> I added the IP addresses, but that did not matter.
>
> # COMPUTE NODES
> NodeName=SRVGRIDSLURM01 NodeAddr=192.168.1.60 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821
> NodeName=SRVGRIDSLURM02 NodeAddr=192.168.1.61 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821
> NodeName=srvgridslurm03 NodeAddr=192.168.1.62 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821
> PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>
> About the only thing I can think of is to make one of the nodes on the
> other side of the gateway into the control node.
>
> *Steve Bland*
> *Technical Product Manager*
> *Third Party Products*
> Ross Video | Production Technology Experts
> T: +1 (613) 228-0688 ext.4219
> www.rossvideo.com
> ------------------------------
> *From:* Andy Riebs <andy.ri...@gmail.com> on behalf of Andy Riebs <a...@candooz.com>
> *Sent:* 26 November 2020 13:40
> *To:* Steve Bland <sbl...@rossvideo.com>; Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* Re: [EXTERNAL] Re: [slurm-users] trying to diagnose a
> connectivity issue between the slurmctld process and the slurmd nodes
>
> One last shot on the firewall front Steve -- does the control node have a
> firewall enabled? I've seen cases where that can cause the sporadic
> messaging failures that you seem to be seeing.
>
> That failing, I'll defer to anyone with different ideas!
>
> Andy
>
> On 11/26/2020 1:01 PM, Steve Bland wrote:
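
One way to test the two-interface theory from the quoted message is to check, from the slurmctld host, which of the controller's networks each NodeAddr falls into and whether the slurmd port on that address accepts a TCP connection. The following is only a rough sketch in Python. The node addresses are copied from the slurm.conf excerpt in the thread; the two controller networks are hypothetical placeholders (the thread does not give them), and port 6818 is Slurm's default SlurmdPort, which is an assumption about this cluster unless slurm.conf confirms it.

```python
import ipaddress
import socket

# Node addresses as listed in the slurm.conf excerpt in the thread.
NODE_ADDRS = {
    "SRVGRIDSLURM01": "192.168.1.60",
    "SRVGRIDSLURM02": "192.168.1.61",
    "srvgridslurm03": "192.168.1.62",
}

# HYPOTHETICAL: networks on the controller's two interfaces. Replace these
# with the real values (for example from `ip addr` on the gateway box).
CONTROLLER_NETWORKS = ["192.168.1.0/24", "10.0.0.0/24"]

# Slurm's default SlurmdPort; assumed here, adjust if slurm.conf overrides it.
SLURMD_PORT = 6818


def matching_network(addr, networks):
    """Return the first network (as given) that contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    for net in networks:
        if ip in ipaddress.ip_network(net, strict=False):
            return net
    return None


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable routes, and timeouts.
        return False


def report(node_addrs=NODE_ADDRS, networks=CONTROLLER_NETWORKS, port=SLURMD_PORT):
    """Build one summary line per node: its network and whether the port answers."""
    lines = []
    for name, addr in node_addrs.items():
        net = matching_network(addr, networks)
        reachable = can_connect(addr, port)
        lines.append(f"{name}: addr={addr} network={net} port_{port}_open={reachable}")
    return lines
```

Running `print("\n".join(report()))` on the controller lists each node with its matching network and probe result. A node whose address matches none of the controller's networks, or whose port does not answer, would be the first place to look. A successful TCP connect only shows the port is reachable from that host; it does not prove that slurmd's replies to slurmctld leave through the interface slurmctld expects.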