Steve, you've exhausted my best ideas... hope someone else can jump in!

Andy

On Fri, Nov 27, 2020, 11:19 AM Steve Bland <sbl...@rossvideo.com> wrote:

>
> Andy
>
> I appreciate you making me check again; things do get missed.
>
> SELinux is off and firewalld is disabled:
>
> [root@SRVGRIDSLURM01 ~]# sestatus
>
> SELinux status:                 disabled
>
> [root@SRVGRIDSLURM01 ~]# systemctl status firewalld
>
> ● firewalld.service - firewalld - dynamic firewall daemon
>
>    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
>
>    Active: inactive (dead)
>
>      Docs: man:firewalld(1)
>
> The one thing I can think of is that the system running slurmctld has two
> network interfaces. It serves as a gateway, so it has two network addresses.
> Two of the test slurmd nodes are on the other side of that gateway box; one
> is on the same box. The two on the other side of the gateway have a
> different IP address range and possibly a different mask.
>
> This is from slurm.conf for the three nodes. I know they are talking; I
> can see it in the logs when set to a debug logging level.
> The NodeName info comes from slurmd -C, so that is correct.
> I added the IP addresses, but that did not matter.
>
> # COMPUTE NODES
>
> NodeName=SRVGRIDSLURM01 NodeAddr=192.168.1.60 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821
>
> NodeName=SRVGRIDSLURM02 NodeAddr=192.168.1.61 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821
>
> NodeName=srvgridslurm03 NodeAddr=192.168.1.62 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821
>
> PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>
> About the only thing left I can think of is to make one of the nodes on
> the other side of the gateway the control node.
>
>
> *Steve Bland*
> *Technical Product Manager*
>
> *Third Party Products*
> Ross Video | Production Technology Experts
> T: +1 (613) 228-0688 ext.4219
> www.rossvideo.com
> ------------------------------
> *From:* Andy Riebs <andy.ri...@gmail.com> on behalf of Andy Riebs <a...@candooz.com>
> *Sent:* 26 November 2020 13:40
> *To:* Steve Bland <sbl...@rossvideo.com>; Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* Re: [EXTERNAL] Re: [slurm-users] trying to diagnose a connectivity issue between the slurmctld process and the slurmd nodes
>
>
> One last shot on the firewall front, Steve -- does the control node have a
> firewall enabled? I've seen cases where that can cause the sporadic
> messaging failures that you seem to be seeing.
>
> That failing, I'll defer to anyone with different ideas!
>
> Andy
> On 11/26/2020 1:01 PM, Steve Bland wrote:
>
>
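
On the two-interface theory in the quoted message: a minimal sketch, in Python, of checking whether the three NodeAddr values from the slurm.conf excerpt land in a single subnet. The /24 prefix length is an assumption, since the thread never states the actual mask, and the helper name networks_for is hypothetical. Swap in the real prefix length from each interface before drawing any conclusion.

```python
import ipaddress

# NodeAddr values copied from the slurm.conf excerpt quoted in this thread.
NODE_ADDRS = {
    "SRVGRIDSLURM01": "192.168.1.60",
    "SRVGRIDSLURM02": "192.168.1.61",
    "srvgridslurm03": "192.168.1.62",
}

# ASSUMPTION: the thread does not give the netmask; /24 is only a placeholder.
ASSUMED_PREFIX_LEN = 24


def networks_for(addrs, prefix_len):
    """Return the set of distinct networks the addresses fall into.

    Hypothetical helper: each address is paired with the given prefix
    length and reduced to its containing network.
    """
    return {
        ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        for addr in addrs
    }


if __name__ == "__main__":
    nets = networks_for(NODE_ADDRS.values(), ASSUMED_PREFIX_LEN)
    if len(nets) == 1:
        print(f"All NodeAddr values share one /{ASSUMED_PREFIX_LEN} network")
    else:
        print(f"NodeAddr values span {len(nets)} /{ASSUMED_PREFIX_LEN} networks")
    for net in sorted(nets):
        print(" ", net)
```

If the addresses configured on the far side of the gateway really are in a different range, the values listed in slurm.conf may not match what those nodes use on the wire, which would be worth comparing against the interface configuration on each host.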
