You showed that firewalld is off, but on CentOS 7/RHEL 7 that doesn't really prove there is no firewall. What is the output of iptables -S?
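For a fuller picture, something like this (just a sketch; run as root, using standard iptables options) dumps whatever rules are still loaded even with firewalld stopped:

    iptables -S          # every rule, in iptables-save syntax
    iptables -L -n -v    # the same, with per-rule packet/byte counters

Any REJECT or DROP rule matching the slurmd port would explain the symptoms.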
I'd also try doing

# scontrol show config | grep -i SlurmdPort
SlurmdPort                = 6818

And whatever port is shown, from the compute nodes, try communicating with the other slurmd daemons, e.g. from SRVGRIDSLURM01 do

nc -z SRVGRIDSLURM02 6818 || echo Cannot communicate
nc -z srvgridslurm03 6818 || echo Cannot communicate

Replace 6818 with the port you get from the scontrol show config command above.
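To test all of the nodes in one pass, a minimal sketch along these lines should work (the node list and port are assumptions - substitute your own values):

    port=6818   # assumed; use your actual SlurmdPort value
    for host in SRVGRIDSLURM01 SRVGRIDSLURM02 srvgridslurm03; do
        nc -z -w 5 "$host" "$port" || echo "Cannot communicate with $host"
    done

Run it from each compute node in turn so both directions get exercised.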
Sean

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia


On Tue, 1 Dec 2020 at 02:37, Steve Bland <sbl...@rossvideo.com> wrote:

> Although, in testing, even with ReturnToService set to '1', on a restart
> the logs show that the node has come back, but it is still classified as
> down and will not take jobs until manually told otherwise:
>
> [2020-11-30T10:33:05.402] debug2: node_did_resp SRVGRIDSLURM01
> [2020-11-30T10:33:05.402] debug2: node_did_resp srvgridslurm03
> [2020-11-30T10:33:05.402] debug2: node_did_resp SRVGRIDSLURM02
>
> There has to be a way around this manual intervention.
>
> Thanks
>
> ------------------------------
>
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> *On Behalf Of* Steve Bland
> *Sent:* Monday, November 30, 2020 08:12
> *To:* slurm-users@lists.schedmd.com
> *Subject:* Re: [slurm-users] [EXTERNAL] Re: trying to diagnose a connectivity issue between the slurmctld process and the slurmd nodes
>
> Thanks Chris
>
> When I did that, they all came back.
>
> I also found that ReturnToService was set to 0 in slurm.conf, so I modified
> that for now. I may turn it back to 0 to see whether any nodes are lost,
> but I assume that will show up in the log.
>
> Interestingly, I had this in slurm.conf, which I thought would make the
> initial state UP for all nodes:
>
> PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>
> *Steve Bland*
> *Technical Product Manager*
> *Third Party Products*
> Ross Video | Production Technology Experts
> T: +1 (613) 228-0688 ext.4219
> www.rossvideo.com
>
> ------------------------------
>
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Chris Samuel <ch...@csamuel.org>
> *Sent:* 27 November 2020 15:02
> *To:* slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
> *Subject:* [EXTERNAL] Re: [slurm-users] trying to diagnose a connectivity issue between the slurmctld process and the slurmd nodes
>
> On 26/11/20 9:21 am, Steve Bland wrote:
>
> > Sinfo always returns nodes not responding
>
> One thing - do the nodes return to this state when you resume them with
> "scontrol update node=srvgridslurm[01-03] state=resume" ?
>
> If they do, then what do your slurmctld logs say is the reason for this?
>
> You can bump up the log level on your slurmctld with, for instance,
> "scontrol setdebug debug" for more info (we run ours at debug all the
> time anyway).
>
> All the best,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA