Thank you, thank you, thank you. It was the firewall on CentOS 7. Once I disabled that it worked.
For anyone else who runs into this issue, here is how to disable the firewall on CentOS 7:

https://linuxize.com/post/how-to-stop-and-disable-firewalld-on-centos-7/

On Tue, Jan 21, 2020 at 7:24 AM Brian Johanson <bjoha...@psc.edu> wrote:
>
> On 1/21/2020 12:32 AM, Chris Samuel wrote:
> > On 20/1/20 3:00 pm, Dean Schulze wrote:
> >
> >> There's either a problem with the source code I cloned from github,
> >> or there is a problem when the controller runs on Ubuntu 19 and the
> >> node runs on CentOS 7.7. I'm downgrading to a stable 19.05 build to
> >> see if that solves the problem.
> >
> > I've run the master branch on a Cray XC without issues, and I concur
> > with what the others have said and suggest it's worth checking the
> > slurmd and slurmctld logs to find out why communications is not right
> > between them.
> >
> and if the logs do not have enough information, run the daemon in the
> foreground with increased verbosity
>
> slurmd -D -v -v -v
>
> As another said, check if the connections are available with telnet
> server->client 'telnet node1 6818' (6818 is the default slurmd port) and
> same from compute->server.
>
> Are these new host builds? Is there a firewall enabled? Kinda sounds
> like a firewall on the client that allows outbound (initial connection
> to the slurmctl) but not new inbound (slurmctl ping) connections.
>
> -b
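In case it helps anyone scripting the telnet check suggested in the quoted message, here is a rough Python sketch of the same idea. It only attempts a plain TCP connection, which is roughly what 'telnet node1 6818' does. The function name port_is_reachable and the 3-second timeout are my own choices for illustration and are not part of Slurm; 6818 is the default slurmd port mentioned above.

```python
import socket

# Default slurmd port, as stated in the quoted message.
SLURMD_PORT = 6818


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration. A refused connection, a timeout
    (typical when a firewall silently drops packets), or a name lookup
    failure all raise OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running port_is_reachable("node1", SLURMD_PORT) on the controller, and the equivalent check against the slurmctld port from the compute node, should show which direction is blocked. In my case the inbound connection to the CentOS 7 node was the one failing until the firewall was disabled.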