On 1/21/2020 12:32 AM, Chris Samuel wrote:
On 20/1/20 3:00 pm, Dean Schulze wrote:

There's either a problem with the source code I cloned from github, or there is a problem when the controller runs on Ubuntu 19 and the node runs on CentOS 7.7. I'm downgrading to a stable 19.05 build to see if that solves the problem.

I've run the master branch on a Cray XC without issues, and I concur with what the others have said: it's worth checking the slurmd and slurmctld logs to find out why communication between them is not working.

And if the logs do not have enough information, run the daemon in the foreground with increased verbosity:

slurmd -D -v -v -v

As another poster said, check whether the connections are available with telnet. From server to client, run 'telnet node1 6818' (6818 is the default slurmd port), then do the same from the compute node to the server.
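If telnet is not installed on the nodes, the same reachability check can be sketched in a few lines of Python. This is only an illustration: the host name "ctl-host" is hypothetical, "node1" is taken from the telnet example, and 6817 is assumed to be the slurmctld port (its default unless SlurmctldPort is changed in slurm.conf).

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name-resolution failures
        return False


if __name__ == "__main__":
    # Run the first check on the controller and the second on the compute node.
    # "ctl-host" is a placeholder; substitute your own host names.
    checks = [("node1", 6818), ("ctl-host", 6817)]
    for host, port in checks:
        state = "reachable" if port_open(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

A timeout rather than an immediate refusal usually points at a firewall silently dropping packets, whereas a refusal suggests nothing is listening on that port.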

Are these new host builds? Is there a firewall enabled? Kinda sounds like a firewall on the client that allows outbound connections (the initial connection to slurmctld) but not new inbound ones (the slurmctld ping).

-b

