Thank you, thank you, thank you. It was the firewall on CentOS 7. Once I
disabled that, it worked.
For anyone else who runs into this issue, here is how to disable the
firewall on CentOS 7:
https://linuxize.com/post/how-to-stop-and-disable-firewalld-on-centos-7/
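For completeness, this is the gist of that guide (double-check against the
link before running anything), plus the narrower option of just opening the
Slurm ports instead of turning the firewall off entirely:

    # stop and disable firewalld (what I ended up doing)
    sudo systemctl stop firewalld
    sudo systemctl disable firewalld

    # alternative: leave firewalld up and open the default Slurm ports
    # (6817 for slurmctld, 6818 for slurmd; adjust if slurm.conf overrides them)
    sudo firewall-cmd --permanent --add-port=6817-6818/tcp
    sudo firewall-cmd --reload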
On Tue, Jan 21, 2020 at 7:24 AM
On 1/21/2020 12:32 AM, Chris Samuel wrote:
On 20/1/20 3:00 pm, Dean Schulze wrote:
There's either a problem with the source code I cloned from github, or
there is a problem when the controller runs on Ubuntu 19 and the node
runs on CentOS 7.7. I'm downgrading to a stable 19.05 build to see if
that solves the problem.
On 20/1/20 3:00 pm, Dean Schulze wrote:
There's either a problem with the source code I cloned from github, or
there is a problem when the controller runs on Ubuntu 19 and the node
runs on CentOS 7.7. I'm downgrading to a stable 19.05 build to see if
that solves the problem.
I've run the ma
The node is not getting the status from itself, it’s querying the slurmctld to
ask for its status.
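If it helps, a quick way to test that direction from the node itself (just a
sketch; assumes the Slurm client commands and a valid slurm.conf are present
on the node):

    scontrol ping                      # does this node get a reply from slurmctld?
    scontrol show node $(hostname -s)  # the state slurmctld has recorded for it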
--
|| \\UTGERS, |---*O*---
||_// the State | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0
There's either a problem with the source code I cloned from github, or
there is a problem when the controller runs on Ubuntu 19 and the node runs
on CentOS 7.7. I'm downgrading to a stable 19.05 build to see if that
solves the problem.
On Mon, Jan 20, 2020 at 3:41 PM Carlos Fenoy wrote:
> It seems to me that the problem is between the slurmctld and slurmd.
It seems to me that the problem is between the slurmctld and slurmd. When
slurmd starts, it sends a message to the slurmctld; that's why the node appears
idle. Every now and then the slurmctld will try to ping the slurmd to check
if it's still alive. This ping doesn't seem to be working, so, as I mentioned,
check the connectivity from the slurmctld host to the node.
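That ping goes to the slurmd port on the node, so a quick check from the
slurmctld host would be something like this (a sketch; 6818 is only the
default SlurmdPort, so confirm it first):

    scontrol show config | grep -i slurmdport   # which port slurmd listens on
    nc -vz liqidos-dean-node1 6818               # can the controller reach it?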
Check the slurmd log file on the node.
Ensure slurmd is still running. It sounds possible that the OOM killer or
something similar may be killing slurmd.
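Something along these lines on the node, assuming systemd and that slurmd
runs as a service (unit names and log locations can differ per install):

    systemctl status slurmd                     # is it running, and since when?
    journalctl -u slurmd --since "1 hour ago"   # recent slurmd messages
    dmesg -T | grep -iE 'oom|killed process'    # any sign of the OOM killer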
Brian Andrus
On 1/20/2020 1:12 PM, Dean Schulze wrote:
If I restart slurmd, the asterisk goes away. Then I can run the job
once, and the asterisk is back, and the node remains in comp*:
If I restart slurmd, the asterisk goes away. Then I can run the job once,
and the asterisk is back, and the node remains in comp*:
[liqid@liqidos-dean-node1 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1   idle liqidos-dean-node1
[liqid@liqidos-dean-node1 ~]$
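scontrol shows a bit more than sinfo here, if you want to see what the
controller has recorded for the node (sketch):

    scontrol show node liqidos-dean-node1   # check the State= and Reason= fields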
If I run sinfo on the node itself, it shows an asterisk. How can the node
be unreachable from itself?
On Mon, Jan 20, 2020 at 1:50 PM Carlos Fenoy wrote:
> Hi,
>
> The * next to the idle status in sinfo means that the node is
> unreachable/not responding. Check the status of the slurmd on the node.
Hi,
The * next to the idle status in sinfo means that the node is
unreachable/not responding. Check the status of the slurmd on the node and
check the connectivity from the slurmctld host to the compute node (telnet
may be enough). You can also check the slurmctld logs for more information.
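Roughly something like this (<compute-node> being whatever the node is
called; the port and log path below are only the defaults, so adjust to your
slurm.conf):

    # on the node
    systemctl status slurmd
    # from the slurmctld host (6818 is the default SlurmdPort)
    telnet <compute-node> 6818
    # controller-side log, if SlurmctldLogFile points at the usual place
    grep <compute-node> /var/log/slurmctld.log | tail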
Regards,
Carlos