Hi Joakim,
one more thing to mention:
On 11.05.2020 at 19:23, Joakim Hove wrote:
ubuntu@ip-172-31-80-232:/var/run/slurm-llnl$ scontrol show node
NodeName=ip-172-31-80-232 Arch=x86_64 CoresPerSocket=1
Reason=Low RealMemory [root@2020-05-11T16:20:02]
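"Reason=Low RealMemory" usually means the RealMemory configured for the node in slurm.conf is higher than what slurmd actually detects on the machine. A sketch of the usual fix (the node name is taken from the output above; the RealMemory value is only an example, use whatever 'slurmd -C' reports):

```shell
# On the compute node, print the hardware configuration slurmd detects:
slurmd -C

# In slurm.conf, set RealMemory at or below the detected value, e.g.:
#   NodeName=ip-172-31-80-232 CPUs=1 RealMemory=900

# Then reread the configuration and return the node to service:
scontrol reconfigure
scontrol update NodeName=ip-172-31-80-232 State=RESUME
```

After the RESUME, the node should leave the drain state and show up as idle in 'sinfo'.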
The "State=IDLE+DRAIN" looks a bit suspicious.
ubuntu@ip-172-31-80-232:/var/run/slurm-llnl$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 1 drain ip-172-31-80-232
● slurmd.service - Slurm node daemon
Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
Active: ac
ubuntu@ip-172-31-80-232:/var/run/slurm-llnl$ scontrol show node
NodeName=ip-172-31-80-232 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.00
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=ip-172-31-80-232 NodeHostName=ip-172-31-80-232 Version=
You will want to look at the output of 'sinfo' and 'scontrol show node' to
see what slurmctld thinks about your compute nodes; then on the compute
nodes you will want to check the status of the slurmd service ('systemctl
status -l slurmd') and possibly read through the slurmd logs as well.
On Mon,
Hello,
I am in the process of familiarizing myself with slurm - I will be writing a
piece of software which submits jobs to a slurm cluster. Right now I have
just made my own "cluster" consisting of a single Amazon AWS node, and I use
that to familiarize myself with the sxxx commands - this has worked nicely.
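For the job-submission software described above, one simple approach is to shell out to 'sbatch' and parse the job id it prints. A minimal sketch, assuming the script path and the option values are placeholders (PySlurm or the Slurm REST API would be alternatives to subprocess):

```python
import subprocess

def build_sbatch_command(script, job_name, ntasks=1, time_limit="01:00:00"):
    """Assemble an sbatch command line; all option values are examples."""
    return [
        "sbatch",
        "--job-name", job_name,
        "--ntasks", str(ntasks),
        "--time", time_limit,
        script,
    ]

def submit(script, **kwargs):
    """Run sbatch and return the job id parsed from its
    'Submitted batch job <id>' output line."""
    cmd = build_sbatch_command(script, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # sbatch prints e.g. "Submitted batch job 42"
    return int(result.stdout.split()[-1])

if __name__ == "__main__":
    # Only builds the command here; submit() needs a live slurmctld.
    print(build_sbatch_command("job.sh", job_name="test"))
```

Returning the numeric job id makes it easy to poll job state afterwards with 'squeue -j <id>' or 'sacct -j <id>'.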