Hi,
I am submitting two or more jobs to test the scheduler and how it handles
multiple job requests and overall info. When I submit two or more jobs, I get
the following odd output:
[vitorio@dev35 verif_dev]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
Hello all!
I have an odd question.
On my head node, and also my login nodes, I can ping both my outside network
and my inside network using any DNS name I choose. It is dual-homed. When I run
squeue, sinfo, or any Slurm command as my regular account, I get the error
below. I can run srun, squeue, sinfo, or
Steven,
Looks like you may have had a secondary controller that took over and
changed your StateSave files.
IF you don't need the job info AND no jobs are running, you can just
rename/delete your StateSaveLocation directory and things will be
recreated. Job numbers will start over (unless y
Steven, one tip if you are just starting with Slurm: "Use the logs, Luke,
use the logs."
By this I mean run tail -f /var/log/slurmctl on the controller and then
restart the slurmctld service, so you can watch what it logs on startup.
On a compute node, run tail -f /var/log/slurmd
Oh, and you probably are going to set up Munge also - which is easy.
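On the log tip above: the log file locations are site-specific, so it may be
safer to read them out of slurm.conf than to guess the path. Here is a minimal
sketch, not a definitive tool. The helper name slurm_conf_value and the default
path /etc/slurm/slurm.conf are my own assumptions; SlurmctldLogFile and
SlurmdLogFile are the standard slurm.conf parameters. It only handles the
simple case of one Key=Value pair per line.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a slurm.conf parameter.
# Keys are matched case-insensitively; text after '#' is treated as a comment.
# Usage: slurm_conf_value KEY [CONF_FILE]
slurm_conf_value() {
    key=$1
    # Assumed default location; adjust for your install.
    conf=${2:-/etc/slurm/slurm.conf}
    awk -v key="$key" '
        { sub(/#.*/, "") }                     # strip comments
        {
            eq = index($0, "=")
            if (eq == 0) next                  # skip lines without Key=Value
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)     # trim whitespace around the value
            if (tolower(k) == tolower(key)) val = v
        }
        END { if (val != "") print val; else exit 1 }
    ' "$conf"
}

# Example use (not run here):
#   tail -f "$(slurm_conf_value SlurmctldLogFile)"   # on the controller
#   tail -f "$(slurm_conf_value SlurmdLogFile)"      # on a compute node
```

If the function prints nothing and returns non-zero, the parameter is not set
in that file, in which case the daemon is most likely logging via syslog.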
On Tue, 4 Feb 2025