Hello,
Recently we have noticed a strange delay between job submission and job start, even though the partition clearly has enough idle nodes to satisfy the job's requirements. To avoid interference, we run the test on the 4-node debug partition, which has no other jobs in it. The test job script is also as simple as possible:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=debug
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --output=%j.out
#SBATCH --error=%j.err

hostname
sleep 1000
echo end

After submission, however, the job stays in the PENDING state for about 30-60 seconds, and during that time sacct reports the REASON as "None". We have also checked slurmctld.log on the server and slurmd.log on the compute node with the debug log level enabled, and neither contains anything useful for figuring out why the job was pending. Is there any way to make Slurm explain in detail why a job did not start immediately, or what it was doing while the job was pending?

Thanks.
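
P.S. In case it helps, this is roughly what we ran while watching the job; the job ID 12345 below is just a placeholder, and the exact fields/levels may differ slightly depending on the Slurm version:

# Submit the test job and watch its state and pending reason
sbatch test.sh
squeue -j 12345 -o "%T %r"

# After the job eventually starts, compare the submit and start timestamps
sacct -j 12345 --format=JobID,Submit,Start,State,Reason

# Temporarily raise the controller log verbosity (reverted afterwards)
scontrol setdebug debug2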