On 3/23/19 2:16 PM, Sharma, M D wrote:
Hi folks,
By default, Slurm allocates the whole node to a job (even if the job
specifically requests a single core). This is usually taken care of
by setting SelectType=select/cons_res along with an appropriate
parameter such as SelectTypeParameters=CR_Core_Memory.
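For reference, the relevant slurm.conf entries would look something
like this (a minimal sketch; the DefMemPerCPU value is an assumed
example, not taken from our config):

# slurm.conf (excerpt): schedule individual cores and track memory
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
# With CR_Core_Memory, memory is a consumable resource, so jobs that
# omit --mem need a default; 1000 MB here is an assumed placeholder.
DefMemPerCPU=1000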
When testing the job submission and resource allocation, we can see
things work as intended when using srun:
srun -N1 -n1 -p fxq --mem=1000 sleep 60 &
# A command like the one above, submitted 20 times, launches 20 jobs
# on a single 40-core node, as intended.
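For example, a loop like the following reproduces that test (a sketch
based on the command above; squeue just confirms placement):

for i in $(seq 1 20); do
    srun -N1 -n1 -p fxq --mem=1000 sleep 60 &
done
squeue -p fxq   # all 20 jobs should be running on the same node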
However, if the same request is submitted via sbatch, the entire node
goes into an "allocated" state and does not accept any other jobs
until the single-core job completes.
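For the record, the sbatch submission that triggers this would be
something like the following (a sketch; --wrap is used here for
brevity in place of an actual batch script):

sbatch -N1 -n1 -p fxq --mem=1000 --wrap="sleep 60"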
Has anyone else seen this behaviour / have thoughts on a fix?
I believe that by requesting -N1, you are requesting that the entire
node be allocated to that job. Remove -N1 from that command, like this:
srun -n1 -p fxq --mem=1000 sleep 60 &
That should allow you to run multiple jobs on the same node.
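Presumably the same applies to the sbatch submission, e.g. (a sketch,
again using --wrap in place of a batch script):

sbatch -n1 -p fxq --mem=1000 --wrap="sleep 60"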
--
Prentice