[slurm-users] Why job memory request may be automatically set by Slurm to RealMemory of some node?
Hey,

I noticed a weird behavior in Slurm 21 and 22. When all of the following conditions hold, Slurm implicitly sets the job's memory request equal to the RealMemory of some node (perhaps the first node that satisfies the job's other requests; this is not documented, or at least I could not find it in the documentation):
- RealMemory is specified explicitly on the NodeName line in slurm.conf,
- no DefMemPerXXX is specified in slurm.conf,
- the user does not specify a memory request,
- the cons_tres plugin is configured.

Shouldn't it at least be set to FreeMemory instead, or even left empty?

Best regards,
Taras
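The configuration-side conditions listed above can be checked mechanically. The following is only a rough sketch: the function name check_conf is made up for illustration, it looks at a single file (no Include handling or line continuations), and it cannot tell whether a user passed a memory request at submit time.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) when a slurm.conf-style file
# matches the three configuration-side conditions described in the message.
check_conf() {
    local conf="$1"
    # 1. RealMemory is given explicitly on a NodeName line.
    grep -Eiq '^[[:space:]]*NodeName=.*RealMemory=' "$conf" || return 1
    # 2. No DefMemPerCPU / DefMemPerGPU / DefMemPerNode is set.
    grep -Eiq '^[[:space:]]*DefMemPer(CPU|GPU|Node)=' "$conf" && return 1
    # 3. The cons_tres select plugin is configured.
    grep -Eiq '^[[:space:]]*SelectType=select/cons_tres' "$conf" || return 1
    return 0
}

# Example call (the path is an assumption; adjust to your installation):
#   check_conf /etc/slurm/slurm.conf && echo "conditions match"
```

If the check succeeds, comparing the memory fields shown by "scontrol show job" for a job submitted without a memory request against the node's RealMemory should show whether the behavior described above occurs on your cluster.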
[slurm-users] hierarchies/dependencies between QoSs
Hi,

is there a way to have hierarchies/dependencies between different QoSs, apart from preemption? Is it possible to change the QoS of a running job?

We have qos=gpus2, qos=gpus4 and qos=gpus6 (each allowing a certain maximum total number of GPUs for the user). I would like the running/pending qos=gpus2 jobs to be converted to qos=gpus4 jobs when a qos=gpus4 job is submitted. Likewise, when qos=gpus4 jobs are already running, new qos=gpus2 jobs should automatically be submitted as qos=gpus4 jobs.

Thanks,
Sebastian
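One way to picture the conversion described above is a small dry-run helper that only prints the "scontrol update" commands it would issue. This is a sketch under assumptions: the function name qos_update_cmds is invented for illustration, and whether Slurm accepts such a change for a running job, or for a non-privileged user, is not established in this thread.

```shell
#!/bin/bash
# Hypothetical dry-run helper: read one job ID per line on stdin and print
# (not execute) the scontrol command that would move each job to the QoS
# given as the first argument.
qos_update_cmds() {
    local target="$1" id
    while read -r id; do
        if [ -n "$id" ]; then
            printf 'scontrol update JobId=%s QOS=%s\n' "$id" "$target"
        fi
    done
}

# Example: list the commands for the current user's pending gpus2 jobs.
#   squeue -h -u "$USER" -q gpus2 -t PENDING -o %i | qos_update_cmds gpus4
```

Printing the commands first makes it easy to review them before piping the output to a shell.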
[slurm-users] Why every job will sleep 100000000
I found a sleep process run by root whenever I submit a job, and it sleeps for 100000000 seconds.
Sometimes my job hangs: the job state is "R", although it runs nothing. The job script looks like the following:
--
#!/bin/bash
#SBATCH -J sub
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -p vpartition
--

Is it because of the "sleep 100000000" process? Or how could I debug it?

Any help will be appreciated.
--GHui
Re: [slurm-users] Why every job will sleep 100000000
If you examine the process hierarchy, that "sleep 100000000" process is probably the child of a "slurmstepd: [.extern]" process. This is a housekeeping step launched for the job by slurmd -- in older Slurm releases it would handle the X11 forwarding, for example. It should have no impact on the other steps of the job.

> On Nov 4, 2022, at 05:26, GHui wrote:
>
> I found a sleep process running by root, when I submit a job. And it sleep 100000000 seconds.
> Sometimes, my job is hung up. The job state is "R". Though it runs nothing, the jobscript like the following,
> --
> #!/bin/bash
> #SBATCH -J sub
> #SBATCH -N 1
> #SBATCH -n 1
> #SBATCH -p vpartition
> --
>
> Is it because of "sleep 100000000" process? Or how could I debug it?
>
> Any help will be appreciated.
> --GHui
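To examine the process hierarchy as suggested in the reply, one option is to walk from the sleep process up through its parents with ps. This is a minimal sketch: the function name ancestors is made up for illustration, and the pgrep call in the comment assumes the sleep process is owned by root on the compute node, as reported in the original message.

```shell
#!/bin/bash
# Hypothetical helper: print "PID COMMAND" for a process and each of its
# ancestors, starting at the given PID and stopping after PID 1.
ancestors() {
    local pid="$1"
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # Show this level; stop if the process no longer exists.
        ps -o pid=,args= -p "$pid" || break
        # Move up to the parent (PID 1 reports parent 0, which ends the loop).
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

# Example, run on the compute node while the job is active:
#   ancestors "$(pgrep -n -u root -x sleep)"
```

If the reply is right, the second line of the output should show a slurmstepd process for the job's extern step.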