Thank you for your reply, and apologies for not reacting sooner; I have
been kept busy until now. I have attached our partition definitions to
this mail.
As for your second question, MPI jobs aren't really an issue in our
cluster. There are a few in between, but not nearly enough to explain up
to 20 nod
We’ve run a similar setup since I moved to Slurm 3 years ago, with no issues.
Could you share partition definitions from your slurm.conf?
When you see a bunch of jobs pending, which ones have a reason of “Resources”?
Those should be the next ones to run, and ones with a reason of “Priority” are
Hello all,
we are experiencing an issue in our cluster where sometimes entire nodes
remain idle while jobs are pending in the queue that could run on the
nodes in question.
Our node topology is a bit special: almost all our nodes are in one
common partition, a subset of all those nodes a