On 30/1/20 10:20 am, Dr. Thomas Orgis wrote:
Matching for user (-u) and job ID (-j) works, but not -N/-S/-E. So is
this just the current state, and is it up to me to provide a patch to
enable it if I want that behaviour?
You're using a very, very old version of Slurm there (15.08), you
shou
Slurm 19.05 now, though all these settings were in effect on 17.02 until quite
recently. If I get some detail wrong below, I hope someone will correct me. But
this is our current working state. We’ve been able to schedule 10-20k jobs per
month since late 2017, and we successfully scheduled 320k
Hello,
Thank you for your detailed reply. That’s all very useful. I managed to mistype
our cluster size: there are actually 450 standard, 40-core compute nodes. What
you say is interesting, and so it concerns me that things are so bad at the
moment.
I wondered if you could please g
I missed reading what size your cluster was at first, but found it on a second
read. Our cluster and typical maximum job size scale about the same way,
though (our users’ typical job size is anywhere from a few cores up to 10% of
our core count).
There are several recommendations to separate y
Hello,
Thank you for your reply. In answer to Mike's questions...
Our serial partition nodes are partially shared by the high memory partition.
That is, the partitions overlap partially -- shared nodes move one way or
another depending upon demand. Jobs requesting up to and including 20 cores a
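To make the overlap concrete, the arrangement looks roughly like the sketch
below in slurm.conf. The node names, counts and limits here are made up for
illustration rather than copied from our real configuration:

# Hypothetical layout: standard and high-memory nodes.
NodeName=node[001-400]  CPUs=40 State=UNKNOWN
NodeName=himem[001-050] CPUs=40 RealMemory=768000 State=UNKNOWN

# "serial" and "highmem" deliberately list the same himem nodes, so those
# shared nodes drift between the two workloads as demand changes.
PartitionName=serial  Nodes=node[351-400],himem[001-050] MaxTime=2-12:00:00 State=UP
PartitionName=highmem Nodes=himem[001-050] MaxTime=2-12:00:00 State=UP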
With the caveats that I haven't built these job submit plugins past Slurm 18
and that the documentation is weak, you could look at these plugins I'd
written for our cluster:
https://github.com/FredHutch/gizmo-plugins
It contains two plugins I build in the source tree. These set a
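Since the documentation is weak, here is a minimal sketch of the shape of a
job_submit plugin in C. The partition name, routing rule and log message are
hypothetical (not what the plugins above actually do), and the
struct job_descriptor signatures are the ones from the 18/19-era source tree
I last built against, so check your version's src/plugins/job_submit/
examples before reusing anything:

/*
 * Minimal job_submit plugin skeleton -- illustration only.  The
 * "serial" partition and the routing rule below are hypothetical.
 */
#include "slurm/slurm.h"
#include "slurm/slurm_errno.h"

#include "src/common/log.h"          /* info() */
#include "src/common/xstring.h"      /* xstrdup() */
#include "src/slurmctld/slurmctld.h" /* struct job_descriptor, struct job_record */

/* Identification symbols required of every Slurm plugin. */
const char plugin_name[]      = "Job submit example plugin";
const char plugin_type[]      = "job_submit/example";
const uint32_t plugin_version = SLURM_VERSION_NUMBER;

/* Called by slurmctld for every job submission. */
extern int job_submit(struct job_descriptor *job_desc, uint32_t submit_uid,
                      char **err_msg)
{
        /* Hypothetical policy: jobs submitted without an explicit
         * partition are routed to the shared "serial" partition. */
        if (job_desc->partition == NULL) {
                job_desc->partition = xstrdup("serial");
                info("job_submit/example: uid %u defaulted to serial",
                     submit_uid);
        }
        return SLURM_SUCCESS;
}

/* Called when a pending job is modified, e.g. via scontrol update. */
extern int job_modify(struct job_descriptor *job_desc,
                      struct job_record *job_ptr, uint32_t submit_uid)
{
        return SLURM_SUCCESS;
}

In-tree job_submit plugins like these live under src/plugins/job_submit/ in
the Slurm source and get picked up by the normal autotools build, which is
what building in the source tree amounts to.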
Greetings, fellow general university resource administrator.
A couple of things come to mind from my experience:
1) Does your serial partition share nodes with the other non-serial partitions?
2) What’s your maximum job time allowed, for serial (if the previous answer was
“yes”) and non-serial parti
Hi David,
David Baker writes:
> Hello,
>
> Our SLURM cluster is relatively small. We have 350 standard compute
> nodes each with 40 cores. The largest job that users can run on the
> partition is one requesting 32 nodes. Our cluster is a general
> university research resource and so there are man
Hello,
Our SLURM cluster is relatively small. We have 350 standard compute nodes each
with 40 cores. The largest job that users can run on the partition is one
requesting 32 nodes. Our cluster is a general university research resource and
so there are many different sizes of jobs ranging from