Hello,
I have two jobs on my cluster, which has 32 cores per compute node. The
first job uses eight nodes and 256 cores, so it fills all eight nodes
completely. The second job uses five nodes and 32 cores, so it occupies
only some of the cores on each of its five nodes. Slurm, however, allocated
some of the same nodes to both jobs, which overloaded those nodes. I wonder
whether my partition setting OverSubscribe=FORCE:1 caused this. How can I
prevent it from happening?
Appreciatively,
Menglong