We set up a partition that spans all of our hardware and is preemptable by every higher-priority partition. That way it can soak up idle cycles while still letting higher-priority jobs run. This also allows users to do:

#SBATCH -p primarypartition,requeuepartition

So that the scheduler will start their job on whichever partition can run it sooner. We then rely on fairshare to adjudicate priority.
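For reference, a minimal slurm.conf sketch of that kind of layout might look like the following, assuming partition-priority preemption with requeue; the node list is a placeholder and the partition names simply match the -p example above:

# Preemption is decided by partition priority; preempted jobs are requeued
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

# Owner partition sits in a higher priority tier and is never preempted
PartitionName=primarypartition Nodes=node[01-16] PriorityTier=10 PreemptMode=OFF
# Requeue/scavenger partition spans the same hardware in the lowest tier
PartitionName=requeuepartition Nodes=node[01-16] PriorityTier=1 PreemptMode=REQUEUE

With the multifactor priority plugin and a nonzero PriorityWeightFairshare, fairshare then decides which pending jobs start first.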

-Paul Edmon-

On 10/6/2020 11:37 AM, Jason Simms wrote:
Hello David,

I'm still relatively new at Slurm, but one way we handle this is that, for users/groups who have "bought in" to the cluster, we use a QOS to give them priority, preemptive access to the amount of resources provided by, e.g., a set number of nodes, but not to the nodes themselves. That is, in one example, two researchers each have priority preemptive access to up to 52 cores in the cluster, but those cores can come from any physical node. I set the QOS priority for each researcher to the same value, so that they cannot preempt each other.
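A rough sketch of how that could be expressed with sacctmgr, assuming QOS-based preemption (PreemptType=preempt/qos) and made-up names (lab_a, lab_b, alice, bob); low-priority/scavenger jobs are assumed here to run under the default 'normal' QOS:

# One QOS per group, each capped at 52 cores across the whole cluster
sacctmgr add qos lab_a
sacctmgr add qos lab_b
sacctmgr modify qos lab_a set GrpTRES=cpu=52 Priority=100 Preempt=normal
sacctmgr modify qos lab_b set GrpTRES=cpu=52 Priority=100 Preempt=normal

# Give each researcher access to their group's QOS
sacctmgr modify user where name=alice set qos+=lab_a
sacctmgr modify user where name=bob set qos+=lab_b

Because both QOSs have the same Priority, neither group can preempt the other; both can preempt jobs running under the lower-priority QOS listed in Preempt=.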

Admittedly, this works best and most simply in a situation where your nodes are relatively homogeneous, as ours currently are. I am trying to avoid a situation where a given physical node is restricted to a specific researcher/group, as I want all nodes, as much as possible, to be available to all users, precisely so that idle cycles don't go to waste. It aligns with the general philosophy that nodes are more like cattle and less like pets, in my opinion, so I try to treat them like a giant shared pool rather than multiple independent, gated systems.

Anyway, I suspect other users here with more experience might have a different, or better, approach and I look forward to hearing their thoughts as well.

Warmest regards,
Jason

On Tue, Oct 6, 2020 at 11:12 AM David Baker <d.j.ba...@soton.ac.uk> wrote:

    Hello,

    I would appreciate your advice on how to deal with this situation
    in Slurm, please. I have a set of nodes used by 2 groups, and
    normally each group would have access to half the nodes. So I
    could, for example, limit each group to 3 nodes each. I am trying
    to devise a scheme that always allows each group to make the best
    use of the nodes. In other words, each group could potentially use
    all the nodes (assuming they are all free and the other group
    isn't using them at all).

    I cannot set hard and soft limits in Slurm, and so I'm not sure
    how to make the situation flexible. Ideally, each group would be
    able to use its own allocation and then take advantage of any
    idle nodes via a scavenging mechanism, with the other group able
    to pre-empt the scavenger jobs and claim back its nodes. I'm
    struggling with this since it seems like a two-way scavenger
    situation.

    Could anyone please help? I have, by the way, set up
    partition-based pre-emption in the cluster. This allows the
    general public to scavenge nodes owned by research groups.

    Best regards,
    David




--
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632
