I just set this up a couple of weeks ago myself. Creating two partitions is definitely the way to go. I created one partition, "general", for normal, general-access jobs, and another, "interruptible", for general-access jobs that can be interrupted, and then set PriorityTier accordingly in my slurm.conf file (node names omitted for clarity/brevity).

PartitionName=general Nodes=... MaxTime=48:00:00 State=Up PriorityTier=10 QOS=general
PartitionName=interruptible Nodes=... MaxTime=48:00:00 State=Up PriorityTier=1 QOS=interruptible

I then set PreemptMode=Requeue, because I'd rather have jobs requeued than suspended, and it's been working great. There are a few other settings I had to change; the best documentation for everything you need to set is https://slurm.schedmd.com/preempt.html
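In case it helps, here is a rough sketch of the other preemption-related lines in slurm.conf (the QOS definitions themselves live in the Slurm accounting database via sacctmgr, so treat this as illustrative, not a drop-in config):

```
# Decide preemption by partition PriorityTier
PreemptType=preempt/partition_prio
# Preempted jobs are killed and requeued rather than suspended
PreemptMode=REQUEUE
```

With this in place, a pending job in the PriorityTier=10 partition can preempt a running job in the PriorityTier=1 partition on the same nodes.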

Everything has been working exactly as desired and advertised. My users who needed the ability to run low-priority, long-running jobs are very happy.

The one caveat is that jobs that will be killed and requeued need to support checkpoint/restart. So when this becomes a production thing, users are going to have to acknowledge that they will only use this partition for jobs that have some sort of checkpoint/restart capability.
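For users, a requeue-friendly batch script might look something like this sketch (my_app, its --restart flag, and the state.ckpt file name are all placeholders for whatever checkpoint mechanism the application actually has):

```
#!/bin/bash
#SBATCH --partition=interruptible
#SBATCH --requeue            # let Slurm requeue the job when it is preempted
#SBATCH --open-mode=append   # append to output on restart instead of truncating
#SBATCH --signal=B:TERM@60   # send SIGTERM to the batch shell ~60s before the kill

# Forward SIGTERM to the application so it can write its checkpoint.
trap 'kill -TERM "$pid" 2>/dev/null' TERM

if [ -f state.ckpt ]; then
    ./my_app --restart state.ckpt &   # my_app and its flags are placeholders
else
    ./my_app &
fi
pid=$!
wait "$pid"
```

The trap-and-forward pattern matters because a plain foreground run would let the application be killed without ever seeing the warning signal.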

Prentice

On 2/15/19 11:56 AM, david baker wrote:
Hi Paul, Marcus,

Thank you for your replies. Using partition priority all makes sense. I was thinking of doing something similar with a set of nodes purchased by another group. That is, having a private high priority partition and a lower priority "scavenger" partition for the public. In this case scavenger jobs will get killed when preempted.

In the present case, I did wonder if it would be possible to do something with just a single partition -- hence my question. Your replies have convinced me that two partitions will work -- with preemption leading to requeued jobs.

Best regards,
David

On Fri, Feb 15, 2019 at 3:09 PM Paul Edmon <ped...@cfa.harvard.edu> wrote:

    Yup, PriorityTier is what we use to do exactly that here.  That
    said, unless you turn on preemption, jobs may still pend if there
    is no space.  We run with REQUEUE on, which has worked well.
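
    For the owner-priority case David describes, the same pattern can
    be written as two overlapping partitions, one restricted to the
    owning group (partition and group names here are made up):

```
# Owners get the high tier; everyone else scavenges at the low tier
PartitionName=owner Nodes=... AllowGroups=ownergrp PriorityTier=10 State=Up
PartitionName=scavenger Nodes=... PriorityTier=1 State=Up
```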


    -Paul Edmon-


    On 2/15/19 7:19 AM, Marcus Wagner wrote:
    Hi David,

    as far as I know, you can use the PriorityTier (partition
    parameter) to achieve this. According to the manpages (if I
    remember right) jobs from higher priority tier partitions have
    precedence over jobs from lower priority tier partitions, without
    taking the normal fairshare priority into consideration.

    Best
    Marcus

    On 2/15/19 10:07 AM, David Baker wrote:

    Hello.


    We have a small set of compute nodes owned by a group. The group
    has agreed that the rest of the HPC community can use these
    nodes providing that they (the owners) can always have priority
    access to the nodes. The four nodes are well provisioned (1
    TByte memory each plus 2 GRID K2 graphics cards) and so there is
    no need to worry about preemption. In fact I'm happy for the
    nodes to be used as well as possible by all users. It's just
    that jobs from the owners must take priority if resources are
    scarce.


    What is the best way to achieve the above in slurm? I'm planning
    to place the nodes in their own partition. The node owners will
    have priority access to the nodes in that partition, but will
    have no advantage when submitting jobs to the public resources.
    Does anyone please have any ideas how to deal with this?


    Best regards,

    David



    --
    Marcus Wagner, Dipl.-Inf.

    IT Center
    Abteilung: Systeme und Betrieb
    RWTH Aachen University
    Seffenter Weg 23
    52074 Aachen
    Tel: +49 241 80-24383
    Fax: +49 241 80-624383
    wag...@itc.rwth-aachen.de
    www.itc.rwth-aachen.de
