Navin,
Check out 'sprio'; it will show you how the job priority changes with the weight changes you are making.
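
For example (a quick sketch; -w prints the configured factor weights and -l shows the per-factor breakdown for pending jobs):

sprio -w    # show the weights applied to each priority factor
sprio -l    # show the per-factor priority breakdown for pending jobs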
-b

On 4/29/20 5:00 AM, navin srivastava wrote:
Thanks Daniel.
All jobs went into the run state, so I am unable to provide the details now, but I will definitely reach out later if we see a similar issue.

I am more interested in understanding how FIFO works together with Fair Tree. It would be good if anybody could provide some insight on this combination, and also how the behaviour will change if we enable backfilling here.

What is the role of Fair Tree here?

PriorityType=priority/multifactor
PriorityDecayHalfLife=2
PriorityUsageResetPeriod=DAILY
PriorityWeightFairshare=500000
PriorityFlags=FAIR_TREE
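
If backfilling were enabled on top of this, I understand it would look something like the following (the SchedulerType line is the change in question; the other lines are from the settings above):

SchedulerType=sched/backfill
PriorityType=priority/multifactor
PriorityFlags=FAIR_TREE
PriorityWeightFairshare=500000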

Regards
Navin.



On Mon, Apr 27, 2020 at 9:37 PM Daniel Letai <d...@letai.org.il <mailto:d...@letai.org.il>> wrote:

    Are you sure there are enough resources available? The node is in
    mixed state, so it's configured for both partitions - it's
    possible that earlier lower-priority jobs are already running,
    thus blocking the later jobs, especially since it's FIFO.


    It would really help if you pasted the results of:

    squeue

    sinfo


    As well as the exact sbatch line, so we can see how many resources
    per node are requested.
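
    For example, something like this (just a sketch; the format
    strings only add the pending-reason and CPU-count columns, and
    GPUsmall is taken from your description):

    squeue -p GPUsmall -t PD -o "%.10i %.9P %.8u %.2t %.10M %.6D %R"
    sinfo -p GPUsmall -o "%P %a %l %D %t %C"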


    On 26/04/2020 12:00:06, navin srivastava wrote:
    Thanks Brian,

    As suggested, I went through the document, and what I understood
    is that Fair Tree drives the fairshare mechanism, and jobs should
    be scheduled based on that.

    So it means job scheduling will be FIFO, but priority will be
    decided by fairshare; I am not sure whether the two conflict
    here. I see that the normal jobs' priority is lower than the
    GPUsmall priority, so if resources are available in the GPUsmall
    partition those jobs should run. No job is pending because of GPU
    resources; the jobs do not even request GPU resources.

    Is there any article where I can see how fairshare works and
    which settings should not conflict with it? The documentation
    never says that FIFO should be disabled when fair-share is
    applied.
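
    (As far as I understand, the per-association fairshare values can
    be inspected with sshare; a sketch of the standard options:)

    sshare -a    # fairshare information for all associations
    sshare -l    # long listing with additional usage columns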

    Regards
    Navin.





    On Sat, Apr 25, 2020 at 12:47 AM Brian W. Johanson
    <bjoha...@psc.edu <mailto:bjoha...@psc.edu>> wrote:


        If you haven't looked at the man page for slurm.conf, it will
        answer most if not all of your questions:
        https://slurm.schedmd.com/slurm.conf.html
        But I would rely on the man page distributed with the version
        you have installed, as options do change.

        There is a ton of information that is tedious to get through
        but reading through it multiple times opens many doors.

        DefaultTime is listed in there as a Partition option.
        If you are scheduling gres/gpu resources, it's quite possible
        there are cores available with no corresponding GPUs
        available.
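
        A sketch of what DefaultTime looks like in slurm.conf (the
        partition name, node list, and times here are placeholders,
        not your actual configuration):

        PartitionName=GPUsmall Nodes=node[18-19] DefaultTime=01:00:00 MaxTime=INFINITE State=UP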

        -b

        On 4/24/20 2:49 PM, navin srivastava wrote:
        Thanks Brian.

        I need to check the job order.

        Is there any way to define a default time limit for a job if
        the user does not specify one?

        Also, what is the meaning of Fair Tree in the priority
        settings in the slurm.conf file?

        The sets of nodes in the partitions are different. Does FIFO
        not care about partitions? Is it strict ordering, meaning the
        job that came first will run, and until it runs no other job
        is allowed to start?

        Also, priority is high for the GPUsmall partition and low for
        normal jobs; the nodes of the normal partition are full, but
        GPUsmall cores are available.

        Regards
        Navin

        On Fri, Apr 24, 2020, 23:49 Brian W. Johanson
        <bjoha...@psc.edu <mailto:bjoha...@psc.edu>> wrote:

            Without seeing the jobs in your queue, I would expect
            the next job in FIFO order to be too large to fit in the
            current idle resources.

            Configure it to use the backfill scheduler:
            SchedulerType=sched/backfill

                  SchedulerType
                          Identifies  the type of scheduler to be
            used.  Note the slurmctld daemon must be restarted for a
            change in scheduler type to become effective
            (reconfiguring a running daemon has no effect for this
            parameter).  The scontrol command can be used to
            manually change job priorities if desired.  Acceptable
            values include:

                          sched/backfill
                                 For a backfill scheduling module to
            augment the default FIFO scheduling.  Backfill
            scheduling will initiate lower-priority jobs if doing so
            does not delay the expected initiation time of any 
            higher  priority  job. Effectiveness  of  backfill
            scheduling is dependent upon users specifying job time
            limits, otherwise all jobs will have the same time limit
            and backfilling is impossible.  Note documentation for
            the SchedulerParameters option above.  This is the
            default configuration.

                          sched/builtin
                                 This  is  the  FIFO scheduler which
            initiates jobs in priority order.  If any job in the
            partition can not be scheduled, no lower priority job in
            that partition will be scheduled.  An exception is made
            for jobs that can not run due to partition constraints
            (e.g. the time limit) or down/drained nodes.  In that
            case, lower priority jobs can be initiated and not
            impact the higher priority job.
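
            If you do switch to sched/backfill, its behaviour can
            also be tuned through SchedulerParameters; a sketch with
            commonly used options (the values are only examples):

            SchedulerType=sched/backfill
            SchedulerParameters=bf_continue,bf_window=4320,bf_max_job_test=500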



            Your partitions are set with MaxTime=INFINITE; if your
            users are not specifying a reasonable time limit for
            their jobs, this won't help either.
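
            On the user side that just means submitting with an
            explicit limit, e.g. (the script name and time value are
            placeholders):

            sbatch --time=02:00:00 my_job.sh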


            -b


            On 4/24/20 1:52 PM, navin srivastava wrote:
            In addition to the above, when I check sprio for both
            jobs it shows the following.

            For the normal queue, all jobs show the same priority:

             JOBID PARTITION   PRIORITY  FAIRSHARE
                    1291352 normal           15789      15789

            For GPUsmall, all jobs show the same priority:

             JOBID PARTITION   PRIORITY  FAIRSHARE
                    1291339 GPUsmall      21052    21053

            On Fri, Apr 24, 2020 at 11:14 PM navin srivastava
            <navin.alt...@gmail.com
            <mailto:navin.alt...@gmail.com>> wrote:

                Hi Team,

                We are facing an issue in our environment: resources
                are free, but jobs are going into the queued (PD)
                state and not running.

                I have attached the slurm.conf file here.

                Scenario:

                There are jobs in only two partitions:
                344 jobs are in the PD state in the normal partition;
                the nodes belonging to the normal partition are full,
                so no more jobs can run there.

                1300 jobs in the GPUsmall partition are queued, and
                enough CPUs are available to execute them, but I see
                the jobs are not being scheduled on the free nodes.

                There are no pending jobs in any other partition.
                e.g. node status for node18:

                NodeName=node18 Arch=x86_64 CoresPerSocket=18
                   CPUAlloc=6 CPUErr=0 CPUTot=36 CPULoad=4.07
                   AvailableFeatures=K2200
                   ActiveFeatures=K2200
                   Gres=gpu:2
                   NodeAddr=node18 NodeHostName=node18 Version=17.11
                   OS=Linux 4.4.140-94.42-default #1 SMP Tue Jul 17
                07:44:50 UTC 2018 (0b375e4)
                   RealMemory=1 AllocMem=0 FreeMem=79532 Sockets=2
                Boards=1
                   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1
                Owner=N/A MCS_label=N/A
                   Partitions=GPUsmall,pm_shared
                   BootTime=2019-12-10T14:16:37
                SlurmdStartTime=2019-12-10T14:24:08
                 CfgTRES=cpu=36,mem=1M,billing=36
                   AllocTRES=cpu=6
                   CapWatts=n/a
                   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
                   ExtSensorsJoules=n/s ExtSensorsWatts=0
                ExtSensorsTemp=n/s

                node19:-

                NodeName=node19 Arch=x86_64 CoresPerSocket=18
                   CPUAlloc=16 CPUErr=0 CPUTot=36 CPULoad=15.43
                   AvailableFeatures=K2200
                   ActiveFeatures=K2200
                   Gres=gpu:2
                   NodeAddr=node19 NodeHostName=node19 Version=17.11
                   OS=Linux 4.12.14-94.41-default #1 SMP Wed Oct 31
                12:25:04 UTC 2018 (3090901)
                   RealMemory=1 AllocMem=0 FreeMem=63998 Sockets=2
                Boards=1
                   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1
                Owner=N/A MCS_label=N/A
                   Partitions=GPUsmall,pm_shared
                   BootTime=2020-03-12T06:51:54
                SlurmdStartTime=2020-03-12T06:53:14
                 CfgTRES=cpu=36,mem=1M,billing=36
                   AllocTRES=cpu=16
                   CapWatts=n/a
                   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
                   ExtSensorsJoules=n/s ExtSensorsWatts=0
                ExtSensorsTemp=n/s

                Could you please help me understand what the reason
                could be?











--
Regards,

    Daniel Letai
    +972 (0)505 870 456

