Hi Michael,

Indeed, I had the older scheduler loaded rather than backfill. I have updated
the configuration and will see whether the scheduler picks up the pending jobs.
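
For anyone hitting the same symptom, the change amounts to something like the
fragment below. This is only a sketch: it assumes the "older scheduler" was
sched/builtin, and the rest of slurm.conf is not shown.

```
# slurm.conf (fragment; all other settings omitted)
# Assumption: the previous value was sched/builtin, which starts jobs
# strictly in priority order, so a blocked high-priority job holds back
# every lower-priority job behind it in the partition.
# sched/backfill lets lower-priority jobs start when that does not delay
# the expected start time of higher-priority jobs.
SchedulerType=sched/backfill
```

The active value can be checked with `scontrol show config | grep SchedulerType`.
A restart of slurmctld may be needed for the change to take effect. Backfill
relies on job time limits to decide whether a job fits, so jobs submitted
without realistic limits may still wait.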

Thanks

Cristiano
________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of 
slurm-users-requ...@lists.schedmd.com <slurm-users-requ...@lists.schedmd.com>
Sent: Wednesday, August 2, 2023 4:15 PM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: slurm-users Digest, Vol 70, Issue 3

Today's Topics:

   1. Job in "priority" status - resources available (Cumer Cristiano)
   2. Re: Job in "priority" status - resources available
      (Michael Gutteridge)


----------------------------------------------------------------------

Message: 1
Date: Wed, 2 Aug 2023 12:09:52 +0000
From: Cumer Cristiano <cristianomaria.cu...@unibz.it>
To: "slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Job in "priority" status - resources available
Message-ID:
        
<pavpr07mb91916b49909e972995ce0806e1...@pavpr07mb9191.eurprd07.prod.outlook.com>

Content-Type: text/plain; charset="iso-8859-1"

Hello,

I'm quite a newbie regarding Slurm. I recently created a small Slurm instance 
to manage our GPU resources. I have this situation:

JOBID    STATE        TIME   ACCOUNT  PARTITION  PRIORITY     REASON  CPU  MIN_MEM  TRES_PER_NODE
 1739  PENDING        0:00  standard    gpu-low         5   Priority    1      80G  gres:gpu:a100_1g.10gb:1
 1738  PENDING        0:00  standard    gpu-low         5   Priority    1      80G  gres:gpu:a100-sxm4-80gb:1
 1737  PENDING        0:00  standard    gpu-low         5   Priority    1      80G  gres:gpu:a100-sxm4-80gb:1
 1736  PENDING        0:00  standard    gpu-low         5  Resources    1      80G  gres:gpu:a100-sxm4-80gb:1
 1740  PENDING        0:00  standard    gpu-low         1   Priority    1       8G  gres:gpu:a100_3g.39gb
 1735  PENDING        0:00  standard    gpu-low         1   Priority    8      64G  gres:gpu:a100-sxm4-80gb:1
 1596  RUNNING  1-13:26:45  standard    gpu-low         3       None    2      64G  gres:gpu:a100_1g.10gb:1
 1653  RUNNING    21:09:52  standard    gpu-low         2       None    1      16G  gres:gpu:1
 1734  RUNNING       59:52  standard    gpu-low         1       None    8      64G  gres:gpu:a100-sxm4-80gb:1
 1733  RUNNING     1:01:54  standard    gpu-low         1       None    8      64G  gres:gpu:a100-sxm4-80gb:1
 1732  RUNNING     1:02:39  standard    gpu-low         1       None    8      40G  gres:gpu:a100-sxm4-80gb:1
 1731  RUNNING     1:08:28  standard    gpu-low         1       None    8      40G  gres:gpu:a100-sxm4-80gb:1
 1718  RUNNING    10:16:40  standard    gpu-low         1       None    2       8G  gres:gpu:v100
 1630  RUNNING  1-00:21:21  standard    gpu-low         1       None    1      30G  gres:gpu:a100_3g.39gb
 1610  RUNNING  1-09:53:23  standard    gpu-low         1       None    2       8G  gres:gpu:v100



Job 1736 is in the PENDING state since there are no more available 
a100-sxm4-80gb GPUs. The job priority starts to rise with time (priority 5) as 
expected. Now another user submits job 1739 on a gres:gpu:a100_1g.10gb:1 that 
is available, but the job is not starting since its priority is 1. This is 
obviously not the desired outcome, and I believe I must change the scheduling 
strategy. Could someone with more experience than me give me some hints?

Thanks, Cristiano

------------------------------

Message: 2
Date: Wed, 2 Aug 2023 07:15:06 -0700
From: Michael Gutteridge <michael.gutteri...@gmail.com>
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Job in "priority" status - resources
        available
Message-ID:
        <calul84uj7yc7h_eb7c1vahhdoytrpb5fhz35u8z24mmzwgc...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I'm not sure there's enough information in your message: the Slurm version and
configs are often necessary to make a more confident diagnosis.  However,
the behaviour you are looking for (lower-priority jobs skipping the line)
is called "backfill".  There are docs here:
https://slurm.schedmd.com/sched_config.html#backfill

It should be loaded and active by default which is why I'm not super
confident here.  There may also be something else going on with the node
configuration as it looks like 1596 would maybe need the same node?  Maybe
there's not enough CPU or memory to accommodate both jobs (1596 and 1739)?

HTH
 - Michael


End of slurm-users Digest, Vol 70, Issue 3
******************************************
