[slurm-users] Re: [EXTERNAL] avoid using same GPU by the interactive job

navin srivastava via slurm-users Wed, 12 Feb 2025 20:58:26 -0800

Thank you Jesse.

I am using SLES 15 SP6 as the OS and have not introduced cgroups in my
environment. I will consider it and see whether this solution works out,
but is there any other way to achieve the same result without cgroups?
Batch jobs are fine: two jobs that each request one GPU work correctly.
The problem occurs only in the mixed case (one batch job and one
interactive job).


Is there a way to run a job so that exclusive access applies only to the
GPU resources?

Regards
Navin.



On Wed, Feb 12, 2025 at 11:24 PM Chintanadilok, Jesse <jc...@ti.com> wrote:

> Navin,
>
>
>
> You can isolate GPUs per job if you have cgroups set up properly. What OS
> are you using? Newer OSes support cgroup v2 out of the box, but if
> necessary you can continue using v1; this workflow should apply to
> both.
>
>
>
> Add ConstrainDevices=yes to your cgroup.conf
>
>
>
> This is what the file looks like at my site:
>
> /etc/slurm/cgroup.conf
>
> CgroupMountpoint="/sys/fs/cgroup"
> ConstrainCores=yes
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=no
> ConstrainDevices=yes
>
>
>
> You can find the documentation here:
>
> https://slurm.schedmd.com/cgroup.conf.html
>
>
>
> If you want to share GPUs you can use CUDA MPS or MIG if your GPU supports
> it.
>
>
>
> Regards,
>
> Jesse Chintanadilok
>
>
>
> *From:* navin srivastava via slurm-users <slurm-users@lists.schedmd.com>
> *Sent:* Wednesday, February 12, 2025 10:30
> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* [EXTERNAL] [slurm-users] avoid using same GPU by the
> interactive job
>
>
>
> hi,
>
>
>
> I am facing an issue in my environment where a batch job and an
> interactive job use the same GPU.
>
> Each server has 2 GPUs. When 2 batch jobs are running, it works fine and
> they use the 2 different GPUs, but if one batch job is running and another
> job is submitted interactively, the interactive job uses the same GPU. Is
> there a way to avoid this?
>
>
>
> GresTypes=gpu
>
> NodeName=node[01-02] NodeAddr=node[01-02] CPUs=48 Boards=1
> SocketsPerBoard=2 CoresPerSocket=24 ThreadsPerCore=1 TmpDisk=6000000
> RealMemory=515634 Feature=A100 Gres=gpu:2
>
>
>
> PartitionName=onprem Nodes=node[01-10] Default=YES MaxTime=21-00:00:00
> DefaultTime=3-00:00:00 State=UP Shared=YES OverSubscribe=NO
>
>
>
> gres.conf:
>
> Name=gpu File=/dev/nvidia0
> Name=gpu File=/dev/nvidia1
>
>
>
> Any suggestions on this?
>
>
>
> Regards
>
> Navin
>
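
Note on the ConstrainDevices=yes suggestion quoted above: as far as the cgroup.conf documentation describes it, the Constrain* options are handled by the task/cgroup plugin, so they should only take effect once that plugin is enabled in slurm.conf. A minimal sketch of the lines that would typically accompany the quoted cgroup.conf follows. Neither poster showed these lines, so treat them as an assumption about a typical setup, not as either site's actual configuration.

```
# slurm.conf (sketch only; merge into the existing file, do not replace it)
# Assumption: neither plugin is enabled yet, since cgroups are not in use.
ProctrackType=proctrack/cgroup   # track job processes with cgroups
TaskPlugin=task/cgroup           # lets cgroup.conf Constrain* options apply
GresTypes=gpu                    # already present in the posted slurm.conf
```

Plugin changes like these generally need a restart of slurmctld and the slurmd daemons. One way to check the result afterwards is to run `srun --gres=gpu:1 nvidia-smi -L` while a batch job holds the other GPU; with device constraining active, the step should list only the GPU it was allocated.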

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
