[slurm-users] Re: [EXTERN] How do you guys track which GPU is used by which job?

2024-10-17 Thread Sylvain MARET via slurm-users
…e analysis of it. Best wishes, Pierre-Antoine Schnell. On 16.10.24 at 15:10, Sylvain MARET via slurm-users wrote: Hey guys! I'm looking to improve GPU monitoring on our cluster. I want to install https://github.com/NVIDIA/dcgm-exporter and saw in the README that it can support tra…

[slurm-users] Re: How do you guys track which GPU is used by which job?

2024-10-17 Thread Sylvain MARET via slurm-users
…nt to generate files that map GPUs to HPC jobs." It does go on to show the conventions and format of the files. I imagine you could have some logic in a prolog script that creates those files as the job starts on the node, and then point dcgm-exporter at that directory. Brian Andrus. On 10/16/24 06:10, Sylvain MARET via…
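The prolog idea above could look roughly like the sketch below. It is a minimal illustration, not a tested recipe, and it rests on several assumptions. First, it follows the file layout as I understand the dcgm-exporter README: one file per GPU, named after the GPU index, with one job ID per line (check the current README before relying on it). Second, it assumes slurmd exports SLURM_JOB_ID and SLURM_JOB_GPUS to the Prolog/Epilog environment, and that Slurm's GPU indices match the ones dcgm-exporter reports. The directory /run/dcgm-job-mapping and the helper names (parse_gpu_ids, add_job, remove_job, run_hook) are hypothetical.

```python
"""Hypothetical Slurm prolog/epilog helper that maintains GPU-to-job mapping
files: one file per GPU index, each listing the job IDs on that GPU."""

import os
from pathlib import Path

# Hypothetical node-local directory; dcgm-exporter would be pointed at it
# (the README describes a mapping-directory option for this).
DEFAULT_MAPPING_DIR = "/run/dcgm-job-mapping"


def parse_gpu_ids(value):
    """Turn a GPU list such as "0,2" into ["0", "2"], ignoring blanks."""
    return [gpu.strip() for gpu in value.split(",") if gpu.strip()]


def add_job(mapping_dir, job_id, gpu_ids):
    """Prolog step: append the job ID to the file of each allocated GPU."""
    directory = Path(mapping_dir)
    directory.mkdir(parents=True, exist_ok=True)
    for gpu in gpu_ids:
        with open(directory / gpu, "a", encoding="utf-8") as handle:
            handle.write(f"{job_id}\n")


def remove_job(mapping_dir, job_id, gpu_ids):
    """Epilog step: drop the job ID, deleting files that end up empty."""
    for gpu in gpu_ids:
        path = Path(mapping_dir) / gpu
        if not path.exists():
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        remaining = [line for line in lines if line.strip() != str(job_id)]
        if remaining:
            path.write_text("\n".join(remaining) + "\n", encoding="utf-8")
        else:
            path.unlink()


def run_hook(phase, environ, mapping_dir=DEFAULT_MAPPING_DIR):
    """Dispatch on "prolog" or "epilog" using a Slurm-style environment."""
    # Assumption: SLURM_JOB_ID and SLURM_JOB_GPUS are present in the
    # Prolog/Epilog environment; verify this for your Slurm version.
    job_id = environ["SLURM_JOB_ID"]
    gpu_ids = parse_gpu_ids(environ.get("SLURM_JOB_GPUS", ""))
    if phase == "prolog":
        add_job(mapping_dir, job_id, gpu_ids)
    elif phase == "epilog":
        remove_job(mapping_dir, job_id, gpu_ids)
    else:
        raise ValueError(f"unknown phase: {phase!r}")


if __name__ == "__main__":
    # Example: print what a job on GPUs 0 and 2 would be mapped to.
    print(parse_gpu_ids(os.environ.get("SLURM_JOB_GPUS", "0,2")))
```

To wire it in, a small wrapper script calling run_hook("prolog", os.environ) would be set as Prolog in slurm.conf, and a matching one calling run_hook("epilog", os.environ) as Epilog. No locking is done because, with exclusive GPU allocation, only one job touches a given GPU file at a time; if GPUs are shared (for example with sharding), the epilog's read-modify-write would need a lock such as fcntl.flock.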

[slurm-users] How do you guys track which GPU is used by which job?

2024-10-16 Thread Sylvain MARET via slurm-users
Hey guys! I'm looking to improve GPU monitoring on our cluster. I want to install https://github.com/NVIDIA/dcgm-exporter and saw in the README that it can support tracking job IDs: https://github.com/NVIDIA/dcgm-exporter?tab=readme-ov-file#enabling-hpc-job-mapping-on-dcgm-exporter

[slurm-users] Re: SLURM in K8s, any advice?

2024-03-13 Thread Sylvain MARET via slurm-users
Hello, I haven't played with Slurm in k8s, but I did attend this talk: https://fosdem.org/2024/schedule/event/fosdem-2024-2590-kubernetes-and-hpc-bare-metal-bros/ It shows that at least someone has managed to do it, so it may be worth talking to her about it. I wanted to ask her for the cod…

[slurm-users] Need help managing licence

2024-02-16 Thread Sylvain MARET via slurm-users
Hello everyone! Recently our users bought a CPLEX dynamic license and want to use it on our Slurm cluster. I've installed the paid version of CPLEX as a module, so authorized users can load it with a simple module load cplex/2111 command, but I don't know how to manage the license and ensure Slurm doesn'…