e analysis of it.
Best wishes,
Pierre-Antoine Schnell
On 16.10.24 at 15:10, Sylvain MARET via slurm-users wrote:
Hey guys !
I'm looking to improve GPU monitoring on our cluster. I want to install
this https://github.com/NVIDIA/dcgm-exporter and saw in the README that
it can support tra
nt to generate files
that map GPUs to HPC jobs./
It does go on to show the conventions/format of the files.
I imagine you could have some bits in a prolog script that create
that file as the job starts on the node, and point dcgm-exporter there.
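
The prolog-script idea above could be sketched roughly as follows. This is
only a minimal sketch under assumptions, not a tested setup: it assumes the
mapping directory holds one file per GPU index with one job ID per line
(check the README section for the actual convention), that the prolog
environment provides SLURM_JOB_ID and SLURM_JOB_GPUS, and the directory
path and the JOB_MAPPING_DIR variable are hypothetical names chosen for
illustration.

```python
#!/usr/bin/env python3
"""Sketch of a Slurm prolog helper that records which job holds which GPU."""
import os
from pathlib import Path

# Hypothetical default location; point dcgm-exporter at the same directory.
DEFAULT_MAPPING_DIR = "/run/dcgm-job-mapping"


def parse_gpu_list(raw):
    """Split a comma-separated GPU index list such as "0,2" into ["0", "2"]."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def write_job_mapping(mapping_dir, job_id, gpu_ids):
    """Append job_id to one file per GPU index under mapping_dir.

    Assumed layout: file name = GPU index, content = one job ID per line.
    Returns the list of paths that were written.
    """
    directory = Path(mapping_dir)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for gpu in gpu_ids:
        path = directory / gpu
        with path.open("a") as handle:
            handle.write(f"{job_id}\n")
        written.append(path)
    return written


if __name__ == "__main__":
    # Variables assumed to be set by Slurm when the prolog runs.
    job_id = os.environ.get("SLURM_JOB_ID")
    gpus = parse_gpu_list(os.environ.get("SLURM_JOB_GPUS", ""))
    if job_id and gpus:
        target = os.environ.get("JOB_MAPPING_DIR", DEFAULT_MAPPING_DIR)
        write_job_mapping(target, job_id, gpus)
```

A matching epilog would need to remove the job ID again when the job ends,
otherwise stale entries would stay attached to the GPU.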
Brian Andrus
On 10/16/24 06:10, Sylvain MARET via
Hey guys !
I'm looking to improve GPU monitoring on our cluster. I want to install
this https://github.com/NVIDIA/dcgm-exporter and saw in the README that
it can support tracking of job id :
https://github.com/NVIDIA/dcgm-exporter?tab=readme-ov-file#enabling-hpc-job-mapping-on-dcgm-exporter
Hello,
I haven't played with slurm in k8s but I did attend this talk :
https://fosdem.org/2024/schedule/event/fosdem-2024-2590-kubernetes-and-hpc-bare-metal-bros/
Which shows at least someone was able to do so, and it may be worth
talking to her about it. I wanted to ask her for the cod
Hello everyone !
Recently our users bought a cplex dynamic license and want to use it on
our slurm cluster.
I've installed the paid version of cplex as a module so authorized
users can load it with a simple module load cplex/2111 command, but I
don't know how to manage and ensure slurm doesn'