[slurm-users] Re: [EXTERN] How do you guys track which GPU is used by which job ?

2024-10-17 Thread Markus Kötter via slurm-users
Hi, As their example was limited to "allgpus", I posted my take on this on the NVIDIA developer blog. It is basically the same, but looks up the group ID from the dcgmi group JSON using jp instead of a file. https://developer.nvidia.com/blog/job-statistics-nvidia-data-center-gpu-manager-s
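
For readers who want to see the lookup step spelled out, here is a minimal Python sketch of finding a group ID by name in parsed JSON. The post itself uses jp on the dcgmi group JSON; the layout below (keys `groups`, `id`, `name`), the group name `job-4242`, and the helper `find_group_id` are assumptions for illustration and not the real dcgmi output schema, so the key names would need adjusting to whatever dcgmi actually prints.

```python
import json

# Hypothetical sample data: NOT the real dcgmi schema, only a stand-in
# so the lookup logic can be shown end to end.
SAMPLE = """
{"groups": [
  {"id": 2, "name": "allgpus"},
  {"id": 7, "name": "job-4242"}
]}
"""


def find_group_id(doc, group_name, name_key="name", id_key="id"):
    """Depth-first search for the first object whose name_key equals
    group_name; returns its id_key value, or None if nothing matches.

    name_key and id_key are assumed key names (hypothetical)."""
    if isinstance(doc, dict):
        if doc.get(name_key) == group_name and id_key in doc:
            return doc[id_key]
        children = doc.values()
    elif isinstance(doc, list):
        children = doc
    else:
        return None  # scalars cannot contain a group
    for child in children:
        found = find_group_id(child, group_name, name_key, id_key)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print(find_group_id(json.loads(SAMPLE), "job-4242"))
```

Searching the JSON on every call, rather than caching the ID in a file, keeps the lookup in sync with whatever groups currently exist, which seems to be the point of the approach described in the post.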

[slurm-users] Re: scrun: Failed to run the container due to GID mapping configuration

2024-04-04 Thread Markus Kötter via slurm-users
Hi, On 04.04.24 04:46, Toshiki Sonoda (Fujitsu) via slurm-users wrote: "We set up scrun (Slurm 23.11.5) integrated with rootless podman," I'd recommend looking into NVIDIA enroot instead. https://slurm.schedmd.com/SLUG19/NVIDIA_Containers.pdf Kind regards -- Markus Kötter, +49 681 870832