I would imagine that Slurm should be able to pull that data through NVML, but I'd bet the hooks aren't in place.
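For illustration, here is a minimal sketch of what pulling the per-process
accounting data through NVML could look like, assuming the nvidia-ml-py
(pynvml) Python bindings; the function and field names follow the NVML
accounting API, and accounting mode has to be enabled on each GPU before the
processes run:

# Sketch: read per-process GPU accounting through NVML (pynvml bindings).
import pynvml

pynvml.nvmlInit()
try:
    for idx in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(idx)
        if pynvml.nvmlDeviceGetAccountingMode(handle) != pynvml.NVML_FEATURE_ENABLED:
            print(f"GPU {idx}: accounting mode disabled, nothing to report")
            continue
        for pid in pynvml.nvmlDeviceGetAccountingPids(handle):
            try:
                stats = pynvml.nvmlDeviceGetAccountingStats(handle, pid)
            except pynvml.NVMLError:
                continue  # process may have aged out of the accounting buffer
            print(f"GPU {idx} pid {pid}: "
                  f"gpu_util={stats.gpuUtilization}% "
                  f"mem_util={stats.memoryUtilization}% "
                  f"max_mem={stats.maxMemoryUsage} bytes "
                  f"time={stats.time} ms")
finally:
    pynvml.nvmlShutdown()

Run from a job epilog (or the tail of the job script), this would print one
line per accounted process; the NVML accounting buffer only keeps a limited
number of entries, so it is best read soon after the job finishes.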
On Fri, Jan 15, 2021 at 7:44 AM Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
>
> Hi,
>
> We have installed some new GPU nodes, and now users are asking for some
> sort of monitoring of GPU utilisation and GPU memory utilisation at the
> end of a job, like what Slurm already provides for CPU and memory usage.
>
> I haven't found any pages describing how to perform GPU accounting within
> Slurm, so I would like to ask the user community for some advice on the
> best practices and any available (simple) tools out there.
>
> What I have discovered is that Nvidia provides process accounting using
> nvidia-smi [1]. It is enabled with
>
> $ nvidia-smi --accounting-mode=1
>
> and queried with
>
> $ nvidia-smi --query-accounted-apps=gpu_name,pid,time,gpu_util,mem_util,max_memory_usage --format=csv
>
> but the documentation seems quite scant, and so far I don't see any output
> from this query command.
>
> Some questions:
>
> 1. Is there a way to integrate the Nvidia process accounting into Slurm?
>
> 2. Can users run the above command in the job scripts and get the GPU
> accounting information?
>
> Thanks,
> Ole
>
> References:
> 1. https://developer.nvidia.com/nvidia-system-management-interface
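Regarding the empty output from the query command: as far as I can tell,
accounting data is only recorded for processes launched after accounting mode
was enabled on that GPU, and enabling it needs root, so on a cluster the
natural place to do it is a Slurm prolog (and I believe the setting can be
lost when the driver unloads if persistence mode is off). A sketch of that
prolog side, again assuming the pynvml bindings, equivalent to
"nvidia-smi --accounting-mode=1":

# Sketch: enable GPU accounting mode (run as root, e.g. from a Slurm prolog).
import pynvml

pynvml.nvmlInit()
try:
    for idx in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(idx)
        # Raises an NVMLError without sufficient privileges.
        pynvml.nvmlDeviceSetAccountingMode(handle, pynvml.NVML_FEATURE_ENABLED)
finally:
    pynvml.nvmlShutdown()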