Hi all,

I’m trying to pull (and understand) some GPU usage metrics for historical 
purposes, and dug into sacct’s TRES reporting a bit. We have 
AccountingStorageTRES=gres/gpu set in slurm.conf, so we do see gres/gpuutil 
and gres/gpumem numbers, but I’m struggling to find Slurm-side documentation 
that describes the units of these values. Looking at the code in gpu_nvml.c, 
it seems the “nvmlDeviceGetProcessUtilization” function is used and returns 
percentages, but I’m lost on the rest of the calculation.

Does anyone know whether these values are percentages, and how they are 
calculated for the final job record, especially for multi-GPU jobs with many 
processes and moving parts? For context, I’ve been looking at TRESUsageInTot 
and TRESUsageInAve so far. Also, we’re currently running Slurm v23.02.6.
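
In case it helps frame the question, here is a minimal sketch of how one of these fields could be split apart for inspection. It assumes the TRESUsage* fields are comma-separated name=value pairs; the sample string is made up rather than real sacct output, and parse_tres is a hypothetical helper, not part of Slurm. The values are deliberately left as raw strings, since their units are exactly what is unclear.

```python
def parse_tres(tres: str) -> dict[str, str]:
    """Split a comma-separated TRES string into a {name: raw_value} dict.

    Assumption: each entry looks like "name=value". Values are kept as
    raw strings because their units are the open question here.
    """
    fields: dict[str, str] = {}
    for item in tres.split(","):
        if not item:
            continue  # tolerate empty strings and trailing commas
        name, _, value = item.partition("=")
        fields[name.strip()] = value.strip()
    return fields


# Hypothetical sample for illustration only, not real sacct output.
sample = "cpu=00:12:34,mem=2048M,gres/gpumem=4096M,gres/gpuutil=87"

usage = parse_tres(sample)

# Keep only the GPU-related entries this question is about.
gpu_fields = {k: v for k, v in usage.items() if k.startswith("gres/gpu")}
print(gpu_fields)
```

Pulling the gres/gpuutil and gres/gpumem entries out side by side for TRESUsageInTot and TRESUsageInAve at least makes it easier to compare the two, even before the units are pinned down.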

Thanks in advance!

--
Jordan Robertson
Preferred pronouns: he/him/his
Technology Architect | Research Technology Services
DigITs, Technology Division
Memorial Sloan Kettering Cancer Center
929-687-1066
rober...@mskcc.org

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
