Hi Loris,

It would be great if Slurm could read the GPU load using the Nvidia monitoring tools and then make the GPU load available through "scontrol show node xxx". But I don't know whether anyone has asked (and paid) SchedMD to implement this.

Best regards,
Ole

On 12/14/21 14:16, Loris Bennett wrote:
Hi Ole,

Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes:

The latest pestat version now adds a red color highlight if the GRES GPU
value is (null).

We use this to highlight jobs on GPU nodes which didn't request any GPU
resources, thereby possibly wasting resources.

Could you test whether this is useful and give me feedback?

In job_submit.lua we check whether a job sent to the GPU partition has
actually requested a GPU as a TRES and, if not, reject it.  So that kind
of wastage doesn't occur.
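
For anyone wanting to try the same kind of check, here is a rough sketch of the decision logic only, written in Python for readability. It is not our actual job_submit.lua. The partition name "gpu", the function names, and the TRES string formats matched by the regular expression (e.g. "gres:gpu:2" or "gres/gpu:1") are assumptions for illustration; check what your Slurm version actually passes to the plugin.

```python
import re

# Hypothetical name of the GPU partition; adjust to the local setup.
GPU_PARTITIONS = {"gpu"}

# Matches "gpu" as a whole component of a TRES string such as
# "gres:gpu:2", "gres/gpu:1", "gres/gpu" or "gpu:1" (assumed formats).
_GPU_TRES = re.compile(r"(?:^|[,/:])gpu(?:[:=,]|$)")


def requests_gpu(*tres_fields):
    """Return True if any of the given TRES strings asks for a GPU."""
    return any(bool(f) and bool(_GPU_TRES.search(f)) for f in tres_fields)


def check_job(partition, *tres_fields):
    """Return (accepted, message) for a job sent to `partition`."""
    if partition in GPU_PARTITIONS and not requests_gpu(*tres_fields):
        return False, "Jobs in a GPU partition must request at least one GPU"
    return True, ""


if __name__ == "__main__":
    print(check_job("gpu", None, None))        # rejected
    print(check_job("gpu", "gres:gpu:2"))      # accepted
    print(check_job("batch", None))            # accepted, not a GPU partition
```

In a real plugin the same test would be applied to the TRES fields of the job description, and the rejection message would be returned to the user at submit time.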

However, we do sometimes push non-GPU jobs onto GPU-nodes within a
scavenger partition, so it would be handy if pestat highlighted these.
At the moment, though, there are no such jobs, so I can't test.

It would however be good to be able to display the utilisation of the
GPUs via the command-line.  Some people request GPUs, but the jobs don't
manage to use them very much.  At the opposite end of the usage
spectrum, today, via our Zabbix monitoring, I spotted some jobs with
unusually high GPU efficiencies which turned out to be doing
cryptomining :-/
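
Until something like that exists in Slurm itself, a minimal sketch of reading per-GPU utilisation on a node from the command line might look like the following. It assumes nvidia-smi is installed on the node and uses its --query-gpu and --format=csv options; the function names are made up for illustration, and this is not how pestat or our Zabbix checks work.

```python
import subprocess

# nvidia-smi query returning one "index, utilisation" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_utilisation(csv_text):
    """Parse lines like '0, 35' into {gpu_index: percent or None}."""
    result = {}
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        index, util = (field.strip() for field in line.split(",", 1))
        try:
            result[int(index)] = int(util)
        except ValueError:
            # Some GPUs report a non-numeric placeholder instead of a value.
            result[int(index)] = None
    return result


def read_gpu_utilisation():
    """Run nvidia-smi on the local node and return its parsed output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_utilisation(out.stdout)


if __name__ == "__main__":
    for gpu, percent in sorted(read_gpu_utilisation().items()):
        print(f"GPU {gpu}: {percent}%")
```

This only gives an instantaneous reading on one node; mapping it to jobs would still need the job-to-GPU allocation from Slurm.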

