Ah, of course, that makes sense, thanks. I guess if we're constraining the devices into job-specific cgroups, then the slurmd on the node may know which device is assigned to which job and could interrogate resource usage from that, but there's no mechanism to do anything beyond that.
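For what it's worth, a minimal sketch of that idea (not anything Slurm provides out of the box): assuming a cgroup v1 devices controller with the usual Slurm layout (/sys/fs/cgroup/devices/slurm/uid_<uid>/job_<jobid>), read the job's devices.list to find which /dev/nvidiaN devices it may access (NVIDIA character major 195), then ask nvidia-smi for per-GPU memory use. The path layout and helper names here are assumptions, and note this still only reports per-GPU totals, not per-job attribution.

#!/usr/bin/env python3
# Sketch only: map a job's allowed NVIDIA devices from its cgroup and
# report per-GPU memory use. Cgroup path layout is an assumption based
# on a typical Slurm cgroup-plugin setup (cgroup v1 devices controller).
import subprocess
import sys

CGROUP_ROOT = "/sys/fs/cgroup/devices/slurm"  # assumed mount point / layout

def gpus_for_job(uid: int, jobid: int):
    """Return GPU indices the job's cgroup allows (NVIDIA char major 195)."""
    path = f"{CGROUP_ROOT}/uid_{uid}/job_{jobid}/devices.list"
    gpus = []
    with open(path) as fh:
        for line in fh:
            kind, majmin, _perms = line.split()
            if kind != "c":
                continue
            major, minor = majmin.split(":")
            if major == "195" and minor != "*":  # /dev/nvidia<minor>
                gpus.append(int(minor))
    return gpus

def gpu_memory_used_mib(index: int) -> int:
    """Query used framebuffer memory (MiB) for one GPU via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "-i", str(index),
         "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True)
    return int(out.strip())

if __name__ == "__main__":
    uid, jobid = int(sys.argv[1]), int(sys.argv[2])
    for idx in gpus_for_job(uid, jobid):
        print(f"job {jobid}: GPU {idx} using {gpu_memory_used_mib(idx)} MiB")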
On Fri, 23 Nov 2018, 6:36 pm Mark Hahn <h...@mcmaster.ca> wrote:
>> We have a use-case in that the GRES being tracked on a particular partition
>> are GPU cards, but aren't being used by applications that would require them
>> exclusively (lightweight direct rendering rather than GP-GPU/CUDA)
>
> the issue is that slurm/kernel can't arbitrate resources on the GPU,
> so oversubscription is likely to run out of device memory or SMs, no?