On 6/4/21 11:04 am, Ahmad Khalifa wrote:

> Because there are failing GPUs that I'm trying to avoid.

Could you remove them from your gres.conf and adjust slurm.conf to match?
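A minimal sketch of what that change might look like. It assumes a hypothetical node gpu01 with four GPUs at /dev/nvidia0 to /dev/nvidia3, where /dev/nvidia2 is the failing one. The node name, device paths and counts are illustrative and do not come from the original question:

```
# gres.conf: list only the healthy GPUs
# (hypothetical node gpu01; the failing /dev/nvidia2 is simply left out)
NodeName=gpu01 Name=gpu File=/dev/nvidia0
NodeName=gpu01 Name=gpu File=/dev/nvidia1
NodeName=gpu01 Name=gpu File=/dev/nvidia3

# slurm.conf: the configured GPU count must match the three entries above.
# Keep the node's existing CPUs=, RealMemory= etc. on the same NodeName line.
GresTypes=gpu
NodeName=gpu01 Gres=gpu:3
```

Keeping the two files in agreement matters: if slurm.conf still says gpu:4 while slurmd only finds three devices, slurmctld will typically drain the node. You'll likely need to restart slurmctld and the slurmd on that node for the new GPU count to be picked up.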

If you're using cgroups enforcement for devices (ConstrainDevices=yes in cgroup.conf), then that should render them inaccessible to jobs.
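For reference, the device enforcement settings would look something like the following. This is a sketch only: an existing cgroup.conf or TaskPlugin line may already carry other options, which should be kept.

```
# cgroup.conf: restrict each job to the GRES devices it was allocated
ConstrainDevices=yes

# slurm.conf: device constraints are applied by the cgroup task plugin
TaskPlugin=task/cgroup
```

It's worth confirming from inside a job which GPUs are actually visible, for example by running nvidia-smi via srun --gres=gpu:1.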

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
