date:20200314

[slurm-users] prolog missing variables in 20.02

2020-03-14 Thread Quirin Lohr

Hi, I just upgraded my cluster from 19.05 to 20.02. Now, in the prolog/epilog scripts, the variables SLURM_JOB_GPUS, CUDA_VISIBLE_DEVICES and GPU_DEVICE_ORDINAL are missing. I am setting access to the GPUs via cgroups. The only variables available in the prolog are SLURMD_NODENAME SLURM_CLUSTER_

[slurm-users] Fwd: gres/gpu: count changed for node node002 from 0 to 1

2020-03-14 Thread Robert Kudyba

I posted this yesterday, and it does appear to be related to a specific job. Note this error: "gres/gpu: count changed for node node002 from 0 to 1". Could it be misleading? What could cause the node to drain? Here are the contents of the user's SBATCH file. Could the piping be having an effect here?