Hi! We have machines with multiple GPUs (Nvidia V100). We allow multiple (two) jobs on the nodes.
We have a user who has somehow managed to get both of their jobs to end up on the same GPU (verified via nvidia-smi).
We are using cgroups: inside a job, the nvidia-smi command shows only one of the GPUs (if only one GPU is requested), and only the assigned /dev/nvidia? device is accessible.
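
For anyone who wants to compare against their own nodes, the device check described above can be sketched roughly as follows. This is only an illustration, not the exact procedure we used: the function name accessible_gpu_devices is made up for this example, and it assumes the GPUs appear as /dev/nvidia0, /dev/nvidia1, and so on. Run inside a job, it should list only the device node(s) the job was given.

```python
import glob
import os


def accessible_gpu_devices(pattern="/dev/nvidia[0-9]*"):
    """Return the device nodes matching `pattern` that this process can open.

    Hypothetical helper for illustration; the default pattern assumes the
    usual /dev/nvidia<N> naming and skips nodes such as /dev/nvidiactl.
    """
    usable = []
    for path in sorted(glob.glob(pattern)):
        try:
            # Try a real open(): a cgroup device restriction shows up as a
            # failed open, which the file permission bits alone do not reveal.
            fd = os.open(path, os.O_RDWR)
        except OSError:
            continue
        os.close(fd)
        usable.append(path)
    return usable


if __name__ == "__main__":
    # CUDA_VISIBLE_DEVICES is printed only as extra context; it may be unset.
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
    print("openable GPU device nodes:", accessible_gpu_devices())
```

If two jobs on the same node both report the same single device node here, that would match what we saw in nvidia-smi.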
We are unable to reproduce this. Has anybody seen anything like this?

/Magnus

--
Magnus Jonsson, Developer, HPC2N, Umeå Universitet