I read that too; however, the RTX cards are not really pre-Volta, and when I run the MPS server, nvidia-smi reports e.g.:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     20294      C   nvidia-cuda-mps-server                      25MiB   |
|    1     20294      C   nvidia-cuda-mps-server                      25MiB   |
|    2     20294      C   nvidia-cuda-mps-server                      29MiB   |
|    2     20329    M+C   /opt/lammps/lammps-20201029-cuda/bin/lmp  6087MiB   |
|    2     20393    M+C   /opt/lammps/lammps-20201029-cuda/bin/lmp  6087MiB   |
+-----------------------------------------------------------------------------+

So the server is running on every GPU, while Slurm schedules only the V100 with --gres=mps:100. Note that the GPU numbering of nvidia-smi and gres.conf is inverted, so GPU ID 2 is the V100 here. The ordering in gres.conf has no effect on this; commenting out the lines regarding the V100 and removing the V100 from the node configuration does not change anything either, and with --gres=mps the V100 is still used.
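For reference, the set of GPUs the MPS server enumerates is controlled by CUDA_VISIBLE_DEVICES in the environment of the control daemon, not by gres.conf. A minimal sketch of restricting the daemon to a single device (the device index follows CUDA enumeration, which, as noted above, may differ from the gres.conf order, so pinning by UUID from `nvidia-smi -L` is more robust):

```
# Stop any running MPS control daemon first
echo quit | nvidia-cuda-mps-control

# Expose only one GPU to the MPS server (index 0 here is illustrative;
# a GPU UUID from `nvidia-smi -L` also works and is order-independent)
export CUDA_VISIBLE_DEVICES=0
nvidia-cuda-mps-control -d
```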
After excluding the V100 from the MPS server by setting CUDA_VISIBLE_DEVICES=1,2 (!), one job runs on an RTX card, but a second job stays pending because the resources are not found.

> From the NVIDIA docs re: MPS:
> On systems with a mix of Volta / pre-Volta GPUs, if the MPS server is set to enumerate any Volta GPU, it will discard all pre-Volta GPUs. In other words, the MPS server will either operate only on the Volta GPUs and expose Volta capabilities, or operate only on pre-Volta GPUs.
> I'd be curious what happens if you change the ordering (RTX then V100) in the gres.conf -- would the RTX work with MPS and the V100 would not?
>
> > On Nov 13, 2020, at 07:23 , Holger Badorreck <h.badorr...@lzh.de> wrote:
> >
> > Hello,
> >
> > I have a heterogeneous GPU node with one V100 and two RTX cards. When I request resources with --gres=mps:100, the V100 is always chosen, and jobs wait if the V100 is completely allocated while the RTX cards are free. If I use --gres=gpu:1, the RTX cards are used as well. Is something wrong with the configuration, or is it another problem?
> >
> > The node configuration in slurm.conf:
> > NodeName=node1 CPUs=48 RealMemory=128530 Sockets=1 CoresPerSocket=24
> > ThreadsPerCore=2 Gres=gpu:v100:1,gpu:rtx:2,mps:600 State=UNKNOWN
> >
> > gres.conf:
> > Name=gpu Type=v100 File=/dev/nvidia0
> > Name=gpu Type=rtx File=/dev/nvidia1
> > Name=gpu Type=rtx File=/dev/nvidia2
> > Name=mps Count=200 File=/dev/nvidia0
> > Name=mps Count=200 File=/dev/nvidia1
> > Name=mps Count=200 File=/dev/nvidia2
> >
> > Best regards,
> > Holger
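Given the NVIDIA constraint quoted above (a single MPS server will not serve a mixed Volta / pre-Volta set), one possible workaround is to advertise MPS shares only on a homogeneous subset of GPUs. A sketch based on the configuration above, restricting mps to the V100 only (the Count values are illustrative):

```
# slurm.conf: node line with the mps total reduced to the V100's shares
NodeName=node1 CPUs=48 RealMemory=128530 Sockets=1 CoresPerSocket=24
ThreadsPerCore=2 Gres=gpu:v100:1,gpu:rtx:2,mps:200 State=UNKNOWN

# gres.conf: MPS shares defined only on the V100's device file
Name=gpu Type=v100 File=/dev/nvidia0
Name=gpu Type=rtx File=/dev/nvidia1
Name=gpu Type=rtx File=/dev/nvidia2
Name=mps Count=200 File=/dev/nvidia0
```

The RTX cards then remain usable via --gres=gpu:1, while --gres=mps jobs land only on the GPU the MPS server actually serves.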