Hi,

Frequently, all of our GPU nodes (8 GPUs each) are in the MIXED state and no node is IDLE. Some jobs require a complete node (all 8 GPUs), and such jobs therefore have to wait a very long time before they can run.
Is there a way to improve this situation, for example by not blocking IDLE nodes with jobs that use only a fraction of the 8 GPUs? Why are single-GPU jobs not scheduled to fill already-MIXED nodes before using IDLE ones? Which parameters or configuration settings need to be adjusted for this to be enforced?

Our current scheduling configuration:

slurm.conf:
  SelectType=select/cons_tres
  SelectTypeParameters=CR_Core_Memory

gres.conf (one node as an example):
  NodeName=gpu-6 Name=gpu Type=rtx2080ti File=/dev/nvidia[0-3] COREs=0-17,36-53
  NodeName=gpu-6 Name=gpu Type=rtx2080ti File=/dev/nvidia[4-7] COREs=18-35,54-71

Thank you,
Durai
Competence Center for Machine Learning, Tübingen
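P.S. To make the last question more concrete, below is a sketch of the direction we were guessing at, based only on our reading of the slurm.conf man page. It is untested, the node names other than gpu-6 are made up for illustration, and we are not sure either knob is actually the right one for GPU jobs:

  # slurm.conf (sketch only -- not applied on our cluster)
  SelectType=select/cons_tres
  SelectTypeParameters=CR_Core_Memory
  # Put serial (single-task) jobs at the end of the node list instead of
  # using best fit, hoping this keeps small GPU jobs off idle nodes:
  SchedulerParameters=pack_serial_at_end
  # Alternatively, weight some nodes so they are selected last and tend to
  # stay free for whole-node jobs (nodes with lower Weight are preferred):
  NodeName=gpu-[1-5] Gres=gpu:rtx2080ti:8 Weight=10
  NodeName=gpu-[6-8] Gres=gpu:rtx2080ti:8 Weight=100

Would something along these lines be the intended way to do this, or is there a better mechanism?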