On Tue, Mar 12, 2019 at 1:14 AM Andy Leung Yin Sui <moliu...@gmail.com> wrote:
>
> Hi,
>
> I am new to slurm and want to use weight option to schedule the jobs.
> I have some machine with same hardware configuration with GPU cards. I
> use QoS to force user at least required 1 gpu gres when submitting
> jobs.
> The machine serve multiple partition.
> What I want is consume dedicated nodes first when schedule gpu_2h
> parition jobs by adding weight settings.(e.g. schedule to GPU38/39
> rather than 36/37). However, the scheduler turns out not following the
> weight settings and schedule to 36/37 (e.g. srun -p gpu_2h).
> All the GPU node are idle and the billing are same, did I miss
> something? Was it some limitation if a nodes server multiple partition
> or consume GRES? Please advise. Thank you very much.
>
> Below are the setting which may help.
> slurm.conf
> NodeName=gpu[36-37] Gres=gpu:titanxp:4 ThreadsPerCore=2 State=unknown
> Sockets=2 CPUs=40 CoresPerSocket=10 Weight=20
> NodeName=gpu[38-39] Gres=gpu:titanxp:4 ThreadsPerCore=2 State=unknown
> Sockets=2 CPUs=40 CoresPerSocket=10 Weight=1
>
>
> PartitionName=gpu_2h Nodes=gpu[36-39] Default=YES MaxTime=02:00:00
> DefaultTime=02:00:00 MaxNodes=1 State=UP AllowQos=GPU
> PartitionName=gpu_8h Nodes=gpu[31-37] MaxTime=08:00:00
> DefaultTime=08:00:00 MaxNodes=1 State=UP AllowQos=GPU
>
>
> # sinfo -N -O nodelist,partition,gres,weight
>
>
> NODELIST  PARTITION  GRES           WEIGHT
> gpu36     gpu_2h*    gpu:titanxp:4  20
> gpu36     gpu_8h     gpu:titanxp:4  20
> gpu37     gpu_2h*    gpu:titanxp:4  20
> gpu37     gpu_8h     gpu:titanxp:4  20
> gpu38     gpu_2h*    gpu:titanxp:4  1
> gpu39     gpu_2h*    gpu:titanxp:4  1
>
You didn't mention which version of Slurm you are using. Node weights are known to be broken in early releases of 18.08. I think it was fixed in 18.08.4, but you'd have to go back and read the release notes to confirm.
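
In case it helps, here is a rough sketch for checking both things on the cluster: the installed version and the weights the controller actually loaded. The `version_lt` helper is a name I made up for illustration, it assumes `sort -V` (GNU coreutils) is available, and the 18.08.4 threshold is only my recollection, not something I have confirmed against the release notes.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if version $1 is strictly older than $2.
# Assumes `sort -V` (version sort) is available on this host.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Release believed to contain the weight fix (unconfirmed; check the release notes).
FIXED_IN="18.08.4"

# Only query Slurm if the client tools are installed on this machine.
if command -v sinfo >/dev/null 2>&1; then
    # `sinfo --version` prints a line like "slurm 18.08.3"; keep the second field.
    installed=$(sinfo --version | awk '{print $2}')
    if version_lt "$installed" "$FIXED_IN"; then
        echo "Slurm $installed is older than $FIXED_IN; Weight may be ignored"
    else
        echo "Slurm $installed is at or past $FIXED_IN"
    fi
    # Show the weight recorded for each node in the partition.
    sinfo -N -p gpu_2h -O nodelist,weight
fi
```

If the version turns out to be at or past the fix and `srun -p gpu_2h` still lands on gpu36/37, the problem is probably somewhere else, and it would be worth posting the output of `scontrol show node gpu38` as well.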