Dear everyone, greetings!

Answer to my own post: slurmctld uses a best-fit approach over the available resources on each node, so it does not obey the CPU map mask we specify when assigning tasks to logical CPUs. I have added and modified code to fulfil my requirement. Here are my experimental results.

*$ srun -n 4 --cpu_bind=verbose,map_cpu:0,1,8,9 --distribution=block:block --mem=1024 sleep 10*
cpu_bind=MAP - clusterhost1, task 0 0 [3334]: mask 0x1 set
cpu_bind=MAP - clusterhost1, task 1 1 [3335]: mask 0x2 set
cpu_bind=MAP - clusterhost1, task 2 2 [3336]: mask 0x100 set
cpu_bind=MAP - clusterhost1, task 3 3 [3337]: mask 0x200 set

*$ srun -n 16 --cpu_bind=verbose,map_cpu:0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23 --distribution=block:block --mem=1024 sleep 10*
cpu_bind=MAP - clusterhost1, task 0 0 [3084]: mask 0x1 set
cpu_bind=MAP - clusterhost1, task 1 1 [3085]: mask 0x2 set
cpu_bind=MAP - clusterhost1, task 2 2 [3086]: mask 0x4 set
cpu_bind=MAP - clusterhost1, task 3 3 [3087]: mask 0x8 set
cpu_bind=MAP - clusterhost1, task 12 12 [3097]: mask 0x100000 set
cpu_bind=MAP - clusterhost1, task 15 15 [3100]: mask 0x800000 set
cpu_bind=MAP - clusterhost1, task 9 9 [3094]: mask 0x20000 set
cpu_bind=MAP - clusterhost1, task 11 11 [3096]: mask 0x80000 set
cpu_bind=MAP - clusterhost1, task 4 4 [3088]: mask 0x10 set
cpu_bind=MAP - clusterhost1, task 10 10 [3095]: mask 0x40000 set
cpu_bind=MAP - clusterhost1, task 13 13 [3098]: mask 0x200000 set
cpu_bind=MAP - clusterhost1, task 5 5 [3089]: mask 0x20 set
cpu_bind=MAP - clusterhost1, task 7 7 [3091]: mask 0x80 set
cpu_bind=MAP - clusterhost1, task 14 14 [3099]: mask 0x400000 set
cpu_bind=MAP - clusterhost1, task 8 8 [3093]: mask 0x10000 set
cpu_bind=MAP - clusterhost1, task 6 6 [3090]: mask 0x40 set

*$ srun -n 16 --cpu_bind=verbose,map_cpu:8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31 --distribution=block:block --mem=1024 sleep 10*
cpu_bind=MAP - clusterhost1, task 2 2 [3157]: mask 0x400 set
cpu_bind=MAP - clusterhost1, task 0 0 [3155]: mask 0x100 set
cpu_bind=MAP - clusterhost1, task 1 1 [3156]: mask 0x200 set
cpu_bind=MAP - clusterhost1, task 3 3 [3158]: mask 0x800 set
cpu_bind=MAP - clusterhost1, task 4 4 [3159]: mask 0x1000 set
cpu_bind=MAP - clusterhost1, task 5 5 [3160]: mask 0x2000 set
cpu_bind=MAP - clusterhost1, task 6 6 [3161]: mask 0x4000 set
cpu_bind=MAP - clusterhost1, task 14 14 [3169]: mask 0x40000000 set
cpu_bind=MAP - clusterhost1, task 7 7 [3162]: mask 0x8000 set
cpu_bind=MAP - clusterhost1, task 13 13 [3168]: mask 0x20000000 set
cpu_bind=MAP - clusterhost1, task 12 12 [3167]: mask 0x10000000 set
cpu_bind=MAP - clusterhost1, task 8 8 [3163]: mask 0x1000000 set
cpu_bind=MAP - clusterhost1, task 9 9 [3164]: mask 0x2000000 set
cpu_bind=MAP - clusterhost1, task 15 15 [3170]: mask 0x80000000 set
cpu_bind=MAP - clusterhost1, task 10 10 [3165]: mask 0x4000000 set
cpu_bind=MAP - clusterhost1, task 11 11 [3166]: mask 0x8000000 set

*$ srun -n 16 --cpu_bind=verbose --mem=1024 sleep 10*
cpu_bind=MASK - clusterhost1, task 2 2 [3207]: mask 0x2 set
cpu_bind=MASK - clusterhost1, task 15 15 [3220]: mask 0x800000 set
cpu_bind=MASK - clusterhost1, task 3 3 [3208]: mask 0x20000 set
cpu_bind=MASK - clusterhost1, task 4 4 [3209]: mask 0x4 set
cpu_bind=MASK - clusterhost1, task 10 10 [3215]: mask 0x20 set
cpu_bind=MASK - clusterhost1, task 11 11 [3216]: mask 0x200000 set
cpu_bind=MASK - clusterhost1, task 12 12 [3217]: mask 0x40 set
cpu_bind=MASK - clusterhost1, task 13 13 [3218]: mask 0x400000 set
cpu_bind=MASK - clusterhost1, task 14 14 [3219]: mask 0x80 set
cpu_bind=MASK - clusterhost1, task 1 1 [3206]: mask 0x10000 set
cpu_bind=MASK - clusterhost1, task 5 5 [3210]: mask 0x40000 set
cpu_bind=MASK - clusterhost1, task 0 0 [3205]: mask 0x1 set
cpu_bind=MASK - clusterhost1, task 6 6 [3211]: mask 0x8 set
cpu_bind=MASK - clusterhost1, task 7 7 [3212]: mask 0x80000 set
cpu_bind=MASK - clusterhost1, task 9 9 [3214]: mask 0x100000 set
cpu_bind=MASK - clusterhost1, task 8 8 [3213]: mask 0x10 set

*$ srun -n 4 --cpu_bind=verbose --mem=1024 sleep 10*
cpu_bind=MASK - clusterhost1, task 3 3 [3266]: mask 0x20000 set
cpu_bind=MASK - clusterhost1, task 2 2 [3265]: mask 0x2 set
cpu_bind=MASK - clusterhost1, task 0 0 [3263]: mask 0x1 set
cpu_bind=MASK - clusterhost1, task 1 1 [3264]: mask 0x10000 set

On Fri, Oct 27, 2017 at 1:23 PM, Animesh Kuity <animesh2ku...@gmail.com> wrote:

> Hi everyone,
>
> My objective: I want to assign a few tasks to the logical CPUs belonging
> to a particular socket (e.g., socket 0) and, at another time, assign
> another set of tasks to the logical CPUs belonging to the other socket
> (e.g., socket 1). In summary, I want to achieve task affinity to
> particular logical CPUs.
>
> Slurm version used: slurm 16.05.10-2
>
> slurm.conf settings used to achieve task affinity:
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> TaskPlugin=task/affinity
> TaskPluginParam=sched
>
> Node used: Xeon processor; two sockets, each having 8 cores with 2
> threads/core
>
> Processor layout (/proc/cpuinfo):
> processor   physical id   core id
> 0,16        0             0
> 1,17        0             1
> 2,18        0             2
> 3,19        0             3
> 4,20        0             4
> 5,21        0             5
> 6,22        0             6
> 7,23        0             7
> 8,24        1             0
> 9,25        1             1
> 10,26       1             2
> 11,27       1             3
> 12,28       1             4
> 13,29       1             5
> 14,30       1             6
> 15,31       1             7
>
> Question: *I am unable to assign all the tasks to the particular logical
> CPUs belonging to socket 0 or socket 1.*
>
> The tasks are always assigned to socket 0 first, irrespective of the
> specified map_cpu, before going to socket 1.
>
> *My observation:*
>
> *$ srun -n 8 --cpu_bind=verbose,map_cpu:0,1,2,3,16,17,18,19
> --distribution=block:block --mem=1024 sleep 100 &*
> [1] 14665
> cpu_bind=MASK - clusterhost1, task 0 0 [14697]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 1 1 [14698]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 4 4 [14701]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 2 2 [14699]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 3 3 [14700]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 5 5 [14702]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 6 6 [14703]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 7 7 [14704]: mask 0xf000f set
> *$ srun bash -c "cat /proc/self/status | grep Cpus_allowed_list"*
> Cpus_allowed_list: 4,20
>
>
> *$ srun -n 8 --cpu_bind=verbose,map_cpu:0,1,2,3,4,5,6,7
> --distribution=block:block --mem=1024 sleep 100 &*
> [1] 14814
> cpu_bind=MASK - clusterhost1, task 1 1 [14847]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 2 2 [14848]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 3 3 [14849]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 0 0 [14846]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 5 5 [14851]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 6 6 [14852]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 4 4 [14850]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 7 7 [14853]: mask 0xf000f set
> *$ srun bash -c "cat /proc/self/status | grep Cpus_allowed_list"*
> Cpus_allowed_list: 4,20
>
> *$ srun -n 20
> --cpu_bind=verbose,map_cpu:0,1,2,3,4,5,6,7,9,10,11,12,13,14,15,16,17,18,19
> --distribution=block:block --mem=1024 sleep 100 &*
> [1] 15688
> cpu_bind=MASK - clusterhost1, task 1 1 [15721]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 2 2 [15722]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 4 4 [15724]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 5 5 [15725]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 7 7 [15727]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 0 0 [15720]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 6 6 [15726]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 3 3 [15723]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 10 10 [15730]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 8 8 [15728]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 9 9 [15729]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 11 11 [15731]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 12 12 [15732]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 14 14 [15734]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 13 13 [15733]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 15 15 [15735]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 16 16 [15736]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 17 17 [15737]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 18 18 [15738]: mask 0x3ff03ff set
> cpu_bind=MASK - clusterhost1, task 19 19 [15739]: mask 0x3ff03ff set
> *$ srun bash -c "cat /proc/self/status | grep Cpus_allowed_list"*
> Cpus_allowed_list: 10,26
>
> *$ srun -n 8 --cpu_bind=verbose,map_cpu:8,9,10,11,24,25,26,27
> --distribution=block:block --mem=1024 sleep 100 &*
> [1] 16816
> cpu_bind=MASK - clusterhost1, task 1 1 [16850]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 4 4 [16853]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 3 3 [16852]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 2 2 [16851]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 0 0 [16849]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 6 6 [16855]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 5 5 [16854]: mask 0xf000f set
> cpu_bind=MASK - clusterhost1, task 7 7 [16856]: mask 0xf000f set
>
> *$ srun bash -c "cat /proc/self/status | grep Cpus_allowed_list"*
> Cpus_allowed_list: 4,20
>
> *$ srun --nodes=1 --ntasks=32 --cpu_bind=cores,verbose --label cat
> /proc/self/status | grep Cpus_allowed_list*
> 00: cpu_bind=MASK - clusterhost1, task 0 0 [13955]: mask 0x10001 set
> 01: cpu_bind=MASK - clusterhost1, task 1 1 [13956]: mask 0x20002 set
> 04: cpu_bind=MASK - clusterhost1, task 4 4 [13959]: mask 0x100010 set
> 05: cpu_bind=MASK - clusterhost1, task 5 5 [13960]: mask 0x200020 set
> 06: cpu_bind=MASK - clusterhost1, task 6 6 [13961]: mask 0x400040 set
> 03: cpu_bind=MASK - clusterhost1, task 3 3 [13958]: mask 0x80008 set
> 02: cpu_bind=MASK - clusterhost1, task 2 2 [13957]: mask 0x40004 set
> 09: cpu_bind=MASK - clusterhost1, task 9 9 [13964]: mask 0x2000200 set
> 07: cpu_bind=MASK - clusterhost1, task 7 7 [13962]: mask 0x800080 set
> 10: cpu_bind=MASK - clusterhost1, task 10 10 [13965]: mask 0x4000400 set
> 11: cpu_bind=MASK - clusterhost1, task 11 11 [13966]: mask 0x8000800 set
> 14: cpu_bind=MASK - clusterhost1, task 14 14 [13969]: mask 0x40004000 set
> 15: cpu_bind=MASK - clusterhost1, task 15 15 [13970]: mask 0x80008000 set
> 12: cpu_bind=MASK - clusterhost1, task 12 12 [13967]: mask 0x10001000 set
> 13: cpu_bind=MASK - clusterhost1, task 13 13 [13968]: mask 0x20002000 set
> 08: cpu_bind=MASK - clusterhost1, task 8 8 [13963]: mask 0x1000100 set
> 17: cpu_bind=MASK - clusterhost1, task 17 17 [13972]: mask 0x20002 set
> 16: cpu_bind=MASK - clusterhost1, task 16 16 [13971]: mask 0x10001 set
> 20: cpu_bind=MASK - clusterhost1, task 20 20 [13975]: mask 0x100010 set
> 19: cpu_bind=MASK - clusterhost1, task 19 19 [13974]: mask 0x80008 set
> 18: cpu_bind=MASK - clusterhost1, task 18 18 [13973]: mask 0x40004 set
> 22: cpu_bind=MASK - clusterhost1, task 22 22 [13977]: mask 0x400040 set
> 21: cpu_bind=MASK - clusterhost1, task 21 21 [13976]: mask 0x200020 set
> 24: cpu_bind=MASK - clusterhost1, task 24 24 [13979]: mask 0x1000100 set
> 25: cpu_bind=MASK - clusterhost1, task 25 25 [13980]: mask 0x2000200 set
> 23: cpu_bind=MASK - clusterhost1, task 23 23 [13978]: mask 0x800080 set
> 26: cpu_bind=MASK - clusterhost1, task 26 26 [13981]: mask 0x4000400 set
> 30: cpu_bind=MASK - clusterhost1, task 30 30 [13985]: mask 0x40004000 set
> 31: cpu_bind=MASK - clusterhost1, task 31 31 [13986]: mask 0x80008000 set
> 28: cpu_bind=MASK - clusterhost1, task 28 28 [13983]: mask 0x10001000 set
> 29: cpu_bind=MASK - clusterhost1, task 29 29 [13984]: mask 0x20002000 set
> 27: cpu_bind=MASK - clusterhost1, task 27 27 [13982]: mask 0x8000800 set
> 03: Cpus_allowed_list: 3,19
> 04: Cpus_allowed_list: 4,20
> 01: Cpus_allowed_list: 1,17
> 06: Cpus_allowed_list: 6,22
> 00: Cpus_allowed_list: 0,16
> 02: Cpus_allowed_list: 2,18
> 05: Cpus_allowed_list: 5,21
> 09: Cpus_allowed_list: 9,25
> 10: Cpus_allowed_list: 10,26
> 14: Cpus_allowed_list: 14,30
> 11: Cpus_allowed_list: 11,27
> 15: Cpus_allowed_list: 15,31
> 12: Cpus_allowed_list: 12,28
> 13: Cpus_allowed_list: 13,29
> 17: Cpus_allowed_list: 1,17
> 07: Cpus_allowed_list: 7,23
> 16: Cpus_allowed_list: 0,16
> 08: Cpus_allowed_list: 8,24
> 20: Cpus_allowed_list: 4,20
> 19: Cpus_allowed_list: 3,19
> 18: Cpus_allowed_list: 2,18
> 21: Cpus_allowed_list: 5,21
> 22: Cpus_allowed_list: 6,22
> 24: Cpus_allowed_list: 8,24
> 23: Cpus_allowed_list: 7,23
> 26: Cpus_allowed_list: 10,26
> 30: Cpus_allowed_list: 14,30
> 31: Cpus_allowed_list: 15,31
> 25: Cpus_allowed_list: 9,25
> 28: Cpus_allowed_list: 12,28
> 29: Cpus_allowed_list: 13,29
> 27: Cpus_allowed_list: 11,27
>
>
> *Kindly help me to assign all the tasks to either socket.*
>
> Any kind of help will be appreciated.
>
> Thanks in advance.
>
> --
> Thanks & Regards,
> Animesh Kuity,
> Research Scholar,
> Computer Science department,
> IIT Roorkee
>

--
Thanks & Regards,
Animesh Kuity,
Research Scholar,
Computer Science department,
IIT Roorkee
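
P.S. For anyone who wants to check the masks in this thread by hand, here is a minimal Python sketch. It is not part of Slurm: the function names mask_to_cpus and socket_of are hypothetical helpers, and the socket rule is an assumption taken only from the /proc/cpuinfo layout quoted above (logical CPUs 0-7 and 16-23 on socket 0, 8-15 and 24-31 on socket 1). It converts a mask printed by --cpu_bind=verbose into logical CPU numbers and reports the socket of each.

```python
def mask_to_cpus(mask):
    """Return the logical CPU numbers whose bits are set in a hex mask string."""
    value = int(mask, 16)  # accepts strings such as "0xf000f"
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def socket_of(cpu):
    """Socket of a logical CPU, ASSUMING the 2-socket, 8-core, 2-thread
    layout from the quoted /proc/cpuinfo table (hypothetical helper)."""
    return (cpu % 16) // 8


if __name__ == "__main__":
    # Masks copied from the srun output in this thread.
    for mask in ("0x100", "0x20000", "0xf000f", "0x3ff03ff"):
        cpus = mask_to_cpus(mask)
        sockets = sorted({socket_of(cpu) for cpu in cpus})
        print(mask, "-> CPUs", cpus, "-> sockets", sockets)
```

For example, the mask 0xf000f from the quoted runs decodes to CPUs 0-3 and 16-19, which all sit on socket 0 under this layout, even in the run where map_cpu:8,9,10,11,24,25,26,27 was requested.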