Hello,
I'm trying to use the topology/tree plugin to isolate nodes into
different "groups", so that jobs are allocated only on nodes belonging
to one such group and never on nodes from other groups. I think I'm
missing something, because Slurm doesn't seem to take this topology into
account. Hopefully someone can spot what I'm doing wrong. So far I'm
trying to split a three-node cluster into two groups using two
switches. I have created a topology.conf file and placed it in the etc
directory of my Slurm installation. The file contains these two
lines:
SwitchName=s0 Nodes=node1,node2
SwitchName=s1 Nodes=node3
I also tried adding a third, top-level switch that connects the s0 and
s1 switches, but that didn't change anything.
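In case it matters, that version of the file looked roughly like this
(my own reconstruction; the switch names are arbitrary):
SwitchName=s0 Nodes=node1,node2
SwitchName=s1 Nodes=node3
SwitchName=top Switches=s0,s1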
Then I enabled the topology/tree plugin with this line in slurm.conf:
TopologyPlugin=topology/tree
And finally I applied these changes with:
scontrol reconfigure
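If I'm reading the scontrol man page correctly, the resulting layout
can also be inspected with:
scontrol show topology
and I would assume both switches should show up there with their nodes
if the file is actually being read.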
I would then expect a job that requires 2 nodes to run on node1 and
node2 only, since those two should be grouped under the switch "s0".
Instead it runs on node1 and node3, ignoring the topology, when I
submit a command like this:
sbatch -A molecules_serv -p cc -N 2 -n 4 --switches=1 ./script1
Maybe I'm misunderstanding what the --switches flag does, but I think
it should only consider nodes that sit under a single switch and, among
those, pick the ones that also fulfill the other requirements, such as
the number of nodes or tasks. I would therefore expect the job to run
on node1 and node2 in parallel, but never on node3.
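For reference, my reading of the sbatch man page is that the flag also
accepts an optional maximum wait time, --switches=<count>[@<max-time>],
e.g.:
sbatch -A molecules_serv -p cc -N 2 -n 4 --switches=1@60 ./script1
which, as I understand it, asks Slurm to wait up to 60 minutes for an
allocation that fits under a single switch before relaxing that
constraint.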
Any ideas?
Thank you very much for your help
Antonio