Hi David,
The topology.conf file groups nodes into sets such that parallel jobs
will not be scheduled by Slurm across disjoint sets. Even though the
topology.conf man-page refers to network switches, it's really about
topology in general rather than the network specifically.
You may use fake (non-existent) switch names.
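For illustration, a minimal topology.conf along those lines might look like
this (the switch names and node ranges here are invented, not from your
cluster):

```
# Hypothetical topology.conf for the tree topology plugin.
# The "switches" are just labels grouping nodes; they need not
# correspond to real hardware.
SwitchName=sw_rack1 Nodes=amd[001-032]
SwitchName=sw_rack2 Nodes=amd[033-064]
SwitchName=sw_root Switches=sw_rack[1-2]
```

With TopologyPlugin=topology/tree in slurm.conf, Slurm will then prefer to
place a job's nodes under a single leaf "switch".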
You can schedule jobs across the two racks, with any given job only using one
rack, by specifying
#SBATCH --partition rack1,rack2
It'll only use one partition, chosen in order of priority (not the order listed).
I never found a way for topology to do that - all I could get it to do is to
prefer to keep things within a single switch.
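For reference, the rack1/rack2 partitions used above would be defined roughly
like this in slurm.conf (partition and node names are assumptions for the
sketch):

```
# Hypothetical slurm.conf fragment: one partition per rack, so that
# "--partition rack1,rack2" lets a job start in whichever rack can
# run it first, while staying entirely within that rack.
PartitionName=rack1 Nodes=amd[001-032] Default=NO State=UP
PartitionName=rack2 Nodes=amd[033-064] Default=NO State=UP
```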
This should be fine assuming you don't mind the mismatch in CPU speeds.
Unless the codes are super sensitive to topology, things should be okay,
as modern IB is wicked fast.
In our environment here we have a variety of different hardware types,
all networked together on the same IB fabric.
Hello,
We have now enabled topology-aware scheduling on our Slurm cluster.
One part of the cluster consists of two racks of AMD compute nodes. These racks
are now treated as separate entities by Slurm. Soon, we may add another set
of AMD nodes with slightly different CPU specs to the cluster.