> On Jan 18, 2019, at 11:53 AM, Kilian Cavalotti <kilian.cavalotti.w...@gmail.com> wrote:
>
> On Fri, Jan 18, 2019 at 6:31 AM Prentice Bisbal <pbis...@pppl.gov> wrote:
>>> Note that if you care about node weights (eg. NodeName=whatever001
>>> Weight=2, etc. in slurm.conf), using the topology function will disable it.
>>> I believe I was promised a warning about that in the future in a
>>> conversation with SchedMD.
>>
>> Well, that's going to be a big problem for me. One of the goals of me
>> overhauling our Slurm config is to take advantage of the node weighting
>> function to prioritize certain hardware over others in our very
>> heterogeneous cluster.
>
> I've heard that too (that enabling the Topology plugin would disable
> node weighting), but I don't think it's accurate, both from the
> documentation and from observation.
>
> The doc actually says (https://slurm.schedmd.com/topology.html)
>
> """
> NOTE: Slurm first identifies the network switches which provide the
> best fit for pending jobs and then selects the nodes with the lowest
> "weight" within those switches. If optimizing resource selection by
> node weight is more important than optimizing network topology then do
> NOT use the topology/tree plugin.
> """
>
> So the Topology plugin does take precedence over the weighting
> algorithm, but it doesn't disable it, AFAIK. And for sites using
> disjoint networks, as we do, this is a sane behavior.

I’m not sure if that’s a change, or whether that was always the behavior, but as a practical matter, it still largely defeats node weighting. We have a fully defined topology for two different clusters, and it happens that the switch with the smallest number of connected nodes has the most specialized equipment (usually the login nodes, a couple of high memory nodes, and a few CUDA nodes). If someone runs a single node job, the job will favor that switch. I can think of a few ways to work around that, I guess, but by default, the behavior seems to be roughly the inverse of what the node weights are meant to achieve.

--
 ____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'
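
To make the single-node case above concrete, here is a toy Python model of the two-stage selection the quoted documentation note describes: pick a switch first, then the lowest-weight nodes within it. It is only a sketch under assumptions, not Slurm's actual algorithm. In particular, "best fit" is assumed here to mean the switch with the fewest idle nodes that can still hold the job, and the switch names, node names, and weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    switch: str   # leaf switch the node hangs off (hypothetical names)
    weight: int   # lower weight = preferred, as with Weight= in slurm.conf
    idle: bool = True


def pick_topology_first(nodes, count):
    """Toy model: choose the best-fit switch, then lowest weight within it.

    ASSUMPTION: "best fit" = switch with the fewest idle nodes that still
    satisfies the request. The real topology/tree logic is more involved.
    """
    by_switch = {}
    for n in nodes:
        if n.idle:
            by_switch.setdefault(n.switch, []).append(n)
    fits = [(len(m), sw) for sw, m in by_switch.items() if len(m) >= count]
    if not fits:
        return []
    _, best = min(fits)
    chosen = sorted(by_switch[best], key=lambda n: (n.weight, n.name))[:count]
    return [n.name for n in chosen]


def pick_weight_only(nodes, count):
    """For contrast: ignore topology and take the lowest-weight idle nodes."""
    idle = sorted((n for n in nodes if n.idle), key=lambda n: (n.weight, n.name))
    return [n.name for n in idle[:count]]


# Hypothetical cluster: a large switch of general nodes and a small switch
# holding the specialized (high-weight) hardware.
cluster = [
    Node("node001", "sw-general", 1),
    Node("node002", "sw-general", 1),
    Node("node003", "sw-general", 1),
    Node("node004", "sw-general", 1),
    Node("himem001", "sw-special", 50),
    Node("gpu001", "sw-special", 100),
]

print(pick_topology_first(cluster, 1))  # ['himem001']
print(pick_weight_only(cluster, 1))     # ['node001']
```

In this model the single-node job lands on the small, specialized switch even though every weight-1 node is idle, which is the effect described above. The weight only decides between himem001 and gpu001 once the switch has already been chosen.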