Ryan,

Thanks for looking into this. I hadn't had a chance to revisit the documentation since posting my question, so thanks for doing that for me.

Prentice Bisbal
Lead Software Engineer
Princeton Plasma Physics Laboratory
http://www.pppl.gov

On 1/18/19 2:58 PM, Ryan Novosielski wrote:
The documentation indicates you need it everywhere:

https://slurm.schedmd.com/topology.conf.html

"Changes to the configuration file take effect upon restart of Slurm daemons, daemon receipt 
of the SIGHUP signal, or execution of the command "scontrol reconfigure" unless otherwise 
noted."

I have vague memories of not being able to schedule any jobs if it’s missing,
but it’s been a while now.
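
If it does need to be everywhere, the update after editing is just a copy plus a
reconfigure. A rough sketch, assuming topology.conf sits next to slurm.conf in
/etc/slurm on every node, and using ClusterShell purely as an illustration:

    # push the updated file to all nodes (clush -a assumes ClusterShell is
    # already configured with an "all nodes" group for this site)
    clush -a --copy /etc/slurm/topology.conf --dest /etc/slurm/

    # tell slurmctld and the slurmd daemons to re-read their configuration
    scontrol reconfigure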

On Jan 17, 2019, at 4:52 PM, Prentice Bisbal <pbis...@pppl.gov> wrote:

And a follow-up question: Does topology.conf need to be on all the nodes, or 
just the slurm controller? It's not clear from that web page. I would assume 
only the controller needs it.

Prentice

On 1/17/19 4:49 PM, Prentice Bisbal wrote:
From https://slurm.schedmd.com/topology.html:

Note that compute nodes on switches that lack a common parent switch can be used, but no 
job will span leaf switches without a common parent (unless the 
TopologyParam=TopoOptional option is used). For example, it is legal to remove the line 
"SwitchName=s4 Switches=s[0-3]" from the above topology.conf file. In that 
case, no job will span more than four compute nodes on any single leaf switch. This 
configuration can be useful if one wants to schedule multiple physical clusters as a 
single logical cluster under the control of a single slurmctld daemon.
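
For context, the example that passage refers to is a tree along these lines (the
node names here are illustrative; four nodes per leaf switch, to match the "no
more than four compute nodes" statement):

    SwitchName=s0 Nodes=tux[0-3]
    SwitchName=s1 Nodes=tux[4-7]
    SwitchName=s2 Nodes=tux[8-11]
    SwitchName=s3 Nodes=tux[12-15]
    SwitchName=s4 Switches=s[0-3]   # removing this line isolates the four leaves
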
My current environment falls into the category of multiple physical clusters 
being treated as a single logical cluster under the control of a single 
slurmctld daemon. At least, that's my goal.

In my environment, I have two "clusters" connected by their own separate IB fabrics, and 
one "cluster" connected with 10 GbE. I have a fourth cluster connected with only 1 GbE. 
For this fourth cluster, we don't want jobs to span nodes, due to the slow performance of 
1 GbE. (This cluster is intended for serial and low-core-count parallel jobs.) If I just 
leave those nodes out of the topology.conf file, will that have the desired effect of not 
allocating multi-node jobs to those nodes, or will it result in an error of some sort?
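
For concreteness, here's roughly the layout I'm picturing, with made-up switch and
node names. The alternative I've seen suggested (but haven't verified myself) is to
keep the 1 GbE nodes in the file but give each one its own leaf switch, so that no
multi-node job can ever be placed on them:

    # two IB clusters, each on its own fabric, no common parent switch
    SwitchName=ib1  Nodes=c1-[01-32]
    SwitchName=ib2  Nodes=c2-[01-32]
    # 10 GbE cluster
    SwitchName=eth1 Nodes=c3-[01-16]
    # 1 GbE cluster: one "switch" per node keeps jobs from spanning nodes
    SwitchName=s4a  Nodes=c4-01
    SwitchName=s4b  Nodes=c4-02
    SwitchName=s4c  Nodes=c4-03
    SwitchName=s4d  Nodes=c4-04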
--
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
      `'