...and... using the same cluster name is important in our scenario for the seamless slurmdbd upgrade transition.
In thinking about it a bit more, I'm not sure I'd want to fold together production and test/dev configs in the same revision control tree. We keep them separate. There's no reason to baroquify it.

On Wed, Jan 4, 2023 at 1:54 PM Fulcomer, Samuel <samuel_fulco...@brown.edu> wrote:

> Just make the cluster names the same, with different Nodename and
> Partition lines. The rest of slurm.conf can be the same. Having two cluster
> names is only necessary if you're running production in a multi-cluster
> configuration.
>
> Our model has been to have a production cluster and a test cluster which
> becomes the production cluster at yearly upgrade time (for us, next week).
> The test cluster is also used for rebuilding MPI prior to the upgrade, when
> the PMI changes. We force users to resubmit jobs at upgrade time (after the
> maintenance reservation) to ensure that MPI runs correctly.
>
> On Wed, Jan 4, 2023 at 12:26 PM Groner, Rob <rug...@psu.edu> wrote:
>
>> We currently have a test cluster and a production cluster, all on the
>> same network. We try things on the test cluster, and then we gather those
>> changes and make a change to the production cluster. We're doing that
>> through two different repos, but we'd like to have a single repo to make
>> the transition from testing configs to publishing them more seamless. The
>> problem is, of course, that the test cluster and production clusters have
>> different cluster names, as well as different nodes within them.
>>
>> Using the include directive, I can pull all of the NodeName lines out of
>> slurm.conf and put them into %c-nodes.conf files, one for production, one
>> for test. That still leaves me with two problems:
>>
>> - The clustername itself will still be a problem. I WANT the same
>> slurm.conf file between test and production...but the clustername line
>> will be different for them both. Can I use an env var in that cluster
>> name, because on production there could be a different env var value
>> than on test?
>> - The gres.conf file. I tried using the same "include" trick that
>> works on slurm.conf, but it failed because it did not know what the
>> "ClusterName" was. I think that means that either it doesn't work for
>> anything other than slurm.conf, or that the clustername will have to be
>> defined in gres.conf as well?
>>
>> Any other suggestions of how to keep our slurm files in a single source
>> control repo, but still have the flexibility to have them run elegantly on
>> either test or production systems?
>>
>> Thanks.
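
For illustration only, the "same cluster name, different NodeName and Partition lines" layout suggested in the quoted reply might look roughly like the sketch below. The cluster name, file paths, node names, and hardware values are all hypothetical placeholders, not anyone's actual site config. The assumption is that slurm.conf is identical on both systems and that each system carries its own copy of the two included files.

```
# slurm.conf: same file on production and test
ClusterName=mycluster                 # hypothetical name, identical on both
Include /etc/slurm/nodes.conf         # hypothetical path, differs per system
Include /etc/slurm/partitions.conf    # hypothetical path, differs per system

# nodes.conf on the production system (hypothetical values)
NodeName=prod[001-100] CPUs=64 RealMemory=256000 State=UNKNOWN

# nodes.conf on the test system (hypothetical values)
NodeName=test[01-04] CPUs=64 RealMemory=256000 State=UNKNOWN

# partitions.conf on the production system (hypothetical values)
PartitionName=batch Nodes=prod[001-100] Default=YES State=UP

# partitions.conf on the test system (hypothetical values)
PartitionName=batch Nodes=test[01-04] Default=YES State=UP
```

With the cluster name held constant, the shared slurm.conf does not need a per-cluster name at all, so only the node and partition files differ between the two systems.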