On Fri, 2018-07-06 at 07:47:16 +0200, Loris Bennett wrote:
> Hi Tim,
>
> Tim Lin <timty...@gmail.com> writes:
>
> > As the title suggests, I’m searching for a way to have tighter control
> > over which node the batch script gets executed on. In my case it’s very
> > hard to know which node is best for this until after all the nodes are
> > allocated, right before the batch job starts. I’ve looked through all
> > the documentation I can get my hands on, but I haven’t found any mention
> > of any control over the batch host for admins. Am I missing something?
>
> As the documentation of 'sbatch' says:
>
> "When the job allocation is finally granted for the batch script,
> Slurm runs a single copy of the batch script on the first node in the
> set of allocated nodes."
>
> I am not aware of any way of changing this.
>
> Perhaps you can explain why you feel it is necessary for you to do this.
For me, the above reads like the user has an idea of a metric for selecting
the node for rank 0 (and perhaps the code is asymmetric enough to justify
such a selection), but no way to tell Slurm about it.

What about making the batch script a wrapper around the real payload? On the
"outer first node", take the list of assigned nodes, possibly reorder it,
then run the payload (via passphrase-less ssh?) on the selected, "new first"
node. This may require changing some more environment variables, and may
harm signalling.

Okay, my suggestion reads like a terrible kludge (which it certainly is),
but AFAIK there's no way to tell Slurm about "preferred first nodes".

- S

--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~
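P.S. To make the kludge concrete, a rough sketch of such a wrapper follows.
Everything here is illustrative: the selection metric (alphabetical sort as a
stand-in for whatever the user actually measures) and the payload path are
made up, and the node list is mocked so the selection logic is visible on its
own. In a real job you'd expand the allocation with
`scontrol show hostnames "$SLURM_JOB_NODELIST"` instead.

```shell
#!/bin/bash
# Hypothetical wrapper batch script: pick a "preferred" first node from the
# allocation, then launch the real payload there via ssh.

# In a real job:  nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
# Mocked here so the script runs outside Slurm:
nodes="node03
node01
node02"

# Stand-in metric: alphabetically first node. A real site would substitute
# its own measurement (free memory, lowest load, ...).
first=$(printf '%s\n' $nodes | sort | head -n1)

echo "would run payload on: $first"

# Real launch would be something like (caveats from the message apply --
# environment variables and signalling may need extra care):
#   ssh "$first" /path/to/real_payload.sh
```

Again, this is a sketch of the idea, not something Slurm supports natively.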