On 3/8/22 10:20 pm, Gerhard Strangar wrote:
With a fake license called reboot?
It's a neat idea, but I think there is a catch:
* 3 jobs start, each taking 1 license
* Other reboot jobs are all blocked
* Running reboot jobs trigger node reboot
* Running reboot jobs end when either the script e
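A minimal sketch of the fake-license idea, assuming slurm.conf defines a local license such as `Licenses=reboot:3` and that `reboot_node.sh` is a hypothetical per-node reboot script (not part of Slurm). Each job requests one `reboot` license and is pinned to one node, so Slurm starts at most three at a time. David's catch still applies: a license is only released when the job itself ends.

```python
from __future__ import annotations

import shlex

# Hypothetical name of the per-node reboot script; not part of Slurm.
REBOOT_SCRIPT = "reboot_node.sh"


def license_reboot_command(node: str, license_name: str = "reboot",
                           script: str = REBOOT_SCRIPT) -> list[str]:
    """Build an sbatch argv that pins one reboot job to `node`.

    Each job asks for one unit of the (assumed) `reboot` license, so
    Slurm runs at most as many of these jobs as licenses are defined.
    """
    return [
        "sbatch",
        f"--licenses={license_name}:1",  # hold one license while running
        f"--nodelist={node}",           # run on exactly this node
        "--exclusive",                  # keep other jobs off the node
        f"--job-name=reboot-{node}",
        script,
    ]


def license_reboot_commands(nodes: list[str]) -> list[str]:
    """Return one shell-quoted sbatch command line per node."""
    return [shlex.join(license_reboot_command(n)) for n in nodes]


if __name__ == "__main__":
    # Dry run: print the commands instead of submitting them.
    for line in license_reboot_commands(["node001", "node002", "node003"]):
        print(line)
```

All the jobs can be submitted at once; the license count, not the submission loop, does the throttling.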
Another way might be to implement Slurm power off/on (if it isn't already in place) and
trigger it as required.
-
David Simpson - Senior Systems Engineer
ARCCA, Redwood Building,
King Edward VII Avenue,
Cardiff, CF10 3NB
This is actually brilliant!
Brian Andrus
On 8/3/2022 10:20 PM, Gerhard Strangar wrote:
Phil Chiu wrote:
- Individual Slurm jobs that reboot nodes: with a for loop, I could
submit a reboot job for each node. But I'm not sure how to limit this so that
at most N jobs are running simulta
...thinking about this, job dependencies are also an option. You could
carve the nodes up into X 'sets' of N nodes, with node-specific reboot jobs
that each wait for the job in the same slot of the previous set to finish,
so at most N reboots run at once.
Tina
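A minimal sketch of the dependency approach, assuming a hypothetical `reboot_node.sh` job script. Splitting the node list into sets of N and chaining slot i of each set gives N independent chains, so at most N reboot jobs run at once. `afterany` is used so a chain keeps moving even if a reboot job ends in a failed state; `sbatch --parsable` prints the job ID (optionally followed by `;cluster`).

```python
from __future__ import annotations

import subprocess
import sys

REBOOT_SCRIPT = "reboot_node.sh"  # hypothetical per-node reboot script


def plan_chains(nodes: list[str], n: int) -> list[list[str]]:
    """Split nodes into sets of n and return the n dependency chains.

    Set k holds nodes[k*n:(k+1)*n]; chain i takes slot i from every set,
    so each job waits for the same slot in the previous set and at most
    n reboots are in flight at once.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [nodes[i::n] for i in range(min(n, len(nodes)))]


def submit_chain(chain: list[str], script: str = REBOOT_SCRIPT) -> list[str]:
    """Submit one chain; each job depends on the previous one ending."""
    job_ids: list[str] = []
    for node in chain:
        cmd = ["sbatch", "--parsable", f"--nodelist={node}", "--exclusive"]
        if job_ids:
            # afterany: start once the previous job ends, whatever its state.
            cmd.append(f"--dependency=afterany:{job_ids[-1]}")
        cmd.append(script)
        out = subprocess.run(cmd, check=True, capture_output=True, text=True)
        # --parsable prints "jobid" or "jobid;cluster"; keep the job ID.
        job_ids.append(out.stdout.strip().split(";")[0])
    return job_ids


if __name__ == "__main__":
    nodes = [f"node{i:03d}" for i in range(1, 11)]
    for chain in plan_chains(nodes, n=3):
        if "--submit" in sys.argv:
            print(submit_chain(chain))
        else:
            print(" -> ".join(chain))  # dry run: show the order only
```

Without `--submit` it only prints the planned order, which is a cheap way to check the grouping before anything is queued.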
On 04/08/2022 11:23, Tina Friedrich wrote:
I'm currently thinking of something like that: setting up some kind of
TRES resource that limits how many nodes are rebooted at any one time.
I usually do this sort of thing more or less manually; that is, I
generate a list of sbatch commands for the reboot job (one job per
node, specifying the node name)
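A minimal sketch of the manual approach: write one sbatch line per node to a file that can be reviewed and run by hand, a few lines at a time. The script name `reboot_node.sh` and the file layout are assumptions for illustration; nothing here throttles the reboots, so pacing stays with whoever runs the lines.

```python
from __future__ import annotations

import shlex
from pathlib import Path

REBOOT_SCRIPT = "reboot_node.sh"  # hypothetical reboot job script


def build_reboot_lines(nodes: list[str],
                       script: str = REBOOT_SCRIPT) -> list[str]:
    """Return one shell-quoted sbatch command per node, pinned by name."""
    return [
        shlex.join(["sbatch", f"--nodelist={node}", "--exclusive",
                    f"--job-name=reboot-{node}", script])
        for node in nodes
    ]


def write_reboot_list(nodes: list[str], path: Path,
                      script: str = REBOOT_SCRIPT) -> list[str]:
    """Write the commands to `path`, one per line, and return them."""
    lines = build_reboot_lines(nodes, script)
    path.write_text("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    # Print instead of writing a file, so the sketch has no side effects.
    for line in build_reboot_lines(["node001", "node002"]):
        print(line)
```

If the nodes are known as a Slurm hostlist (e.g. `node[001-010]`), `scontrol show hostnames` expands it to one name per line, which can be fed into the node list.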