You shouldn't have to change any parameters on the command line if you
have them configured in the defaults file. Just systemctl stop/start
slurmd as needed.
something like:
scontrol update state=drain nodename=<node_to_change> reason="MIG reconfig"
<wait for it to be drained>
ssh <node_to_change> "systemctl stop slurmd"
<run reconfig stuff>
ssh <node_to_change> "systemctl start slurmd"
Not sure what would make you think slurmd cannot run as a service on a
dynamic node. As long as you add the dynamic-node options (-Z --conf=...)
to the systemd defaults file for it, you should be fine (usually
/etc/default/slurmd, or /etc/sysconfig/slurmd on RHEL-based systems).
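A minimal sketch of what that might look like (path and variable name
depend on your packaging; the --conf value is just a placeholder for
your node's resources):

# /etc/default/slurmd (or /etc/sysconfig/slurmd on RHEL-based systems);
# typically read by the slurmd unit via EnvironmentFile and passed to
# slurmd as $SLURMD_OPTIONS
SLURMD_OPTIONS="-Z --conf=Gres=gpu:2"

Then restarting slurmd after the MIG change (and updating
SLURMD_OPTIONS if the resource counts changed) should be enough for the
node to re-register with slurmctld.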
Brian
On 9/23/2022 7:40 AM, Groner, Rob wrote:
Ya, we're still working out the mechanism for taking the node out,
making the changes, and bringing it back. But the part I can't figure
out is slurmd running on the remote node. What do I do with it? Do I
run it standalone, and when I need to reconfigure, I kill -9 it and
execute it again with the new configuration? Or what if slurmd is
running as a service (as it does on all our non-dynamic nodes)? Do I
stop it, change its service parameters and then restart it to
reconfigure the node? The docs on slurm for dynamic nodes don't give
any indication of how you handle slurmd running on the dynamic node.
What is the preferred method?
Rob
------------------------------------------------------------------------
*From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf
of Brian Andrus <toomuc...@gmail.com>
*Sent:* Friday, September 23, 2022 10:24 AM
*To:* slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
*Subject:* Re: [slurm-users] slurmd and dynamic nodes
Just off the top of my head here.
I would expect you need to have no jobs currently running on the node,
so you could submit a job to the node that sets the node to drain, does
any local things needed, then exits. As part of the
EpilogSlurmctld script, you could check for drained nodes based on
some reason (like 'MIG reconfig') and do the head node steps there,
with a final bit of bringing it back online.
Or just do all those steps from a script outside slurm itself, on the
head node. You can use ssh/pdsh to connect to a node and execute
things there while it is out of the mix.
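If you go the EpilogSlurmctld route, a rough sketch (the reason string
and the reconfigure helper are placeholders for whatever you actually
use):

#!/bin/bash
# EpilogSlurmctld sketch: runs on the slurmctld host after each job.
# Find nodes drained with reason "MIG reconfig", hand them to a
# site-local helper (hypothetical name), then bring them back online.
sinfo -h -N -t drained -o "%N %E" | while read -r node reason; do
    if [ "$reason" = "MIG reconfig" ]; then
        /usr/local/sbin/mig-reconfig.sh "$node"  # hypothetical: ssh in, stop slurmd, redo MIG, start slurmd
        scontrol update nodename="$node" state=resume
    fi
done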
Brian Andrus
On 9/23/2022 7:09 AM, Groner, Rob wrote:
I'm working through how to use the new dynamic node features in order
to take down a particular node, reconfigure it (using NVIDIA MIG to
change the number of GPU instances available) and give it back to Slurm.
I'm at the point where I can take a node out of slurm's control from
the master node (scontrol delete nodename....), make the nvidia-smi
change, and then execute slurmd on the node with the changed
configuration parameters. It then does show up again in the sinfo
output on the master node, with the correct new resources.
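For reference, that sequence looks roughly like this (the node name,
MIG profile IDs, and Gres count are placeholders for your hardware):

# on the slurmctld host
scontrol delete nodename=gpu01

# on the node itself
nvidia-smi mig -dci && nvidia-smi mig -dgi  # tear down existing compute/GPU instances
nvidia-smi mig -cgi 9,9 -C                  # create new GPU instances (and compute instances)
slurmd -Z --conf="Gres=gpu:2"               # re-register as a dynamic node with the new GRES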
What I'm not sure about is...when I want to reconfigure the dynamic
node AGAIN, how do I do that on the target node? I can use "scontrol
delete" again on the scheduler node, but on the dynamic node, slurmd
will still be running. Currently, for testing purposes, I just find
the process ID and kill -9 it. Then I change the node configuration
and execute "slurmd -Z --conf=...." again.
Is there a more elegant way to change the configuration on the
dynamic node than by killing the existing slurmd process and starting
it again?
I'll note that I tried doing everything from the master (slurmctld)
node, since there is an option of creating the node there with
"scontrol create" instead of using slurmd on the dynamic node. But
when I tried that, the dynamic node I created showed up in sinfo
output with a ~ next to it (powered off). The dynamic node docs page
online did not say how, if at all, slurmd was supposed to be running on
the dynamic node when handling delete and create only from the master
node.
Thanks.
Rob