Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-04-03 Thread Dr. Thomas Orgis
On Wed, 29 Mar 2023 15:51:51 +0200, Ole Holm Nielsen wrote:
> As for job scheduling, slurmctld may allocate a job to some powered-off nodes and then calls the ResumeProgram defined in slurm.conf. From this point it may indeed take 2-3 minutes before a node is up and running slurmd, dur

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-29 Thread Ole Holm Nielsen
Hi Thomas, I think the Slurm power_save is not problematic for us with bare-metal on-premise nodes, in contrast to the problems you're having. We use power_save with on-premise nodes where we control the power down/up by means of IPMI commands as provided in the scripts which you will find i

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-29 Thread Dr. Thomas Orgis
On Wed, 29 Mar 2023 14:42:33 +0200, Ben Polman wrote:
> I'd be interested in your kludge; we face a similar situation where the slurmctld node does not have access to the IPMI network and cannot ssh to machines that have access. We are thinking of creating a REST interface to a contro

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-29 Thread Ben Polman
I'd be interested in your kludge; we face a similar situation where the slurmctld node does not have access to the IPMI network and cannot ssh to machines that have access. We are thinking of creating a REST interface to a control server which would be running the IPMI commands.
Ben
On 29-

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-29 Thread Dr. Thomas Orgis
On Mon, 27 Mar 2023 13:17:01 +0200, Ole Holm Nielsen wrote:
> FYI: Slurm power_save works very well for us without the issues that you describe below. We run Slurm 22.05.8, what's your version?
I'm sure that there are setups where it works nicely ;-) For us, it didn't, and I was faced with h

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-27 Thread Ole Holm Nielsen
Hi Thomas, FYI: Slurm power_save works very well for us without the issues that you describe below. We run Slurm 22.05.8, what's your version? I've documented our setup in this Wiki page: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_cloud_bursting/#configuring-slurm-conf-for-power-saving T

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-27 Thread Dr. Thomas Orgis
On Mon, 06 Mar 2023 13:35:38 +0100, Stefan Staeglich wrote:
> But this did not fix the main error, though it might have reduced how often it occurs. Has anyone observed similar issues? We will try a higher SuspendTimeout.
We had issues with power saving. We powered the idle nodes off, caus

[slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-06 Thread Stefan Staeglich
Hi, we have been using the suspend/resume support in Slurm for half a year. This works quite well, but sometimes it breaks and no nodes are suspended or resumed anymore. In this case we see the following message in the log:
error: power_save module disabled, NULL SuspendProgram
A restart of slurmctld