I have no experience with this, but based on my understanding of the doc, the 
shutdown command should be something like "ssh ${node} systemctl poweroff", and 
the resume command "ipmitool -I lan -H ${node}-bmc -U <User ID> -f 
password_file.txt chassis power on".
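
To make that concrete, here is a rough, untested sketch of what the two 
programs could look like. The "<node>-bmc" naming, the "admin" IPMI user and 
the password file path are placeholders I made up, and the power-off step uses 
"systemctl poweroff". Slurm passes the affected nodes as a hostlist expression 
(e.g. "node[01-04]"), which "scontrol show hostnames" expands. DRY_RUN=1 only 
prints the commands, so you can check them before touching real nodes.

```shell
#!/bin/bash
# Sketch of SuspendProgram / ResumeProgram logic for Slurm power saving.
# Assumptions (placeholders, not from the Slurm docs): BMCs are reachable as
# "<node>-bmc", the IPMI user is "admin", the password is in IPMI_PASS_FILE.

DRY_RUN="${DRY_RUN:-1}"                                   # 1 = print only
IPMI_USER="${IPMI_USER:-admin}"                           # hypothetical user
IPMI_PASS_FILE="${IPMI_PASS_FILE:-/etc/slurm/ipmi_pass}"  # hypothetical path

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Expand a Slurm hostlist expression into one node name per line.
expand_nodes() {
    if command -v scontrol >/dev/null 2>&1; then
        scontrol show hostnames "$1"
    else
        # Fallback for testing without Slurm: plain comma-separated list.
        echo "$1" | tr ',' '\n'
    fi
}

# Body of the SuspendProgram: power nodes off over ssh.
suspend_nodes() {
    for node in $(expand_nodes "$1"); do
        run ssh "$node" systemctl poweroff
    done
}

# Body of the ResumeProgram: power nodes on through their BMC.
resume_nodes() {
    for node in $(expand_nodes "$1"); do
        run ipmitool -I lan -H "${node}-bmc" -U "$IPMI_USER" \
            -f "$IPMI_PASS_FILE" chassis power on
    done
}
```

You would save this as two small scripts, one ending with 'suspend_nodes "$1"' 
and the other with 'resume_nodes "$1"', and point SuspendProgram and 
ResumeProgram at them. The user running them needs passwordless ssh to the 
nodes and read access to the password file.
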
If you use libvirt for your virtual cluster, you can test waking nodes up via 
IPMI using VirtualBMC, https://github.com/openstack/virtualbmc (its 
documentation is unfortunately a bit terse).
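
For the virtual setup, something along these lines might work. Again this is 
an untested sketch: the domain name, port 6230 and the admin/password 
credentials are placeholders, and it assumes the vbmcd daemon is already 
running on the hypervisor. As far as I can tell the VirtualBMC examples use 
"-I lanplus" rather than "-I lan".

```shell
#!/bin/bash
# Sketch: expose a libvirt domain through VirtualBMC and power it on via IPMI.
# Assumptions: port and credentials are placeholders; vbmcd is running.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: setup_vbmc <libvirt-domain> <port>
setup_vbmc() {
    run vbmc add "$1" --port "$2" --username admin --password password
    run vbmc start "$1"
}

# usage: power_on_via_vbmc <port>
power_on_via_vbmc() {
    run ipmitool -I lanplus -H 127.0.0.1 -p "$1" -U admin -P password \
        chassis power on
}
```

For example, 'setup_vbmc node1 6230' followed by 'power_on_via_vbmc 6230' 
should start the libvirt domain "node1", which is the same thing the 
ResumeProgram would do against a real BMC.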

Best,
Nicolas Granger

On Thursday, 28 July 2022 at 11:49 -0400, Djamil Lakhdar-Hamina wrote:
I am helping set up a 16-node cluster computing system. I am not a system 
administrator, but I work for a small firm and have to pick up the needed 
skills quickly in areas where I have little experience. I am running Rocky 
Linux 8 on Intel Xeon Phi Knights Landing nodes donated by the TAAC center. We 
are operating in Uganda, where we have limited resources and where power is 
quite expensive.

What are some good ways to implement power saving? I have already tried power 
saving as per Slurm's power saving guide, but 1) I am not quite sure what it 
does, and 2) when implementing a version in my virtual dev environment, I was 
able to get power saving to shut nodes down, but not to bring them back up 
when needed. I enabled power saving in the slurm.conf file and specified a 
SuspendProgram and a ResumeProgram similar to the ones in 
https://slurm.schedmd.com/power_save.html.

So 1) how do I get this power saving mechanism to work, and what exactly will 
it do? I see that it shuts nodes down; will it bring them back up when those 
resources are requested? 2) Are there any better techniques for power saving, 
say using ipmitool or something similar?

Sincerely,
Djamil Lakhdar-Hamina
