Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-04-03 Thread Dr. Thomas Orgis
On Wed, 29 Mar 2023 15:51:51 +0200, Ole Holm Nielsen wrote:
> As for job scheduling, slurmctld may allocate a job to some powered-off nodes and then calls the ResumeProgram defined in slurm.conf. From this point it may indeed take 2-3 minutes before a node is up and running slurmd, dur

[slurm-users] Job killed for unknown reason

2023-04-03 Thread Robert Barton
Hello, I'm looking for help in understanding a problem we're having in which Slurm indicates that a job was killed, but not why. It's not clear what's actually killing the jobs; we've seen jobs killed for time limits and out-of-memory issues, and those reasons are obvious in the logs when th