Hi Kevin,

I have no experience with version 20 of Slurm, but you probably have some misconfiguration.
Did you change any settings in your slurm.conf file after the upgrade?
Dive into the documentation and verify whether any of the directives within slurm.conf have changed between versions.
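If it does turn out to be a per-node setting, a quick sketch for inspecting the drained node and the running config might look like the following (assuming the standard Slurm CLI tools; the node name is taken from Kevin's log):

```shell
# Show the full node record, including Reason= and the node
# configuration the controller currently believes is in effect
scontrol show node nid00245

# Dump the configuration the controller is actually running with,
# to compare against the slurm.conf on disk
scontrol show config

# Once any mismatch is fixed, clear the drain and return the node
# to service
scontrol update NodeName=nid00245 State=RESUME
```

These commands need a live slurmctld to talk to, so they are only a sketch of the checking procedure, not something to run outside the cluster.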
On 05/11/2020 02:00, Kevin Buckley wrote:

We have had a couple of nodes enter a DRAINED state where scontrol
gives the reason as

Reason=slurm.conf

In looking at the slurmctld log we see pairs of lines as follows

 update_node: node nid00245 reason set to: slurm.conf
 update_node: node nid00245 state set to DRAINED

A search of the interweb thing for "Reason=slurm.conf" and
"reason set to: slurm.conf" has so far proved too much for
my search-fu.

The slurm.conf files do match across the Cray nodes, so it
seems unlikely that it would be a "something out of sync"
thing.

Other "update_node" actions seen in the logs, e.g.,

"reason set to: NHC-Admindown"

also put the node into a DRAINED state but then, that's to
be expected.

We upgraded to 20.02.5 a couple of days ago, and all of the
occurrences of the "reason set to: slurm.conf" message
appear only after the update.


Has anyone else seen anything like this, especially any of you
who have just gone to 20.02.5?

Yours, diving into the source soon,
Kevin
--

*Cumprimentos / Best Regards,*

Zacarias Benta
INCD @ LIP - Universidade do Minho
