I just upgraded from v18 to v20. Did something change in the node config validation? It used to be that if I started Slurm on a compute node that had less memory than expected or was missing GPUs, Slurm would push the node into a failed state that I could see in sinfo -R. Now it seems to log "slurm_rpc_node_registration invalid argument" to the slurmctld log file every second for each node that's broken.
Is there some function that got disabled or changed? I use Slurm to ferret out bad hardware, but writing to the log file every second seems silly, and since I don't routinely watch the log files, broken nodes will go unnoticed.
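
As a stopgap while the cause is unclear, the broken nodes could be tallied from the slurmctld log rather than read from sinfo -R. Below is a minimal Python sketch of that idea. It assumes each offending line contains "slurm_rpc_node_registration" followed by a "node=NAME:" field; that field layout is an assumption about the log format and may differ between versions, and the function name count_registration_errors is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "... _slurm_rpc_node_registration node=NAME: Invalid argument".
# The "node=NAME:" field is an assumption about the slurmctld log format.
REG_ERROR = re.compile(
    r"slurm_rpc_node_registration\s+node=(?P<node>[^\s:]+)",
    re.IGNORECASE,
)


def count_registration_errors(lines):
    """Return a Counter mapping node name -> number of registration-error lines."""
    counts = Counter()
    for line in lines:
        match = REG_ERROR.search(line)
        if match:
            counts[match.group("node")] += 1
    return counts
```

To use it, open the slurmctld log file, pass the file object to count_registration_errors, and print the result of .most_common() to list the noisiest nodes first. This only works around the missing sinfo -R entries; it does not explain why the nodes are no longer being marked as failed.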