Thanks so much! Indeed, it was a mismatch between the node's actual
SocketsPerBoard value and the one in slurm.conf.
Sushil
On Tue, Oct 11, 2022 at 11:25 AM Paul H. Hargrove
wrote:
I think Rob is "on the right track" here. Specifically, I don't think the
error message means that "RESUME" is unrecognized as the name of a state.
Rather the message means that a state transition from "INVAL" to "RESUME"
is invalid. I can reproduce that message by trying to "RESUME" an "IDLE"
no
Have you checked the logs for slurmd and slurmctld? I seem to recall that the
"invalid" state for a node meant that there was some discrepancy between what
the node says or thinks it has (slurmd -C) and what the slurm.conf says it has.
While there is that discrepancy and the node is invalid, y
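
For anyone finding this thread later: the comparison described above (what
"slurmd -C" reports versus the NodeName line in slurm.conf) can be sketched
in a few lines of Python. This is only a rough illustration, not how Slurm
itself validates a node. The function names, the list of keys, and the
example node lines are all made up for the example, and it assumes both
lines use the same key spelling and plain Key=Value tokens.

```python
# Rough sketch: compare the hardware line printed by "slurmd -C" with the
# NodeName line from slurm.conf. All names and sample values here are
# hypothetical; Slurm's own validation logic is not reproduced.

# Topology keys to compare (an assumed, illustrative selection).
TOPOLOGY_KEYS = ("CPUs", "Boards", "SocketsPerBoard",
                 "CoresPerSocket", "ThreadsPerCore")


def parse_node_line(line):
    """Split a 'Key=Value Key=Value ...' line into a dict of strings."""
    fields = {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields


def diff_node_config(detected_line, configured_line, keys=TOPOLOGY_KEYS):
    """Return {key: (detected, configured)} for keys whose values differ.

    Keys missing from either line are skipped rather than reported.
    """
    detected = parse_node_line(detected_line)
    configured = parse_node_line(configured_line)
    mismatches = {}
    for key in keys:
        if key in detected and key in configured:
            if detected[key] != configured[key]:
                mismatches[key] = (detected[key], configured[key])
    return mismatches


if __name__ == "__main__":
    # Hypothetical values, pasted by hand from each source.
    detected = ("NodeName=node01 CPUs=16 Boards=1 SocketsPerBoard=1 "
                "CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64000")
    configured = ("NodeName=node01 CPUs=16 Boards=1 SocketsPerBoard=2 "
                  "CoresPerSocket=4 ThreadsPerCore=2 RealMemory=64000")
    for key, (seen, conf) in diff_node_config(detected, configured).items():
        print(f"{key}: slurmd -C reports {seen}, slurm.conf has {conf}")
```

RealMemory is left out of the comparison on purpose, since the configured
value is commonly set below what the node detects. Any key the sketch flags
is a candidate for the kind of SocketsPerBoard mismatch reported at the top
of this thread.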