All,

I have a cloud-based cluster running Slurm 19.05.0-1.
I removed one of the partitions, but now every time I start slurmctld I get
errors like these:

slurmctld[63042]: error: Invalid partition (mpi-h44rs) for JobId=52545
slurmctld[63042]: error: _find_node_record(756): lookup failure for mpi-h44rs-01
slurmctld[63042]: error: node_name2bitmap: invalid node specified mpi-h44rs-01
.
.
slurmctld[63042]: error: _find_node_record(756): lookup failure for mpi-h44rs-05
slurmctld[63042]: error: node_name2bitmap: invalid node specified mpi-h44rs-05
slurmctld[63042]: error: Invalid nodes (mpi-h44rs-[01-05]) for JobId=52545
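
To see which jobs still carry references to the removed partition and nodes, the log lines above can be scanned mechanically. Below is a minimal sketch, assuming the message formats match the excerpt exactly; the helper names `stale_references` and `expand_hostlist` are hypothetical. `expand_hostlist` handles only a single bracket group such as `mpi-h44rs-[01-05]`. For real hostlists, `scontrol show hostnames` is the authoritative expander.

```python
import re
from collections import defaultdict

# Patterns taken from the slurmctld messages quoted above (assumed stable).
INVALID_PARTITION = re.compile(
    r"Invalid partition \((?P<part>[^)]+)\) for JobId=(?P<job>\d+)")
INVALID_NODES = re.compile(
    r"Invalid nodes \((?P<nodes>[^)]+)\) for JobId=(?P<job>\d+)")


def stale_references(lines):
    """Map JobId -> {'partition': ..., 'nodes': ...} seen in slurmctld errors."""
    refs = defaultdict(dict)
    for line in lines:
        m = INVALID_PARTITION.search(line)
        if m:
            refs[int(m["job"])]["partition"] = m["part"]
            continue
        m = INVALID_NODES.search(line)
        if m:
            refs[int(m["job"])]["nodes"] = m["nodes"]
    return dict(refs)


def expand_hostlist(expr):
    """Expand a simple 'prefix[01-05,07]' hostlist, preserving zero padding.

    Simplified assumption: at most one bracket group, no nesting.
    """
    m = re.fullmatch(r"(?P<prefix>[^\[]*)\[(?P<body>[^\]]+)\](?P<suffix>.*)", expr)
    if not m:
        return [expr]  # plain hostname, nothing to expand
    names = []
    for part in m["body"].split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            width = len(lo)  # keep leading zeros, e.g. "01"
            for i in range(int(lo), int(hi) + 1):
                names.append(f"{m['prefix']}{i:0{width}d}{m['suffix']}")
        else:
            names.append(f"{m['prefix']}{part}{m['suffix']}")
    return names
```

For example, feeding it the two "Invalid ..." lines above yields JobId 52545 with partition `mpi-h44rs` and nodes `mpi-h44rs-01` through `mpi-h44rs-05`, which is the job whose saved state still points at the deleted partition.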

I suspect these stale entries live in the saved state directory, and that
shutting down the entire cluster and deleting those files would clear them
up, but I'd prefer not to take the cluster down...

Is there a way to clean up "phantom" nodes and partitions that were deleted?

Brian Andrus
