Have you checked the slurmd and slurmctld logs?  As I recall, the "invalid" 
state means there is a discrepancy between what the node reports it has 
(slurmd -C) and what slurm.conf says it has.  While that discrepancy exists 
and the node is invalid, you can't simply tell it to resume.
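
If it helps, here is a rough sketch of how the two could be compared field by 
field. The helper names (node_field, compare_node_lines) are made up for 
illustration, the list of fields is just the usual hardware ones that 
slurmd -C prints, and the slurm.conf path in the usage comment is an 
assumption, so adjust it for your install.

```shell
#!/bin/sh
# Sketch: compare the hardware line printed by `slurmd -C` with the
# NodeName line from slurm.conf. Helper names are hypothetical.

# Print the value of KEY from a "Key=Value Key=Value ..." line
# (prints nothing if the key is absent).
node_field() {
    key=$1
    line=$2
    printf '%s\n' "$line" | tr -s ' ' '\n' |
        awk -F= -v k="$key" '$1 == k { print $2; exit }'
}

# Print one line per field whose value differs between the two lines.
# Fields that slurm.conf does not set are skipped.
compare_node_lines() {
    detected=$1
    configured=$2
    for key in CPUs Boards SocketsPerBoard CoresPerSocket ThreadsPerCore RealMemory; do
        d=$(node_field "$key" "$detected")
        c=$(node_field "$key" "$configured")
        if [ -n "$c" ] && [ "$d" != "$c" ]; then
            echo "$key: slurmd -C reports '$d', slurm.conf has '$c'"
        fi
    done
}

# Example use on the node itself (the slurm.conf path is an assumption):
#   detected=$(slurmd -C | head -n 1)
#   configured=$(grep -i '^NodeName=fucose' /etc/slurm/slurm.conf)
#   compare_node_lines "$detected" "$configured"
```

A reported difference is not automatically the cause. As far as I remember, 
the node is only flagged when it reports less than slurm.conf claims (fewer 
CPUs, less memory), so treat the output as a pointer for what to look for in 
the logs.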

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Sushil 
Mishra <sushilbioi...@gmail.com>
Sent: Tuesday, October 11, 2022 10:08 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [slurm-users] slurm_update error: Invalid node state specified

Dear all,

I am stuck with scontrol not recognizing the state keywords. I wonder if 
someone can point me to the possible cause of the error.  I restarted slurmd a 
few times, and it didn't help.

[sushil@fucose ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
LocalQ*      up   infinite      1  inval fucose

[sushil@fucose ~]$ sinfo -R
REASON               USER      TIMESTAMP           NODELIST
cg                   sushil    2022-10-10T18:11:27 fucose

[sushil@fucose ~]$ sudo scontrol update NodeName=fucose state=RESUME
[sudo] password for sushil:
slurm_update error: Invalid node state specified

[sushil@fucose ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

Best,
Sushil
