In my experience, Slurm may report "ReqNodeNotAvail" whenever any nodes on the 
system are unavailable, even if that has nothing to do with why the job isn’t running.

I assume you’ve checked for reservations?
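
If it helps, here is a rough sketch of how one might cross-check the nodes named in a "ReqNodeNotAvail" reason against the nodes of the partition the job was submitted to. It assumes the reason string looks like "(ReqNodeNotAvail, UnavailableNodes:node[01-03])", which may differ between Slurm versions, and the node names and the helper functions expand_hostlist and unavailable_in_partition are made up for illustration. The expander only handles simple "prefix[ranges]" expressions; on a real cluster, "scontrol show hostnames" is the more reliable way to expand a node list.

```python
import re


def expand_hostlist(expr):
    """Expand a simple host expression such as 'node[01-03,07],gpu1'.

    Hypothetical helper: handles only 'prefix[a-b,c]' and plain names.
    """
    hosts = []
    # Pick out each comma-separated item, keeping bracketed ranges together.
    for part in re.findall(r'[^,\[]+(?:\[[^\]]*\])?', expr):
        m = re.fullmatch(r'([^\[]+)\[([^\]]*)\]', part)
        if not m:
            hosts.append(part)
            continue
        prefix, ranges = m.groups()
        for item in ranges.split(','):
            lo, _, hi = item.partition('-')
            hi = hi or lo          # a single number is a range of one
            width = len(lo)        # preserve zero padding, e.g. '01'
            for n in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{n:0{width}d}")
    return hosts


def unavailable_in_partition(reason, partition_nodes):
    """Return the nodes named in the reason that belong to the partition."""
    # Assumed format: '(ReqNodeNotAvail, UnavailableNodes:<hostlist>)'
    _, _, expr = reason.partition("UnavailableNodes:")
    listed = set(expand_hostlist(expr.strip(" )")))
    return sorted(listed & set(partition_nodes))


if __name__ == "__main__":
    # Illustrative values only, not taken from a real cluster.
    reason = "(ReqNodeNotAvail, UnavailableNodes:node[01-03])"
    partition = ["node05", "node06"]
    print(unavailable_in_partition(reason, partition))
```

An empty result means none of the listed nodes are in the partition, in which case the reported reason is probably not what is actually holding the job.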

> On May 7, 2018, at 5:06 PM, Prentice Bisbal <pbis...@pppl.gov> wrote:
> 
> Dear Slurm Users,
> 
> On my cluster, I have several partitions, each with its own QOS, time 
> limits, etc.
> 
> Several times today, I've received complaints from users that they submitted 
> jobs to a partition with available nodes, but jobs are stuck in the PD state. 
> I have spent the majority of my day investigating this, but haven't turned up 
> anything meaningful. Both jobs show the "ReqNodeNotAvail" reason, but none of 
> the nodes listed as not available are even in the partition these jobs are 
> submitted to. Neither job has requested a specific node, either.
> 
> I have checked slurmctld.log on the server, and have not been able to find 
> any clues. Anywhere else I should look? Any ideas what could be causing this?

--
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'
