and it looks like i'll have to wait till 20.11 for a fix
https://bugs.schedmd.com/show_bug.cgi?id=9035
On Wed, Aug 26, 2020 at 11:20 AM Michael Di Domenico
wrote:
looks like a similar issue is being tracked by:
https://bugs.schedmd.com/show_bug.cgi?id=9441
On Wed, Aug 26, 2020 at 11:04 AM Michael Di Domenico
wrote:
sorry, i meant to say that our slurm nodehealth script pushed the node
into a failed state. slurm itself wasn't doing this.
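
for anyone following along, a nodehealth check of this kind might look roughly like the sketch below. this is not the actual script from this thread; the function names, thresholds and reason strings are hypothetical, and it assumes the expected memory and gpu counts are already known for the node. it only builds the scontrol command rather than running it.

```python
def find_mismatch(expected_mem_mb, actual_mem_mb, expected_gpus, actual_gpus):
    """Return a short reason string if the node is under-provisioned, else None.

    All four inputs are assumed to be gathered elsewhere (hypothetical helper).
    """
    problems = []
    if actual_mem_mb < expected_mem_mb:
        problems.append(f"low memory ({actual_mem_mb} < {expected_mem_mb} MB)")
    if actual_gpus < expected_gpus:
        problems.append(f"missing gpus ({actual_gpus} of {expected_gpus})")
    # An empty list joins to "", which is falsy, so healthy nodes yield None.
    return "; ".join(problems) or None


def fail_node_command(node, reason):
    """Build the scontrol argv that would mark `node` as FAIL with `reason`."""
    return ["scontrol", "update", f"NodeName={node}", "State=FAIL", f"Reason={reason}"]
```

the returned list could be handed to subprocess.run() on a host allowed to update node state, and the reason text should then be what shows up in sinfo -R.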
On Wed, Aug 26, 2020 at 11:02 AM Michael Di Domenico
wrote:
i just upgraded from v18 to v20. Did something change in the node
config validation? it used to be that if i started slurm on a compute
node that had less memory than expected or was missing GPUs, slurm
would push the node into a failed state that i could see in sinfo -R.
now it seems to be loggi