On 20/6/19 3:24 am, Brian Andrus wrote:
Can you give the exact command/output you have from this?
I suspect a typo in your slurm.conf for nodenames or what you are typing.
Brian Andrus
Hi Brian,
I am pretty sure there is no error in my typing of the commands, but
just in case find below t
On 6/18/19 11:29 PM, nathan norton wrote:
Without knowing the internals of Slurm, it feels like nodes that are
turned off and in the cloud state don't exist in the system until they are on?
Not quite, they exist internally but are not exposed until in use:
https://slurm.schedmd.com/elastic_computing.html
Can you give the exact command/output you have from this?
I suspect a typo in your slurm.conf for nodenames or what you are typing.
Brian Andrus
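For anyone scripting around this, here is a minimal Python sketch of how one might tell a "not found" reply apart from a normal node record. It assumes the message quoted in this thread comes from `scontrol show node <name>` (the exact command is not shown here) and that node records contain a `State=` field. The helper names `classify_node_output` and `query_node` are made up for illustration.

```python
import subprocess


def classify_node_output(text):
    """Return the State= value from node-query output, 'not_found', or 'unknown'."""
    # Node records are treated as whitespace-separated Key=Value tokens.
    for token in text.split():
        if token.startswith("State="):
            return token[len("State="):]
    # Assumption: a missing node produces a line containing "not found",
    # as in the message quoted in this thread.
    if "not found" in text.lower():
        return "not_found"
    return "unknown"


def query_node(name):
    """Run `scontrol show node NAME` and classify what it printed."""
    # Assumes scontrol is on PATH; stdout and stderr are both inspected
    # because the sketch does not rely on where the message is written.
    result = subprocess.run(
        ["scontrol", "show", "node", name],
        capture_output=True, text=True, check=False,
    )
    return classify_node_output(result.stdout + result.stderr)
```

Calling `query_node` with a powered-down cloud node name would then return "not_found" if the controller is hiding that node, rather than a state string.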
On 6/18/2019 11:29 PM, nathan norton wrote:
Hi,
It just shows
"Node $NODE not found"
whereas the others all work as expected (i.e., they are running).
Without knowing the internals of Slurm, it feels like nodes that are turned
off and in the cloud state don't exist in the system until they are on?
Any other ideas?
Thanks
Nathan
On Wed., 19 Jun. 2019, 4:
On Tuesday, 18 June 2019 9:36:56 PM PDT nathan norton wrote:
> Just tried running that command, but it only shows nodes that are up and
> running, doesn’t tell me about any nodes that are down and turned off, as
> an example please see below. There is a job running that should be using
> the 100 n
lure look in the slurmd or slurmctld logs.
>
> ---
> Sam Gallop
>
> -----Original Message-----
> From: slurm-users On Behalf Of
> nathan norton
> Sent: 18 June 2019 09:33
> To: slurm-users@lists.schedmd.com
> Subject: [slurm-users] status of cloud nodes
>
> Hi all,
>
ogs.
---
Sam Gallop
-----Original Message-----
From: slurm-users On Behalf Of nathan
norton
Sent: 18 June 2019 09:33
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] status of cloud nodes
Hi all,
I am using Slurm with a cloud provider, and it is all working a treat.
Let's say I have 100 nodes, all working fine and able to be scheduled;
everything works fine.
$ srun -N100 hostname
works fine.
For some unknown reason after machines shut down for example over the
weekend if no jobs g