Hi Fabio,

My guess is that you can (partly) solve this by setting the appropriate
node State in slurm.conf. Either CLOUD or FUTURE might be what you're
looking for. See `man slurm.conf`.
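
As a rough, untested sketch of what I mean, assuming the 5/5 split from
your message and leaving out all other node parameters (CPUs, memory,
etc.), the node definitions for cluster A could look something like this:

```
# slurm.conf on cluster A (illustrative only; adapt to your real node lines)
# Nodes that are normally present in this cluster:
NodeName=clusterA-[001-005] State=UNKNOWN
# Nodes that may be moved in later; FUTURE nodes need not exist
# when the Slurm daemons are started:
NodeName=clusterA-[006-010] State=FUTURE
```

According to the man page, nodes in state FUTURE are not shown by the
Slurm commands and no attempt is made to contact them until they are made
available, which can be done with `scontrol update` rather than by
restarting slurmctld. I have not checked whether this also silences the
"Unable to resolve" messages in your setup. Cluster B would get the
mirror image of this.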

Kind regards,

Martijn Kruiten

On Fri, 2019-05-17 at 09:17 +0000, Verzelloni  Fabio wrote:
> Hello,
> I have a question about the cloud feature, or any other feature that
> could solve an issue I have with my clusters. To keep it simple, let's
> say I have a set of 10 nodes. When needed, I move one or more nodes
> from cluster A to cluster B, so in each slurm.conf I define all the
> nodes that could possibly be available:
> 
> Cluster A
> NodeName=clusterA-[001-010]
> 
> Cluster B
> NodeName=clusterB-[001-010]
> 
> In normal operation I have 5 nodes in 'cluster A' and 5 in 'cluster
> B', but when needed I reboot a node of 'cluster B' into 'cluster
> A', and the result is 4 nodes in 'cluster B' and 6 in 'cluster
> A'.
> The "issue" is that since I specified all possible nodes in
> slurm.conf, when I run sinfo what I see is:
> 
> Cluster A
> Normal up 1-00:00:00 5 up clusterA-[01-05]
> Normal up 1-00:00:00 5 down* clusterA-[06-10]
>  
> Cluster B
> Normal up 1-00:00:00 5 up clusterB-[06-10]
> Normal up 1-00:00:00 5 down* clusterB-[01-5]
> 
> And in both slurmctld.log files I have messages like:
> 
> error: Unable to resolve "clusterA-006": Unknown host
> 
> or 
> 
> error: Unable to resolve "clusterB-001": Unknown host
> 
> Since I have a lot of partitions and a lot of nodes, the sinfo output
> is much harder to read because of the DOWN nodes that are not actually
> present in the system. Is there a way/feature/option to keep sinfo
> from displaying nodes that are NOT present and not reachable by
> slurmctld (the ones producing 'error: Unable to resolve
> "clusterA-006": Unknown host')?
> 
> Basically I'd like to keep all the possible nodes in both slurm.conf
> files, but have sinfo show:
> 
> Cluster A
> Normal up 1-00:00:00 5 up clusterA-[01-05]
> 
> Cluster B
> Normal up 1-00:00:00 5 up clusterB-[06-10]
> 
> And if I move a node, once the node is actually reachable:
> 
> Cluster A
> Normal up 1-00:00:00 6 up clusterA-[01-06]
> 
> Cluster B
> Normal up 1-00:00:00 4 up clusterB-[07-10]
> 
> Thanks
> Fabio
> 
> --
> - Fabio Verzelloni - CSCS - Swiss National Supercomputing Centre
> via Trevano 131 - 6900 Lugano, Switzerland
> Tel: +41 (0)91 610 82 04
>  
> 
-- 
| System Programmer | SURFsara | Science Park 140 | 1098 XG Amsterdam |
| T +31 6 20043417  | martijn.krui...@surfsara.nl | www.surfsara.nl |
