Hi Fabio,

My guess is that you can (partly) solve this by setting the appropriate node State in slurm.conf. Either CLOUD or FUTURE might be what you're looking for. See `man slurm.conf`.
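As a rough sketch of what that could look like on the cluster A side, assuming the node names from your mail and the 5/5 split you describe (the State values are real slurm.conf options, but the split and the omitted node parameters such as CPUs and memory are only illustrative):

```
# slurm.conf on cluster A (illustrative sketch, not a tested configuration)
# Nodes that are normally present in this cluster
NodeName=clusterA-[001-005] State=UNKNOWN
# Nodes that may be moved over from cluster B later; they need not exist yet
NodeName=clusterA-[006-010] State=FUTURE
```

According to the man page, nodes in state FUTURE are not shown by Slurm commands and no attempt is made to contact them until they are made available, which can be done by updating the node state with scontrol instead of restarting slurmctld. Treat this as a starting point rather than a verified setup.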
Kind regards,

Martijn Kruiten

On Fri, 2019-05-17 at 09:17 +0000, Verzelloni Fabio wrote:
> Hello,
> I have a question related to the cloud feature or a feature that can
> solve an issue that I have with my cluster,to make it simple let say
> that I have a set of nodes ( let say 10 nodes ), if needed I move
> node/s from cluster A to cluster B and in my slurm.conf I define all
> the possible number of available nodes:
>
> Cluster A
> NodeName=clusterA-[001-010]
>
> Cluster B
> NodeName=clusterB-[001-010]
>
> In normal operation I have 5 nodes in 'cluster A' and 5 in 'cluster
> B', but in case of needs I reboot a node of 'cluster B' in 'cluster
> A', and the result will be 4 nodes in 'cluster B' and 6 in 'cluster
> A'.
> The "issue" is that since I specified all possible nodes in
> slurm.conf, when I ran sinfo what I see is:
>
> Cluster A
> Normal up 1-00:00:00 5 up clusterA-[01-05]
> Normal up 1-00:00:00 5 down* clusterA-[06-10]
>
> Cluster B
> Normal up 1-00:00:00 5 up clusterB-[06-10]
> Normal up 1-00:00:00 5 down* clusterB-[01-5]
>
> And in both slurmctld.log I have the message:
>
> error: Unable to resolve "clusterA-006": Unknown host
>
> or
>
> error: Unable to resolve "clusterB-001": Unknown host
>
> Since I have a lot of partitions and a lot of nodes, the sinfo it is
> much more complicated to read due to the DOWN nodes that are actually
> not present in the system, is there a way/feature/option that wont
> display in the sinfo nodes that are actually NOT present and
> reachable by the slurmctld due to the "error: Unable to resolve
> "clusterA-006": Unknown host " ?
>
> Basically I'd like to have in both slurm.conf all the possible nodes
> but the sinfo should shows:
>
> Cluster A
> Normal up 1-00:00:00 5 up clusterA-[01-05]
>
> Cluster B
> Normal up 1-00:00:00 5 up clusterB-[06-10]
>
> And If I move a node once the node is actually reachable:
>
> Cluster A
> Normal up 1-00:00:00 6 up clusterA-[01-06]
>
> Cluster B
> Normal up 1-00:00:00 4 up clusterB-[07-10]
>
> Thanks
> Fabio
>
> --
> - Fabio Verzelloni - CSCS - Swiss National Supercomputing Centre
> via Trevano 131 - 6900 Lugano, Switzerland
> Tel: +41 (0)91 610 82 04

--
| System Programmer | SURFsara | Science Park 140 | 1098 XG Amsterdam |
| T +31 6 20043417 | martijn.krui...@surfsara.nl | www.surfsara.nl |