In general I would follow this:
https://slurm.schedmd.com/quickstart_admin.html#upgrade
Namely:
Almost every new major release of Slurm (e.g. 19.05.x to 20.02.x)
involves changes to the state files with new data structures, new
options, etc. Slurm permits upgrades to a new major release from the
past two major releases, which happen every nine months (e.g. 18.08.x or
19.05.x to 20.02.x) without loss of jobs or other state information.
State information from older versions will not be recognized and will be
discarded, resulting in loss of all running and pending jobs. State
files are *not* recognized when downgrading (e.g. from 19.05.x to
18.08.x) and will be discarded, resulting in loss of all running and
pending jobs. For this reason, creating backup copies of state files (as
described below) can be of value. Therefore when upgrading Slurm (more
precisely, the slurmctld daemon), saving the /StateSaveLocation/ (as
defined in /slurm.conf/) directory contents with all state information
is recommended. If you need to downgrade, restoring that directory's
contents will let you recover the jobs. Jobs submitted under the new
version will not be in those state files, but it can let you recover
most jobs. An exception to this is that jobs may be lost when installing
new pre-release versions (e.g. 20.02.0-pre1 to 20.02.0-pre2). Developers
will try to note these cases in the NEWS file. Contents of major
releases are also described in the RELEASE_NOTES file.
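In practice that backup is just a copy of the StateSaveLocation directory
before you touch slurmctld. Roughly (the path below is only an example;
check StateSaveLocation in your own slurm.conf):

    # confirm where slurmctld keeps its state
    scontrol show config | grep -i StateSaveLocation
    # with the controller stopped, copy the whole directory somewhere safe
    systemctl stop slurmctld
    cp -a /var/spool/slurmctld /var/spool/slurmctld.17.11.bak

Restoring that copy is what lets you roll back if you have to.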
So I wouldn't go directly to 20.x; instead I would go from 17.x to 19.x
(still within the two-major-release window) and then to 20.x.
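At each hop the order is the usual one from the upgrade guide: accounting
database first, then the controller, then the compute nodes. A rough sketch,
assuming slurmdbd with MySQL/MariaDB and the default slurm_acct_db database
(credentials and package steps will differ at your site):

    # 1. dump the accounting DB before the new slurmdbd ever touches it
    mysqldump -u root -p slurm_acct_db > slurm_acct_db.pre-upgrade.sql
    # 2. stop slurmdbd, upgrade it, then start it and let it convert the
    #    tables (this can take a while on a large database)
    systemctl stop slurmdbd
    # (install the new slurmdbd packages here)
    systemctl start slurmdbd
    # 3. then slurmctld (after saving StateSaveLocation as above),
    #    then slurmd on the compute nodes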
-Paul Edmon-
On 11/2/2020 8:55 AM, Fulcomer, Samuel wrote:
We're doing something similar. We're continuing to run production on
17.x and have set up a new server/cluster running 20.x for testing and
MPI app rebuilds.
Our plan had been to add recently purchased nodes to the new cluster,
and at some point turn off submission on the old cluster and switch
everyone to submission on the new cluster (new login/submission
hosts). That way previously submitted MPI apps would continue to run
properly. As the old cluster partitions started to clear out we'd mark
ranges of nodes to drain and move them to the new cluster.
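Marking a block to drain is just the usual scontrol incantation, e.g.
(node range and reason are made up):

    scontrol update NodeName=node[001-032] State=DRAIN Reason="moving to new cluster"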
We've since decided to wait until January, when we've scheduled some
downtime. The process will remain the same wrt moving nodes from the
old cluster to the new, _except_ that everything will be drained, so
we can move big blocks of nodes and avoid slurm.conf Partition line
ugliness.
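By "ugliness" I mean partition lines accumulating ragged node ranges as
nodes move out piecemeal. With everything drained we can move whole blocks,
something like this (hostnames made up):

    # piecemeal moves leave the old partition looking like this:
    PartitionName=batch Nodes=node[001-017],node[019-042],node[044-063] State=UP
    # moving big blocks keeps it readable:
    PartitionName=batch Nodes=node[064-128] State=UP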
We're starting with a fresh database to get rid of the bug-induced
corruption that prevents GPUs from being fenced with cgroups.
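(By "fenced with cgroups" I mean the usual device constraint in
cgroup.conf, e.g.:

    ConstrainDevices=yes

together with the matching GPU entries in gres.conf; the corrupted
records were keeping that from working for us.)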
regards,
s
On Mon, Nov 2, 2020 at 8:28 AM navin srivastava
<navin.alt...@gmail.com> wrote:
Dear All,
Currently we are running Slurm version 17.11.x and want to move
to 20.x.
We are building the new server with Slurm 20.02 and planning to
upgrade the client nodes from 17.x to 20.x.
We wanted to check whether we can upgrade the clients from 17.x to
20.x directly, or whether we need to go through 18.x and 19.x first.
Regards
Navin.