[slurm-users] how to safely rename a slurm user's name
Hello,

I'd like to ask whether there is a safe way to rename an existing Slurm user to a new username with the same UID. On Linux itself, it is quite common to have two users share the same UID. So if we already have two system users, for example Bob (uid=1500) and HPCBob (uid=1500), and HPCBob is the existing Slurm user, how can we rename it to Bob without losing the user's job history records or associations? Do we have to edit the Slurm database manually?

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
[slurm-users] Re: Fwd: What is the safe upgrade path when upgrade from slurm21.08 and mariadb5.5?
Thanks for all your help. So it seems we can skip the trouble of compiling Slurm against different MariaDB versions.

Tianyang Zhang
SJTU Network Information Center

From: Sid Young
Sent: 30 October 2024 7:19
To: Andrej Sec
Cc: taleinterve...@sjtu.edu.cn; slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Re: Fwd: What is the safe upgrade path when upgrade from slurm21.08 and mariadb5.5?

I recently upgraded from 20.11 to 24.05.2, before moving the cluster from CentOS 7.9 to Oracle Linux 8.10.

The DB upgrade should be pretty simple: do a mysqldump first, then uninstall the old DB, change the repos and install the new DB version. It should recognise the DB files on disk and access them. Do another DB backup on the new DB version, then roll through the Slurm upgrades. I picked the first and last version of each release and systematically went through each node till it was done: first the Slurm controller node, then the compute nodes.

To avoid job loss, drain the nodes, or you end up in a situation where slurmd can't talk to the running slurmstepd and the job(s) get lost (this shows as a "Protocol Error").

Ole sent me a link to this guide, which mostly worked:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrade-slurmd-on-nodes

Sid Young
W: https://off-grid-engineering.com

On Tue, Oct 29, 2024 at 6:33 PM Andrej Sec via slurm-users <slurm-users@lists.schedmd.com> wrote:

Hi,

we are facing a similar task. We have a Slurm 22.05 / MariaDB 5.5.68 environment and want to upgrade to a newer version. According to the documentation, it is recommended to upgrade from 22.05 to at most 23.11 in one step. With the MariaDB upgrade, there is a challenge between 10.1 and 10.2+ due to incompatible changes (https://mariadb.com/kb/en/changes-improvements-in-mariadb-10-2). This upgrade, as I understand from the documentation, requires at least Slurm 22.05, where it is handled automatically by the slurmdbd service.

In the test lab, we performed the following tests:

a. Incremental upgrade, according to MariaDB recommendations:
   1. Upgrade MariaDB 5.5.68 -> 10.1.48 -> 10.2.44.
   2. Start the Slurm suite 22.05, checking the content after each MariaDB upgrade step. During the 10.1 -> 10.2 upgrade, the slurmdbd service automatically converted the database to the required format. We had enabled general.log in MariaDB, which allowed detailed inspection of the database changes during conversion.
   3. Upgrade slurmdbd to version 23.11.
   4. Upgrade slurmctld to version 23.11.
   5. Upgrade slurmd to version 23.11.
   6. Check the database content and compare tests before and after the upgrade (we used various reports with scontrol, sreport, sacct and sacctmgr for verification).

b. Direct MariaDB upgrade from 5.5.68 to 10.2.44 using the same approach. According to the tests, this resulted in the same state as the incremental approach.

PS: If you proceed with the upgrade, I would appreciate it if you could let us know about any challenges you encounter.

Andrej Sec
nscc, Bratislava, Slovakia

From: "hermes via slurm-users" <slurm-users@lists.schedmd.com>
To: slurm-users@lists.schedmd.com
Sent: Monday, 28 October 2024 8:48:19
Subject: [slurm-users] Fwd: What is the safe upgrade path when upgrade from slurm21.08 and mariadb5.5?

Hi everyone,

We are currently running production on Slurm 21.08 and MariaDB 5.5. For the upgrade, we need to keep all user and job history data. The official documentation says:

“When upgrading an existing accounting database to MariaDB 10.2.1 or later from an older version of MariaDB or any version of MySQL, ensure you are running slurmdbd 22.05.7 or later. These versions will gracefully handle changes to MariaDB default values that can cause problems for slurmdbd.”

Does this mean we first have to build Slurm 22.05 or later against MariaDB 5.5 and do the Slurm upgrade, then upgrade MariaDB to a newer version and rebuild the same Slurm version against the new mariadb-devel? Is it safe to jump directly from MariaDB 5.5 to the latest version? And how can we check whether Slurm has correctly inherited the historical data?

Thanks,
Tianyang Zhang
SJTU Network Information Center
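The stepwise Slurm upgrade discussed in this thread can be sketched as a small hop planner. This is only an illustration under stated assumptions: the release list is hard-coded, the `plan_hops` name is hypothetical, and the default limit of two major releases per hop is taken from Andrej's remark that 22.05 can go to at most 23.11 in one step. It does not model the MariaDB side at all, so the slurmdbd 22.05.7 requirement quoted above still has to be checked separately.

```python
from typing import List, Sequence

# Major Slurm releases in chronological order (hard-coded for illustration).
RELEASES = ("20.11", "21.08", "22.05", "23.02", "23.11", "24.05")


def plan_hops(start: str, target: str, max_jump: int = 2,
              releases: Sequence[str] = RELEASES) -> List[str]:
    """Return the releases to install, in order, to get from start to target.

    Each hop moves forward by at most `max_jump` major releases
    (an assumption based on the limit mentioned in the thread).
    """
    current = releases.index(start)
    goal = releases.index(target)
    if goal < current:
        raise ValueError("target release is older than the starting release")
    hops = []
    while current < goal:
        # Jump as far as allowed without overshooting the target.
        current = min(current + max_jump, goal)
        hops.append(releases[current])
    return hops


if __name__ == "__main__":
    print(plan_hops("21.08", "24.05"))
```

The planner is greedy, so it always takes the largest permitted jump. A site that wants to stop at a particular release on the way (for example 22.05, before touching MariaDB) would plan the path in two calls instead.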
[slurm-users] Fwd: What is the safe upgrade path when upgrade from slurm21.08 and mariadb5.5?
Hi everyone,

We are currently running production on Slurm 21.08 and MariaDB 5.5. For the upgrade, we need to keep all user and job history data. The official documentation says:

“When upgrading an existing accounting database to MariaDB 10.2.1 or later from an older version of MariaDB or any version of MySQL, ensure you are running slurmdbd 22.05.7 or later. These versions will gracefully handle changes to MariaDB default values that can cause problems for slurmdbd.”

Does this mean we first have to build Slurm 22.05 or later against MariaDB 5.5 and do the Slurm upgrade, then upgrade MariaDB to a newer version and rebuild the same Slurm version against the new mariadb-devel? Is it safe to jump directly from MariaDB 5.5 to the latest version? And how can we check whether Slurm has correctly inherited the historical data?

Thanks,
Tianyang Zhang
SJTU Network Information Center
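On the question of checking that historical data survived, one approach mentioned in this thread is to capture the same reports before and after the upgrade and compare them. A minimal sketch of the comparison step, assuming the report text has already been saved by whatever means the site prefers; the `report_diff` helper and the sample rows are hypothetical and not real Slurm output:

```python
import difflib
from typing import List


def report_diff(before: str, after: str) -> List[str]:
    """Return the lines that were removed ("- ") or added ("+ ")
    between two saved report outputs. An empty list means no change."""
    delta = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [line for line in delta if line.startswith(("- ", "+ "))]


# Made-up sample data standing in for a saved report.
BEFORE = "alice|physics|10\nbob|chem|5"
AFTER = "alice|physics|10\nbob|chem|6"

if __name__ == "__main__":
    for line in report_diff(BEFORE, AFTER):
        print(line)
```

Comparing plain text keeps the check independent of the database schema, which is the part that changes during the conversion.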
[slurm-users] how to set slurmdbd.conf if using two slurmdb node with HA database?
The deployment scenario is as follows:

       nodeA                  nodeB
    (slurmctld)        (backup slurmctld)
         |     \---/     |
         |     /   \     |
       nodeC                  nodeD
    (slurmdbd)         (backup slurmdbd)
     (mysql) <--multi master replica--> (mysql)

Since the database is multi-master replicated, each slurmdbd should only talk to the MySQL instance on its own node. In that case, how should we set slurmdbd.conf?

The conf file contains the options "DbdAddr", "DbdHost" and "DbdBackupHost". Should they be consistent between the two slurmdbd nodes (nodeC and nodeD)? Such as:

    nodeC                    | nodeD
    DbdAddr       = nodeC    | DbdAddr       = nodeC
    DbdHost       = nodeC    | DbdHost       = nodeC
    DbdBackupHost = nodeD    | DbdBackupHost = nodeD
    StorageHost   = nodeC    | StorageHost   = nodeD

Or should we just use a different conf on each node and not use "DbdBackupHost" at all, like:

    nodeC                    | nodeD
    DbdAddr     = nodeC      | DbdAddr     = nodeD
    DbdHost     = nodeC      | DbdHost     = nodeD
    StorageHost = nodeC      | StorageHost = nodeD

I'm quite confused about the usage of DbdAddr and DbdHost. What is the difference between them, and why does only DbdHost have a backup counterpart?

Another confusing point is how DbdBackupHost works. I guess it is slurmctld that is responsible for selecting the available slurmdbd. Since slurm.conf already contains "AccountingStorageHost" and "AccountingStorageBackupHost", why do we need to set the backup dbd again on the slurmdbd side?
[slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node with HA database?
Do you mean the second configuration scheme? I think configuring `dbdhost=localhost` is the same as configuring `DbdAddr = nodeC` and `DbdAddr = nodeD` on the two nodes respectively. The key point is whether we should set the DbdBackupHost option, and how it works.

From: Daniel Letai
Sent: 19 February 2025 18:21
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node with HA database?

I'm not sure it will work, I didn't test it, but could you just do `dbdhost=localhost` to solve this?

On 18/02/2025 11:59, hermes via slurm-users wrote:

The deployment scenario is as follows:

       nodeA                  nodeB
    (slurmctld)        (backup slurmctld)
         |     \---/     |
         |     /   \     |
       nodeC                  nodeD
    (slurmdbd)         (backup slurmdbd)
     (mysql) <--multi master replica--> (mysql)

Since the database is multi-master replicated, each slurmdbd should only talk to the MySQL instance on its own node. In that case, how should we set slurmdbd.conf?

The conf file contains the options “DbdAddr”, “DbdHost” and “DbdBackupHost”. Should they be consistent between the two slurmdbd nodes (nodeC and nodeD)? Such as:

    nodeC                    | nodeD
    DbdAddr       = nodeC    | DbdAddr       = nodeC
    DbdHost       = nodeC    | DbdHost       = nodeC
    DbdBackupHost = nodeD    | DbdBackupHost = nodeD
    StorageHost   = nodeC    | StorageHost   = nodeD

Or should we just use a different conf on each node and not use “DbdBackupHost” at all, like:

    nodeC                    | nodeD
    DbdAddr     = nodeC      | DbdAddr     = nodeD
    DbdHost     = nodeC      | DbdHost     = nodeD
    StorageHost = nodeC      | StorageHost = nodeD

I’m quite confused about the usage of DbdAddr and DbdHost. What is the difference between them, and why does only DbdHost have a backup counterpart?

Another confusing point is how DbdBackupHost works. I guess it is slurmctld that is responsible for selecting the available slurmdbd. Since slurm.conf already contains “AccountingStorageHost” and “AccountingStorageBackupHost”, why do we need to set the backup dbd again on the slurmdbd side?
[slurm-users] Re: Re: how to set slurmdbd.conf if using two slurmdb node with HA database?
Thank you for your insightful suggestions. Placing slurmdbd and slurmctld on the same node is a structure we hadn’t considered before, and it seems to give a much clearer deployment logic.

Regarding DbdBackupHost, I would like to confirm my understanding of how it works. Does it mean that the DbdBackupHost option is only consulted when the slurmdbd service detects that its local database (specified by StorageHost) is unavailable? And in that case, I guess the first slurmdbd service would act as a proxy that forwards requests to the DbdBackupHost and returns the data from there to slurmctld?

From: Daniel Letai
Sent: 20 February 2025 21:56
To: taleinterve...@sjtu.edu.cn
Cc: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node with HA database?

It's functionally the same, with one difference: the configuration file is identical on both nodes, which allows simple deployment of nodes and automation.

Regarding the backuphost, that depends on your setup. If you can ensure the slurmdbd service will stop when the local DB replica is not healthy, you shouldn't need backuphost. Conversely, if there is no health check to ensure replica readiness, configure the backuphost. This will require a different conf file for each node, unless you set up a more robust HA clustering scheme.

The other option is to separate the dbd from the DB. Put the dbd on the ctld nodes (A, B) and let nodes C and D be DB master replicas only (no dbd).

In slurm.conf on nodes A and B you will then have:

    AccountingStorageHost = localhost
    (without AccountingStorageBackupHost)

And in slurmdbd.conf you will have:

    DbdHost = localhost
    (without DbdBackupHost)
    StorageHost = nodeC
    StorageBackupHost = nodeD

This would mean identical slurm.conf and slurmdbd.conf on both nodes A and B, and no Slurm conf files or processes on nodes C and D.

This setup assumes that the entire stack (ctld+dbd) is either working or not, which is usually true, as either the node is functioning or it is not. If the ctld is working but the dbd is not, you will lose the connection to the DB. If the ctld is not working, the other ctld will take charge and use its local dbd, so that scenario is covered.

Adding AccountingStorageBackupHost pointing to the other node is of course possible, but it will mean different slurm.conf files, which Slurm will complain about.

It will mean that most of the time you will not load balance across the multi-master DB replicas. Whether that is a consideration or not is for you to decide.

On 20/02/2025 3:57, taleinterve...@sjtu.edu.cn wrote:

Do you mean the second configuration scheme? I think configuring `dbdhost=localhost` is the same as configuring `DbdAddr = nodeC` and `DbdAddr = nodeD` on the two nodes respectively. The key point is whether we should set the DbdBackupHost option, and how it works.
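Whether two nodes really carry identical configuration files, as the shared-layout suggestion in this message requires, can be checked mechanically. The sketch below is a simplified illustration, not a full Slurm config parser: it assumes one `Key = Value` pair per line, treats `#` as a comment marker, and lower-cases keys on the assumption that parameter names are case-insensitive. The helper names and the sample texts are hypothetical; the sample values are taken from the configurations discussed in the thread.

```python
from typing import Dict, Set


def parse_conf(text: str) -> Dict[str, str]:
    """Parse simple 'Key = Value' lines; anything after '#' is ignored."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # blank line, comment, or something this sketch ignores
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def differing_keys(conf_a: str, conf_b: str) -> Set[str]:
    """Return the (lower-cased) keys whose values differ between two files."""
    a, b = parse_conf(conf_a), parse_conf(conf_b)
    return {key for key in a.keys() | b.keys() if a.get(key) != b.get(key)}


# Per-node scheme from the original question (no DbdBackupHost).
NODE_C = "DbdAddr = nodeC\nDbdHost = nodeC\nStorageHost = nodeC\n"
NODE_D = "DbdAddr = nodeD\nDbdHost = nodeD\nStorageHost = nodeD\n"

# Shared scheme suggested in the reply: the same file on nodes A and B.
SHARED = "DbdHost = localhost\nStorageHost = nodeC\nStorageBackupHost = nodeD\n"

if __name__ == "__main__":
    print(sorted(differing_keys(NODE_C, NODE_D)))
    print(sorted(differing_keys(SHARED, SHARED)))
```

With the per-node scheme every listed key differs between the two hosts, while the shared scheme yields an empty set, which is what makes it easy to deploy from a single template.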
[slurm-users] Re: Re: Re: how to set slurmdbd.conf if using two slurmdb node with HA database?
But there is even a third pair of backup options if we count slurm.conf: AccountingStorageHost and AccountingStorageBackupHost. I think this is what slurmctld refers to when it finds the primary slurmdbd unavailable. Does slurmctld also read slurmdbd.conf? That seems to be the dedicated configuration file for slurmdbd, so I think DbdBackupHost should not influence slurmctld’s behaviour. Otherwise AccountingStorageBackupHost and DbdBackupHost would be complete duplicates.

Tianyang Zhang
Network Information Center, Computing Services Department

From: Daniel Letai
Sent: 21 February 2025 14:04
To: taleinterve...@sjtu.edu.cn
Cc: slurm-users@lists.schedmd.com
Subject: Re: Re: [slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node with HA database?

There are two backuphost configurations.

DbdBackupHost is used if the slurmdbd service is unavailable (timeout). In that case slurmctld will try to connect to the slurmdbd on another node.

StorageBackupHost, on the other hand, is what you describe: a timeout when connecting to a DB replica will make slurmdbd switch to the other replica.

On 21/02/2025 4:45, taleinterve...@sjtu.edu.cn wrote:

Thank you for your insightful suggestions. Placing slurmdbd and slurmctld on the same node is a structure we hadn’t considered before, and it seems to give a much clearer deployment logic.

Regarding DbdBackupHost, I would like to confirm my understanding of how it works. Does it mean that the DbdBackupHost option is only consulted when the slurmdbd service detects that its local database (specified by StorageHost) is unavailable? And in that case, I guess the first slurmdbd service would act as a proxy that forwards requests to the DbdBackupHost and returns the data from there to slurmctld?
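The two failover layers discussed in this exchange can be pictured with a toy model. This is not how Slurm is implemented; it only mirrors the behaviour as described in the thread: slurmctld picks a slurmdbd from the slurm.conf pair (AccountingStorageHost, AccountingStorageBackupHost), and slurmdbd picks a database from the slurmdbd.conf pair (StorageHost, StorageBackupHost). DbdBackupHost is deliberately left out, because the thread does not settle which daemon reads it. All function names and the reachability callbacks are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple


def first_reachable(hosts: Sequence[Optional[str]],
                    is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first configured host that responds, or None."""
    for host in hosts:
        if host and is_up(host):
            return host
    return None


def resolve_path(slurm_conf: Dict[str, str], dbd_conf: Dict[str, str],
                 dbd_up: Callable[[str], bool],
                 db_up: Callable[[str], bool],
                 ) -> Tuple[Optional[str], Optional[str]]:
    """Model both layers: (slurmdbd host chosen, database host chosen)."""
    # Layer 1: the controller chooses a slurmdbd from slurm.conf.
    dbd = first_reachable([slurm_conf.get("AccountingStorageHost"),
                           slurm_conf.get("AccountingStorageBackupHost")],
                          dbd_up)
    if dbd is None:
        return None, None
    # Layer 2: that slurmdbd chooses a database from slurmdbd.conf.
    db = first_reachable([dbd_conf.get("StorageHost"),
                          dbd_conf.get("StorageBackupHost")],
                         db_up)
    return dbd, db


# Values from the shared layout proposed earlier in the thread.
SLURM_CONF = {"AccountingStorageHost": "localhost"}
DBD_CONF = {"StorageHost": "nodeC", "StorageBackupHost": "nodeD"}

if __name__ == "__main__":
    # Simulate the nodeC replica being down while the local slurmdbd is up.
    print(resolve_path(SLURM_CONF, DBD_CONF,
                       dbd_up=lambda host: True,
                       db_up=lambda host: host != "nodeC"))
```

In this model the two layers are independent: losing a database replica never changes which slurmdbd the controller talks to, and losing the slurmdbd is only survivable if slurm.conf names a backup.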
[slurm-users] Re: Re: Re: Re: how to set slurmdbd.conf if using two slurmdb node with HA database?