Re: [slurm-users] slurmdbd upgrade startup error

2018-08-14 Thread Tina Fora
Hi Ole, I'm testing the upgrade on a test cluster. Two of them, actually: one with the exact same OS using the same MySQL server, and the other on an updated OS with a local MySQL installation. I also ran the mysql_upgrade command you mentioned on the local installation. My guess is that there is something

Re: [slurm-users] slurmdbd upgrade startup error

2018-08-14 Thread Ole Holm Nielsen
Hi Tina, Is it the same OS version for 17.02 and 17.11, or are you upgrading the OS (and possibly the MySQL/MariaDB) at the same time? I assume you're testing the Slurm upgrade on a test server and not the production cluster? Did you check the steps mentioned in the thread "slurmdbd: mysql/

[slurm-users] changing JobAcctGatherType on busy cluster?

2018-08-14 Thread Alex Chekholko
Hi, Right now I have a cluster running SLURM v17.02.7 with: JobAcctGatherType = jobacct_gather/none The documentation says "NOTE: Changing this configuration parameter changes the contents of the messages between Slurm daemons. Any previously running job steps are managed by a slurmstepd d

[slurm-users] slurmdbd upgrade startup error

2018-08-14 Thread Tina Fora
Hello All, I compiled Slurm with the standard rpmbuild. Upgrading from 17.02 to 17.11.9-2 gives the error below. I'm not sure what the issue is with the accounting storage plugin, because it seems to load OK. I tried running the failed MySQL query manually, and it returns an SQL syntax error (full

[slurm-users] How do you orchestrate SLURM operations, what tools do you use?

2018-08-14 Thread Pablo Llopis
Dear SLURM users, I was wondering what kind of tools the community is using for orchestrating SLURM operations. For instance, say you want to execute an operation in the cluster which requires draining the nodes first. What kind of tools are you using to automate the state machine that would go t

[slurm-users] Cores shared between jobs even with OverSubscribe=NO with 17.02.6

2018-08-14 Thread Lech Nieroda
Dear Slurm Users, we've observed a strange issue with oversubscription, namely cores being shared by multiple jobs. We are using the CR_CPU_Memory resource selection plugin, which, unlike CR_Memory, doesn't enforce oversubscription; a short partition check confirms this: $ scontrol show p