Re: [slurm-users] bug 2119 with slurm 18.08.2

Magnus Jonsson Sun, 11 Nov 2018 23:54:01 -0800

We had the same problem on our clusters. It was caused by our MySQL backup
script locking the tables (and taking too long).

If you look at 'mod_time' and 'control_host' in 'cluster_table' in the database:


select mod_time,control_host from cluster_table;

We found that 'mod_time' matched the backup time exactly and the 'control_host' column was empty.
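A small Python sketch of the check described above, under stated assumptions: it assumes 'mod_time' is stored as Unix epoch seconds, that the backup runs in a fixed daily window (the 02:00 to 02:30 UTC window and the suspicious_rows helper are hypothetical), and that you have already fetched the rows returned by the query above with a MySQL client of your choice. It flags rows whose 'mod_time' falls inside the backup window while 'control_host' is empty. An empty 'control_host' presumably means slurmdbd has no address to push association updates to, which would fit the symptom in the quoted message.

```python
from datetime import datetime, time, timezone

# Hypothetical backup window (assumption): replace with your real backup schedule.
BACKUP_START = time(2, 0)
BACKUP_END = time(2, 30)


def in_window(t, start, end):
    """True if time-of-day t lies in [start, end]; handles windows crossing midnight."""
    if start <= end:
        return start <= t <= end
    return t >= start or t <= end


def suspicious_rows(rows, start=BACKUP_START, end=BACKUP_END, tz=timezone.utc):
    """Return the (mod_time, control_host) rows that match the failure pattern:
    mod_time inside the backup window AND an empty (or NULL) control_host.

    Assumes mod_time is Unix epoch seconds, as returned by
    'select mod_time,control_host from cluster_table;'.
    """
    flagged = []
    for mod_time, control_host in rows:
        t = datetime.fromtimestamp(mod_time, tz).time()
        # A SQL NULL arrives as None, so treat None like an empty string.
        host_empty = not (control_host or "").strip()
        if host_empty and in_window(t, start, end):
            flagged.append((mod_time, control_host))
    return flagged
```

For example, a row stamped 02:15 UTC with an empty host is flagged, while the same timestamp with a host set, or an empty host stamped at noon, is not. The window comparison is done on time of day only, so it is meant for a backup that runs at the same time every night.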


Hope this helps you move forward with your problem.

Best regards,
Magnus

On 2018-11-08 19:44, Brian Andrus wrote:

All,
I am seeing what looks like the same issue as https://bugs.schedmd.com/show_bug.cgi?id=2119,
where slurmctld does not pick up new accounts unless it is restarted.

I have 4 clusters (non-federated), all using the same slurmdbd.
When I added an association for user name=me cluster=DevOps account=Project1 and then tried to start a job, I kept getting this error: *srun: error: Unable to allocate resources: Invalid account or account/partition combination specified*
Then I restarted slurmctld on DevOps master and my job ran fine.

Is slurmctld caching data it gets from slurmdbd?
This is an issue in a production environment. We don't want to have to restart all the slurmctld daemons any time an association changes. That could get painful.
Brian Andrus


--
Magnus Jonsson, Developer, HPC2N, Umeå Universitet
