On 4/5/19 4:28 PM, Julien Rey wrote:
The failure occurs after a few minutes (~10).
And we are running out of space on the slurm controller. The mysql
daemon is at 100% CPU usage all the time. This issue is becoming critical.
...
Our slurm accounting database is growing bigger and bigger (more
Hi Julien,
Did you optimize the MySQL database, in particular InnoDB?
I have collected some documentation in my Wiki page
https://wiki.fysik.dtu.dk/niflheim/Slurm_database#mysql-configuration
and I also discuss database purging.
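As a quick way to see what the server is currently using before changing anything, here is a minimal sketch. It only prints SQL for three standard InnoDB system variables that, to my knowledge, the Slurm accounting documentation suggests reviewing. The function name is made up for illustration, and no recommended values are implied here.

```shell
# Print SQL that reports three InnoDB settings commonly reviewed for slurmdbd.
# The variable names are standard MySQL/MariaDB system variables; suitable
# values depend on the host and are not stated in this thread.
# innodb_check_sql is a hypothetical helper name.
innodb_check_sql() {
    for var in innodb_buffer_pool_size innodb_log_file_size innodb_lock_wait_timeout; do
        echo "SHOW VARIABLES LIKE '$var';"
    done
}

# Dry run: show the statements without contacting the database.
innodb_check_sql
```

To actually run the checks, the output could be piped into the client, for example `innodb_check_sql | mysql -u root -p`, assuming a local server and a root database account.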
Please note that we run Slurm 17.11 (and recently 18.08) on Cent
The failure occurs after a few minutes (~10).
And we are running out of space on the slurm controller. The mysql
daemon is at 100% CPU usage all the time. This issue is becoming critical.
On 05/04/2019 16:10, Paul Edmon wrote:
Did it just time out, or did that failure happen immediately? If
immediate, you may be in a situation where you are hitting a bug. It
"should" be safe to upgrade to a later version of 15.08.*. There may be
fixes in there related to that. I would look at the changelog though
just to see if ther
Same problem here: a job submitted with --gres-flags=disable-binding is
assigned a node, but then the job step fails because all GPUs on that
node are already in use. Log messages:
[2019-04-05T15:29:05.216] error: gres/gpu: job 92453 node node5
overallocated resources by 1, (9 > 8)
[2019-04-05
Hi Paul, thanks for your advice. Actually I already tried what you
suggested. No matter what value I put after PurgeJobAfter, I always
end up with the same error:
sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=1days
sacctmgr: error: slurmdbd: Getting response to message t
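Since the failure shows up only after several minutes, one thing that might be worth trying is to shrink the retention window gradually, so that each dump has fewer records to move. The sketch below is a dry run that only prints the commands. The starting window, end window and step are made-up numbers, and the thread does not confirm that this avoids the error.

```shell
# Dry-run sketch: print one "sacctmgr archive dump" command per step,
# walking the PurgeJobAfter window down from a large value to a smaller one.
# purge_commands is a hypothetical helper; the day counts used below are
# illustrative assumptions, not values taken from this thread.
ARCHIVE_DIR="${ARCHIVE_DIR:-/home/joule/archives/}"

purge_commands() {
    days="$1"   # first (largest) retention window, in days
    last="$2"   # final (smallest) retention window, in days
    step="$3"   # how many days to shrink the window per run
    while [ "$days" -ge "$last" ]; do
        echo "sacctmgr archive dump Directory=$ARCHIVE_DIR PurgeJobAfter=${days}days"
        days=$((days - step))
    done
}

# Print the commands only; nothing is executed against slurmdbd here.
purge_commands 900 700 100
```

Each printed line can then be run by hand, checking the slurmdbd log between steps before moving on to the next, smaller window.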
Hi Lech,
Thanks! I added the 18.08 Release Notes reference to
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#database-upgrade-from-slurm-17-02-and-older
I've already upgraded from 17.11 to 18.08 without your patch, and this
went smoothly as expected. We upgraded from 17.02 to 17.11 l
Hi Ole,
your summary is correct as far as I can tell and will hopefully help some users.
One thing I’d add is the remark from the 18.08 Release Notes (
https://github.com/SchedMD/slurm/blob/slurm-18.08/RELEASE_NOTES ), which adds
mysql 5.5 to the list.
They’ve mentioned that mysql 5.5 is the def
Hi Lech,
I've tried to summarize your work on the Slurm database upgrade patch in
my Slurm Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#database-upgrade-from-slurm-17-02-and-older
Could you kindly check if my notes are correct and complete? Hopefully
this Wiki will also h