Hello,

We are experiencing quite a number of database failures. A short while ago we saw an outright failure and had to restart both MariaDB and the slurmdbd process. After the restart the database appeared to be working well, but over the last few days I have noticed quite a number of failures; see the example below. Does anyone understand what might be going wrong and why, and whether we should be concerned, please? I understand that Slurm databases can grow quite large relatively quickly, so I wonder if this is memory related.

Best regards,
David

[root@blue51 slurm]# less slurmdbd.log-20190506.gz | grep failed
[2019-05-05T04:00:05.603] error: mysql_query failed: 1213 Deadlock found when trying to get lock; try restarting transaction
[2019-05-05T04:00:05.606] error: Cluster i5 rollup failed
[2019-05-05T23:00:07.017] error: mysql_query failed: 1213 Deadlock found when trying to get lock; try restarting transaction
[2019-05-05T23:00:07.018] error: Cluster i5 rollup failed
[2019-05-06T00:00:13.348] error: mysql_query failed: 1213 Deadlock found when trying to get lock; try restarting transaction
[2019-05-06T00:00:13.350] error: Cluster i5 rollup failed
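
In the excerpt above, each deadlock is logged a few seconds after the top of an hour and is immediately followed by a rollup failure. To check whether that pattern holds across a whole log, the entries could be tallied by hour of day. The following is only a rough sketch in Python and not part of Slurm; the function names are hypothetical, and it assumes every log line has the same timestamp format as the lines shown here.

```python
import gzip
import re
from collections import Counter

# Assumed line format, taken from the excerpt above, e.g.
# [2019-05-05T04:00:05.603] error: mysql_query failed: 1213 Deadlock found ...
LINE_RE = re.compile(
    r"^\[\d{4}-\d{2}-\d{2}T(\d{2}):\d{2}:\d{2}\.\d+\] error: (.*)$"
)


def count_deadlocks_by_hour(lines):
    """Map hour of day (0-23) to the number of MySQL error 1213 entries."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "mysql_query failed: 1213" in match.group(2):
            counts[int(match.group(1))] += 1
    return counts


def count_deadlocks_in_file(path):
    """Read a plain or gzip-compressed log file and tally deadlocks by hour."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_deadlocks_by_hour(handle)
```

For example, `count_deadlocks_in_file("slurmdbd.log-20190506.gz")` would return a `Counter` keyed by hour; a tally concentrated in a few hours would suggest the deadlocks coincide with some other scheduled activity on the database.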