Hi,

We're running Slurm 17.11.12. Everything had been working fine, but now both slurmctld and slurmdbd have suddenly started crashing.
We use fair-share as part of the queuing policy, and previously set up accounts with sacctmgr; that has been working fine for months.

If I run slurmdbd in debug mode,

slurmdbd -D -v -v -v -v -v

it eventually (after being contacted by slurmctld) segfaults with:

...
slurmdbd: debug2: DBD_NODE_STATE: NODE:cn049 STATE:UP REASON:(null) TIME:1588695584
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_NODE_STATE: NODE:cn050 STATE:UP REASON:(null) TIME:1588695584
slurmdbd: debug4: got 0 commits
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_GET_TRES: called
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_GET_QOS: called
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_GET_USERS: called
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_GET_ASSOCS: called
slurmdbd: debug4: 10(as_mysql_assoc.c:2033) query
call get_parent_limits('assoc_table', 'root', 'slurm_cluster', 0); select @par_id, @mj, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm, @def_qos_id, @qos, @delta_qos;
Segmentation fault (core dumped)

Running slurmdbd in gdb, it looks like the segfault is coming from

https://github.com/SchedMD/slurm/blob/slurm-17-11-12-1/src/plugins/accounting_storage/mysql/as_mysql_assoc.c#L2073

If I connect to the MySQL database directly and call that stored procedure, I get

mysql> call get_parent_limits('assoc_table', 'root', 'slurm_cluster', 0);
+---------------------+-----------------+-------------------------+----------------------+---------------------------+-------------+-----------------------------------------------------------------+-------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+-----------------------------+
| @par_id := id_assoc | @mj := max_jobs | @msj := max_submit_jobs | @mwpj := max_wall_pj | @def_qos_id := def_qos_id | @qos := qos | @delta_qos := REPLACE(CONCAT(delta_qos, @delta_qos), ',,', ',') | @mtpj := CONCAT(@mtpj, if (@mtpj != '' && max_tres_pj != '', ',', ''), max_tres_pj) | @mtpn := CONCAT(@mtpn, if (@mtpn != '' && max_tres_pn != '', ',', ''), max_tres_pn) | @mtmpj := CONCAT(@mtmpj, if (@mtmpj != '' && max_tres_mins_pj != '', ',', ''), max_tres_mins_pj) | @mtrm := CONCAT(@mtrm, if (@mtrm != '' && max_tres_run_mins != '', ',', ''), max_tres_run_mins) | @my_acct_new := parent_acct |
+---------------------+-----------------+-------------------------+----------------------+---------------------------+-------------+-----------------------------------------------------------------+-------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+-----------------------------+
| 1                   | NULL            | NULL                    | NULL                 | NULL                      | ,1,         | NULL                                                            | NULL                                                                                | NULL                                                                                | NULL                                                                                             | NULL                                                                                            |                             |
+---------------------+-----------------+-------------------------+----------------------+---------------------------+-------------+-----------------------------------------------------------------+-------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+-----------------------------+

and if I run

mysql> call get_parent_limits('assoc_table', 'root', 'slurm_cluster', 0); select @par_id, @mj, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm, @def_qos_id, @qos, @delta_qos;

I get

+---------+------+------+-------+-------+-------+--------+-------+-------------+------+------------+
| @par_id | @mj  | @msj | @mwpj | @mtpj | @mtpn | @mtmpj | @mtrm | @def_qos_id | @qos | @delta_qos |
+---------+------+------+-------+-------+-------+--------+-------+-------------+------+------------+
| 1       | NULL | NULL | NULL  | NULL  | NULL  | NULL   | NULL  | NULL        | ,1,  | NULL       |
+---------+------+------+-------+-------+-------+--------+-------+-------------+------+------------+

but I don't know what to do about this.

We use another product ("Bright Cluster Manager") to manage some aspects of the cluster and Slurm installation, so we are hesitant to just upgrade Slurm.

I would appreciate any tips.

Thanks,
--dustin