Previous versions of mysql are supposed to have nasty security issues. I'm not sure why I had mysql instead of mariadb anyway.

On Mon, May 11, 2020 at 9:29 AM Relu Patrascu <r...@vectorinstitute.ai> wrote:
>
> We've experienced the same problem on several versions of slurmdbd
> (18, 19) so we downgraded mysql and put a hold on the package.
>
> Hey Dustin, funny we meet here :)
> Relu
>
> On Tue, May 5, 2020 at 3:43 PM Dustin Lang <dstnd...@gmail.com> wrote:
> >
> > I tried upgrading Slurm to 18.08.9 and I am still getting this Segmentation
> > Fault!
> >
> > On Tue, May 5, 2020 at 2:39 PM Dustin Lang <dstnd...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> Apparently my colleague upgraded the mysql client and server, but, as far
> >> as I can tell, this was only 5.7.29 to 5.7.30, and checking the mysql
> >> release notes I don't see anything that looks suspicious there...
> >>
> >> cheers,
> >> --dustin
> >>
> >> On Tue, May 5, 2020 at 1:37 PM Dustin Lang <dstnd...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> We're running Slurm 17.11.12. Everything has been working fine, and then
> >>> suddenly slurmctld is crashing and slurmdbd is crashing.
> >>>
> >>> We use fair-share as part of the queuing policy, and previously set up
> >>> accounts with sacctmgr; that has been working fine for months.
> >>>
> >>> If I run slurmdbd in debug mode,
> >>>
> >>> slurmdbd -D -v -v -v -v -v
> >>>
> >>> it eventually (after being contacted by slurmctld) segfaults with:
> >>>
> >>> ...
> >>> slurmdbd: debug2: DBD_NODE_STATE: NODE:cn049 STATE:UP REASON:(null) TIME:1588695584
> >>> slurmdbd: debug4: got 0 commits
> >>> slurmdbd: debug2: DBD_NODE_STATE: NODE:cn050 STATE:UP REASON:(null) TIME:1588695584
> >>> slurmdbd: debug4: got 0 commits
> >>> slurmdbd: debug4: got 0 commits
> >>> slurmdbd: debug2: DBD_GET_TRES: called
> >>> slurmdbd: debug4: got 0 commits
> >>> slurmdbd: debug2: DBD_GET_QOS: called
> >>> slurmdbd: debug4: got 0 commits
> >>> slurmdbd: debug2: DBD_GET_USERS: called
> >>> slurmdbd: debug4: got 0 commits
> >>> slurmdbd: debug2: DBD_GET_ASSOCS: called
> >>> slurmdbd: debug4: 10(as_mysql_assoc.c:2033) query
> >>> call get_parent_limits('assoc_table', 'root', 'slurm_cluster', 0); select
> >>> @par_id, @mj, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm, @def_qos_id,
> >>> @qos, @delta_qos;
> >>> Segmentation fault (core dumped)
> >>>
> >>> It looks (running slurmdbd in gdb) like that segfault is coming from
> >>>
> >>> https://github.com/SchedMD/slurm/blob/slurm-17-11-12-1/src/plugins/accounting_storage/mysql/as_mysql_assoc.c#L2073
> >>>
> >>> and If I connect to the mysql database directly and call that stored
> >>> procedure, I get
> >>>
> >>> mysql> call get_parent_limits('assoc_table', 'root', 'slurm_cluster', 0);
> >>> +---------------------+-----------------+-------------------------+----------------------+---------------------------+-------------+-----------------------------------------------------------------+-------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+-----------------------------+
> >>> | @par_id := id_assoc | @mj := max_jobs | @msj := max_submit_jobs | @mwpj
> >>> := max_wall_pj | @def_qos_id := def_qos_id | @qos := qos | @delta_qos :=
> >>> REPLACE(CONCAT(delta_qos, @delta_qos), ',,', ',') | @mtpj :=
> >>> CONCAT(@mtpj, if (@mtpj != '' && max_tres_pj != '', ',', ''),
> >>> max_tres_pj) | @mtpn := CONCAT(@mtpn, if (@mtpn != '' && max_tres_pn !=
> >>> '', ',', ''), max_tres_pn) | @mtmpj := CONCAT(@mtmpj, if (@mtmpj != '' &&
> >>> max_tres_mins_pj != '', ',', ''), max_tres_mins_pj) | @mtrm :=
> >>> CONCAT(@mtrm, if (@mtrm != '' && max_tres_run_mins != '', ',', ''),
> >>> max_tres_run_mins) | @my_acct_new := parent_acct |
> >>> +---------------------+-----------------+-------------------------+----------------------+---------------------------+-------------+-----------------------------------------------------------------+-------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+-----------------------------+
> >>> | 1 | NULL | NULL | NULL | NULL | ,1, | NULL | NULL | NULL | NULL | NULL | |
> >>> +---------------------+-----------------+-------------------------+----------------------+---------------------------+-------------+-----------------------------------------------------------------+-------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+-----------------------------+
> >>>
> >>> and if I run
> >>>
> >>> mysql> call get_parent_limits('assoc_table', 'root', 'slurm_cluster', 0);
> >>> select @par_id, @mj, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm,
> >>> @def_qos_id, @qos, @delta_qos;
> >>>
> >>> I get
> >>>
> >>> +---------+------+------+-------+-------+-------+--------+-------+-------------+------+------------+
> >>> | @par_id | @mj  | @msj | @mwpj | @mtpj | @mtpn | @mtmpj | @mtrm | @def_qos_id | @qos | @delta_qos |
> >>> +---------+------+------+-------+-------+-------+--------+-------+-------------+------+------------+
> >>> |       1 | NULL | NULL | NULL  | NULL  | NULL  | NULL   | NULL  | NULL        | ,1,  | NULL       |
> >>> +---------+------+------+-------+-------+-------+--------+-------+-------------+------+------------+
> >>>
> >>> but I don't know what to do about this.
> >>>
> >>> We use another product ("Bright Cluster Manager") to manage some aspects
> >>> of the cluster and Slurm installation, so we are hesitant to just upgrade
> >>> Slurm.
> >>>
> >>> I would appreciate any tips.
> >>>
> >>> Thanks,
> >>> --dustin
> >>>
>
> --
> +1-647-680-7564