Greetings --

My experience this morning includes the following (where there is a wait of about 10 min before the error is returned):
> [mrd20@hpctest ~]$ sacctmgr show association where account=<userid>
> sacctmgr: error: slurmdbd: Getting response to message type 1410
> sacctmgr: error: slurmdbd: DBD_GET_ASSOCS failure: No error
> Error with request: No error

This seems to be a 'time-out' error, but I have no insight into why the database would be unable to respond over the course of 10 minutes or so. Otherwise, sacctmgr returns normally for "sacctmgr show account..." and other types of queries. Only 'show association' seems affected.

Now, to the suspected cause: just prior to this, I had submitted an incorrect formulation of 'sacctmgr delete user <usrid>', as follows, where the specific userid is omitted:

> [root@hpc2 mrd20]# sacctmgr delete user where account=txl80
> Deleting user associations...
> C = hpctest A = <accid> U = <usrid1>
> C = hpctest A = <accid> U = <usrid2>
> ...
> ...
> ...
> C = hpctest A = <accid> U = <usrid16>
> C = hpctest A = <accid> U = <usrid17>
> Deleting users (No Associations)...
> <usrid1>
> <usrid2>
> User <usrid3> on cluster hpctest no longer has a default account.
> ...
> ...
> ...
> <usrid10>
> <usrid11>

This action was terminated by 'Ctrl-C'.

Two noteworthy items:
-- I had meant to operate on just one user association; and,
-- The command did not issue the expected prompt to verify that I wanted to perform these deletions.

Since this occurred, the time-out with 'sacctmgr show assoc....' has resolved, and it would seem that the associations were not impacted.

The most important questions to me remain: could the interrupted 'delete user' have "hung" sacctmgr access to the slurmdb? And if so, what is going on behind the scenes?

I'd like to come away with better insight into how the slurmdb operates. Pointers to slurmdb tutorials or slide decks are welcome ;)

Best wishes, and thanks in advance
~ Em
--
E.M. Dragowsky, Ph.D.
Research Computing -- UTech
Case Western Reserve University
(216) 368-0082