Hi,

The other day we updated to 22.05.8. We are interested in using
sharding with our GPUs, so after the update had finished, we changed

  SelectType=select/cons_res

to

  SelectType=select/cons_tres

This seemed to cause the slurmctld to lose contact with the
slurmstepds, so that a large number of jobs were requeued, although
they were in fact still running. The slurmstepds reported

  slurmd: error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received
  slurmd: error: select_g_select_jobinfo_unpack: select plugin cons_tres not found
  slurmd: error: select_g_select_jobinfo_unpack: unpack error

In the slurmctld log, multiple lines of

  select/cons_res: job_res_rm_job: plugin still initializing

occurred. This line also appears in the following bug report

  https://bugs.schedmd.com/show_bug.cgi?id=10980

which is, however, about a different problem. Regarding that line, the
SchedMD employee writes

  I don't think this should ever happen.

Has anyone else seen this issue?

Cheers,

Loris

-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin