Hello,

We have, rather belatedly, just upgraded to Slurm v19.05.5. On the whole, so far so good -- no major problems.

One user has complained that his job now crashes and reports an unlink error, that is:
slurmstepd: error: get_exit_code task 0 died by signal: 9
slurmstepd: error: unlink(/tmp/slurmd/job392987/slurm_script): No such file or directory

I suspect that this message has something to do with the completion of one of the steps in his job. Apparently his job is quite complex, with a number of inter-related tasks.

Significantly, we decided to switch from an rpm installation to a 'build from source' installation. In other words, we previously had the rpms installed on each node in the cluster, but now have Slurm installed on a global file system.

Does anyone have any thoughts on the above issue, please? I have yet to see the user's script, so on inspection there may well be a good logical explanation for the message.

Best regards,
David