Hello,

We have, rather belatedly, just upgraded to Slurm v19.05.5. On the whole, so
far so good -- no major problems. One user has complained that his job now
crashes and reports an unlink error. The messages are:


slurmstepd: error: get_exit_code task 0 died by signal: 9
slurmstepd: error: unlink(/tmp/slurmd/job392987/slurm_script): No such file or directory

I suspect that this message has something to do with the completion of one of
the steps in his job. Apparently the job is quite complex, with a number of
interrelated tasks.

Possibly significant: as part of the upgrade we switched from an RPM
installation to a 'build from source' one. In other words, we used to have
RPMs on each node in the cluster, but Slurm is now installed on a global file
system. Does anyone have any thoughts on the above issue, please? I have yet
to see the user's script, so there may turn out to be a perfectly logical
explanation for the message once I inspect it.
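For reference, the path in the error suggests our slurmd spool directory sits
under /tmp, i.e. something like this in slurm.conf (inferred from the path, so
take the exact value with a pinch of salt):

SlurmdSpoolDir=/tmp/slurmd

As I understand it, job392987/slurm_script under that directory is the copy of
the batch script that slurmstepd removes at job cleanup, so the 'No such file
or directory' presumably means something had already deleted it by that point.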

Best regards,
David
