Re: [slurm-users] Users can't scancel

2020-11-18 Thread mercan
These log lines about the prolog script look very suspicious to me:
[2020-11-18T10:19:35.388] debug:  [job 110] attempting to run prolog [/cm/local/apps/cmd/scripts/prolog]
then
[2020-11-18T10:21:10.121] debug:  Waiting for job 110's prolog to complete
[2020-11-18T10:21:10.121] debug:  Finis
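One way to quantify the suspicion is to measure the gap between the two timestamps quoted above. The following is only a small sketch: the timestamps are copied from the log lines, while the helper name `seconds_between` and the format constant are illustrative assumptions, not part of Slurm.

```python
from datetime import datetime

# Timestamps copied from the log lines quoted above.
PROLOG_START = "2020-11-18T10:19:35.388"
PROLOG_WAIT = "2020-11-18T10:21:10.121"

# Timestamp layout as it appears in the quoted log lines.
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def seconds_between(start, end):
    """Return the elapsed seconds between two log timestamps (hypothetical helper)."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


gap = seconds_between(PROLOG_START, PROLOG_WAIT)
print(f"{gap:.3f} s between the two prolog log lines")
```

The two lines are roughly a minute and a half apart, which is a long time between starting a prolog and waiting for it to complete.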

Re: [slurm-users] Users can't scancel

2020-11-18 Thread William Markuske
The epilog script does have exit 0 set at the end. Epilogs exit cleanly when run. With log set to debug5 I get the following results for any scancel call.
Submit host slurmctld.log
[2020-11-18T10:19:34.944] _slurm_rpc_submit_batch_job: JobId=110 InitPrio=110503 usec=191
[2020-11-18T10:19:35.

Re: [slurm-users] Users can't scancel

2020-11-18 Thread mercan
Hi, check the epilog's return value, which is the exit status of the last command in the epilog script. For testing purposes, you can also add an "exit 0" line at the end of the epilog script to guarantee a zero return value. Ahmet M. On 18.11.2020 at 20:00, William Markuske wrote:
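A quick way to check the return value by hand is to run the script outside Slurm and look at its exit status. The sketch below assumes the epilog can be run standalone as an ordinary executable. The helper name `exit_status` and the path in the usage comment are hypothetical. A real epilog may also depend on environment variables that Slurm sets, and those are missing here.

```python
import subprocess
import sys


def exit_status(command, timeout=60):
    """Run `command` (a list of arguments) and return its exit status.

    Hypothetical helper for manual testing: a status of 0 means the
    script exited cleanly, anything else means it reported a failure.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return result.returncode


# Example usage (the path is a placeholder, not taken from the thread):
#   status = exit_status(["/path/to/epilog"])
#   print("epilog exit status:", status)

# Self-contained demonstration with a stand-in command that exits with 3.
demo_status = exit_status([sys.executable, "-c", "raise SystemExit(3)"])
print("demo exit status:", demo_status)
```

If the status is non-zero when the script is run this way, the last command in the script is the first place to look, as suggested above.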

[slurm-users] Users can't scancel

2020-11-18 Thread William Markuske
Hello, I am having an odd problem where users are unable to kill their jobs with scancel. Users can submit jobs just fine and when the task completes it is able to close correctly. However, if a user attempts to cancel a job via scancel the SIGKILL signals are sent to the step but don't compl