On 7/10/25 10:53 pm, Ratnasamy, Fritz via slurm-users wrote:
> Inside the prolog.d folder, there are 2 scripts which run with no errors
> as far as I can see but is there a way to debug why the nodes are going
> in draining mode once in a while because of "prolog error"? That seems
> to happen at random times and on random nodes.
You could try adding some logging to the start of your prolog to
capture what it executes and any errors it hits. Something like this:
~/tmp/test$ cat prolog.sh
#!/bin/bash
exec 1>>"/tmp/prolog.log.${SLURM_JOB_ID}.${$}"
exec 2>&1
set -x
echo hello
fooo
~/tmp/test$ SLURM_JOB_ID=1234 ./prolog.sh
~/tmp/test$ echo $?
127
~/tmp/test$ cat /tmp/prolog.log.1234.10512
+ echo hello
hello
+ fooo
./prolog.sh: line 9: fooo: command not found
~/tmp/test$
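If the trace alone is too noisy to spot the failure, a slightly fuller
sketch along the same lines is below. It adds an ERR trap that writes one
timestamped line naming the failing command and its exit status. This is
only an illustration, not something taken from your setup: PROLOG_LOG_DIR
and the "nojob" fallback are names I made up for the example, and /tmp is
just the default carried over from above.

```shell
#!/bin/bash
# Sketch only. PROLOG_LOG_DIR is a hypothetical override; default is /tmp.
LOG_DIR="${PROLOG_LOG_DIR:-/tmp}"
# One log file per job and per prolog process, as in the example above.
# "nojob" is a made-up fallback for runs outside Slurm.
LOG="${LOG_DIR}/prolog.log.${SLURM_JOB_ID:-nojob}.$$"

# Send stdout and stderr of everything that follows to the log file.
exec 1>>"$LOG" 2>&1

# When a command fails, record when, where, which command, and its status.
# rc is saved first so the later expansions cannot overwrite $?.
trap 'rc=$?; echo "$(date +%FT%T) ${HOSTNAME}: line ${LINENO}: ${BASH_COMMAND}: exit status ${rc}"' ERR

# Trace each command as it runs.
set -x

echo hello
# ... the real prolog steps would go here ...
```

You can run it by hand the same way as above (SLURM_JOB_ID=1234 ./prolog.sh)
and then grep the per-job files for "exit status" on a node that drained.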
Best of luck!
Chris