I'm helping with a workflow manager that needs to submit Slurm jobs. For
logging and management reasons, the job (e.g. srun python) needs to behave
as though it were a regular subprocess (plain python); a rough sketch of
what we mean follows the list:

   - stdin, stdout and stderr for the command should be connected to the
   process inside the job
   - signals sent to the command should be sent to the job process
   - If this is run from inside a Slurm job, we don't want it to reuse the
   existing allocation
   - The command should only terminate when the job has finished, so that
   we don't need to poll Slurm
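
For concreteness, this is roughly the wrapper behaviour we're after. It's
only a sketch: worker.py is a placeholder command, and forwarding
SIGTERM/SIGINT by hand like this is our assumption about how srun passes
signals on to the job, not something we've confirmed:

    #!/usr/bin/env python3
    # Sketch only: run srun like an ordinary subprocess, with stdio
    # inherited and signals forwarded to it.  worker.py is a placeholder.
    import signal
    import subprocess
    import sys

    # stdin/stdout/stderr are inherited by default, so the job's output
    # comes straight through the wrapper.
    proc = subprocess.Popen(["srun", "python", "worker.py"])

    def forward(signum, frame):
        # Pass SIGTERM/SIGINT through to srun, which we hope propagates
        # them to the remote job step.
        proc.send_signal(signum)

    signal.signal(signal.SIGTERM, forward)
    signal.signal(signal.SIGINT, forward)

    # Block until the job itself finishes, then mirror its exit status.
    sys.exit(proc.wait())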

We've tried:

   - sbatch --wait, but then SIGTERM'ing the sbatch process doesn't kill
   the job
   - salloc, but that seems to require a controlling TTY/terminal process (?)
   - salloc srun seems to mess with the terminal when it's killed, likely
   because salloc is "designed to be executed in the foreground"
   - Plain srun re-uses the existing Slurm allocation, and specifying
   resources like --mem just requests them from the current job rather
   than submitting a new one (a possible workaround is sketched after
   this list)
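
One idea we've considered for that last point, but haven't verified: strip
the inherited SLURM_* variables from the environment before calling srun,
on the assumption that without the surrounding job context srun will ask
for a fresh allocation rather than launching a step in the current one.
The --mem value and worker.py below are placeholders:

    #!/usr/bin/env python3
    # Untested workaround: drop the SLURM_* job variables so that srun
    # (we assume) requests a new allocation from the controller instead
    # of running a step inside the current job.
    import os
    import subprocess
    import sys

    env = {k: v for k, v in os.environ.items() if not k.startswith("SLURM_")}

    # With the job context removed, --mem should be a new request rather
    # than a carve-out from the surrounding allocation (unverified).
    proc = subprocess.run(["srun", "--mem=4G", "python", "worker.py"], env=env)
    sys.exit(proc.returncode)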

What is the best solution here?