Thanks for your reply, Bjorn-Helge. This cleared things up for me. I had not understood that we need to use Prolog and Epilog for the TMPDIR stuff, because that guarantees the directory is created at the very beginning of the job and deleted at the very end. Everything now works as expected, thanks so much for your help.
-Harry

On 2/11/22, 1:19 AM, "slurm-users" <slurm-users-boun...@lists.schedmd.com> wrote:

"Putnam, Harry" <harry.put...@ucsf.edu> writes:

> /opt/slurm/task_epilog
>
> #!/bin/bash
> mytmpdir=/scratch/$SLURM_JOB_USER/$SLURM_JOB_ID
> rm -Rf $mytmpdir
> exit;

This might not be the reason for what you observe, but I believe deleting the scratch dir in the task epilog is not a good idea. The task epilog is run after every "srun" or "mpirun" inside a job, which means that the scratch dir will be created and deleted for each job step. On our systems, we create the scratch dir in the (slurmd) Prolog, set the environment variable in the TaskProlog, and delete the dir in the (slurmd) Epilog. That way the dir is just created and deleted once.

> I am not sure I understand what constitutes a job step.

In practice, every run of srun or mpirun creates a job step, and the job script itself counts as a job step.

--
B/H
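
For anyone finding this thread later, the Prolog / TaskProlog / Epilog split described above could be sketched roughly as follows. This is not the configuration from either site, only a minimal illustration. The function names and the SCRATCH_BASE variable are made up for this example; in a real setup each function body would be its own script, referenced by the Prolog=, TaskProlog= and Epilog= settings in slurm.conf. The sketch assumes SLURM_JOB_USER and SLURM_JOB_ID are set when the scripts run, and falls back to `id -un` in case SLURM_JOB_USER is not visible to the TaskProlog.

```shell
#!/bin/bash
# Sketch only: three functions standing in for three separate hook scripts.
# SCRATCH_BASE is a hypothetical knob; the thread uses /scratch.
SCRATCH_BASE="${SCRATCH_BASE:-/scratch}"

# Per-job scratch path, same layout as in the thread: <base>/<user>/<jobid>.
# Falls back to the current user name if SLURM_JOB_USER is not set (assumption).
job_tmpdir() {
    printf '%s/%s/%s\n' "$SCRATCH_BASE" "${SLURM_JOB_USER:-$(id -un)}" "$SLURM_JOB_ID"
}

# Would be the slurmd Prolog: runs once per node when the job starts.
slurmd_prolog() {
    local dir
    dir="$(job_tmpdir)"
    mkdir -p "$dir" && chmod 700 "$dir"
    # The Prolog normally runs as root, so hand the directory to the job owner.
    if [ "$(id -u)" -eq 0 ] && [ -n "$SLURM_JOB_USER" ]; then
        chown "$SLURM_JOB_USER" "$dir"
    fi
}

# Would be the TaskProlog: lines of the form "export NAME=value" printed on
# stdout are added to the environment of the task.
task_prolog() {
    echo "export TMPDIR=$(job_tmpdir)"
}

# Would be the slurmd Epilog: runs once per node when the job ends.
slurmd_epilog() {
    # Refuse to delete anything if the job variables are missing.
    [ -n "$SLURM_JOB_USER" ] && [ -n "$SLURM_JOB_ID" ] && rm -rf -- "$(job_tmpdir)"
}
```

Because the directory is created and removed by the slurmd Prolog and Epilog rather than by the task hooks, it survives across all job steps (each srun or mpirun) and is cleaned up only once, at the end of the job.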