Thanks for your reply, Bjorn-Helge.

This cleared things up for me. I had not understood that we need to use Prolog
and Epilog for the TMPDIR handling, because that guarantees the scratch
directory is created once at the very beginning of the job and deleted once at
the very end. Everything now works as expected. Thanks so much for your help.

-Harry

On 2/11/22, 1:19 AM, "slurm-users" <slurm-users-boun...@lists.schedmd.com> 
wrote:
"Putnam, Harry" <harry.put...@ucsf.edu> writes:

> /opt/slurm/task_epilog
>
> #!/bin/bash
> mytmpdir=/scratch/$SLURM_JOB_USER/$SLURM_JOB_ID
> rm -Rf $mytmpdir
> exit;

This might not be the reason for what you observe, but I believe
deleting the scratch dir in the task epilog is not a good idea.  The
task epilog is run after every "srun" or "mpirun" inside a job, which
means that the scratch dir will be created and deleted for each job
step.  On our systems, we create the scratch dir in the (slurmd) Prolog,
set the environment variable in the TaskProlog, and delete the dir in
the (slurmd) Epilog.  That way the dir is just created and deleted once.
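
A rough sketch of that split, not our exact scripts. It assumes the same
/scratch/$SLURM_JOB_USER/$SLURM_JOB_ID layout as the script quoted above.
The function names and the SCRATCH_BASE variable are made up for
illustration; in a real setup each function body would be a separate script
referenced from slurm.conf through Prolog=, TaskProlog= and Epilog=.

```shell
#!/bin/bash
# Sketch only: three hooks shown as functions in one file for readability.
# SCRATCH_BASE is a hypothetical knob; the thread uses /scratch directly.
SCRATCH_BASE="${SCRATCH_BASE:-/scratch}"

# Path of the per-job scratch dir, same layout as the quoted task_epilog.
scratch_dir() {
    printf '%s/%s/%s\n' "$SCRATCH_BASE" "$SLURM_JOB_USER" "$SLURM_JOB_ID"
}

# Prolog (slurmd, once per job on a node): create the dir.
job_prolog() {
    local dir
    dir="$(scratch_dir)"
    mkdir -p "$dir" || return 1
    # slurmd normally runs the Prolog as root, so hand the dir to the job owner.
    if [ "$(id -u)" -eq 0 ]; then
        chown "$SLURM_JOB_USER" "$dir" || return 1
    fi
    chmod 700 "$dir"
}

# TaskProlog (before each task): stdout lines of the form
# "export NAME=value" are added to the task's environment.
task_prolog() {
    echo "export TMPDIR=$(scratch_dir)"
}

# Epilog (slurmd, once when the job ends): delete the dir.
job_epilog() {
    # Refuse to run rm at all if either variable is empty.
    [ -n "$SLURM_JOB_USER" ] && [ -n "$SLURM_JOB_ID" ] || return 1
    rm -rf -- "$(scratch_dir)"
}
```

The point of the split is that only the TaskProlog part runs per job step,
and it only prints a variable; nothing is created or removed between steps.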

> I am not sure I understand what constitutes a job step.

In practice, every run of srun or mpirun creates a job step, and the job
script itself counts as a job step.

--
B/H
