Unfortunately this is not a good workflow. You would submit a staging job, then a compute job with a dependency on it; however, in the meantime, the scheduler might launch higher-priority jobs that want the scratch space, causing it to be scrubbed.
In a rational process, the scratch space would be scrubbed for the higher-priority jobs. I'm now thinking of a way that the scheduler could consider data turds left by previous jobs, but that's not currently a scheduling feature in SLURM multi-factor or any other scheduler I know.

The best current workflow is to stage data into fast local persistent storage and then schedule jobs, or to schedule a single job that does the staging synchronously (TimeLimit = Stage + Compute). The latter is pretty unsocial and wastes cycles.

On Sat, Apr 3, 2021 at 3:45 PM Will Dennis <wden...@nec-labs.com> wrote:

> Hi all,
>
> We have various NFS servers that contain the data that our researchers
> want to process. These are mounted on our Slurm clusters on well-known
> paths. Also, the nodes have local fast scratch disk on another well-known
> path. We do not have any distributed file systems in use (our Slurm
> clusters are basically just collections of hetero nodes of differing types,
> not a traditional HPC setup by any means).
>
> In most cases, the researchers can process the data directly off the NFS
> mounts without it causing any issues, but in some cases, this slows down
> the computation unacceptably. They could manually copy the data to the
> local drive using an allocation & srun commands, but I am wondering if
> there is a way to do this in sbatch?
> I tried this method:
>
> wdennis@submit01 ~> sbatch transfer.sbatch
> Submitted batch job 329572
> wdennis@submit01 ~> sbatch --dependency=afterok:329572 test_job.sbatch
> Submitted batch job 329573
> wdennis@submit01 ~> sbatch --dependency=afterok:329573 rm_data.sbatch
> Submitted batch job 329574
>
> wdennis@submit01 ~> squeue
>     JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
>    329573       gpu wdennis_  wdennis PD   0:00      1 (Dependency)
>    329574       gpu wdennis_  wdennis PD   0:00      1 (Dependency)
>    329572       gpu wdennis_  wdennis  R   0:23      1 compute-gpu02
>
> But it seems to not preserve the node allocated with the --dependency jobs:
>
> JobID|JobName|User|Partition|NodeList|AllocCPUS|ReqMem|CPUTime|QOS|State|ExitCode|AllocTRES|
> 329572|wdennis_data_transfer|wdennis|gpu|compute-gpu02|1|2Gc|00:02:01|normal|COMPLETED|0:0|cpu=1,mem=2G,node=1|
> 329573|wdennis_compute_job|wdennis|gpu|compute-gpu05|1|128Gn|00:03:00|normal|COMPLETED|0:0|cpu=1,mem=128G,node=1,gres/gpu=1|
> 329574|wdennis_data_removal|wdennis|gpu|compute-gpu02|1|2Gc|00:00:01|normal|COMPLETED|0:0|cpu=1,mem=2G,node=1|
>
> What is the best way to do something like "stage the data on a local path
> / run computation using the local copy / remove the locally staged data
> when complete"?
>
> Thanks!
> Will
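P.S. A minimal sketch of the single-job "stage / compute / clean up" approach I mentioned, as one sbatch script so all three steps are guaranteed to run on the same node. All paths, the partition, and `my_compute` are hypothetical placeholders; the demo fallback lines just let the script run outside Slurm for illustration:

```shell
#!/bin/bash
#SBATCH --job-name=stage_and_compute
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00            # TimeLimit must cover stage + compute

# Placeholder paths -- substitute your well-known NFS and scratch paths.
SRC="${SRC:-/tmp/stage_demo_src}"
SCRATCH="${SLURM_TMPDIR:-$(mktemp -d)}"

# Remove the staged copy even if the compute step fails or hits the time limit.
trap 'rm -rf "$SCRATCH"' EXIT

# Demo only: create a stand-in dataset when run outside Slurm.
[ -d "$SRC" ] || { mkdir -p "$SRC"; echo sample > "$SRC/data.txt"; }

cp -r "$SRC"/. "$SCRATCH"/          # stage onto local fast disk
echo "staged: $(ls "$SCRATCH")"

# srun ./my_compute --input "$SCRATCH"   # compute step (placeholder)
```

The EXIT trap replaces the separate rm_data.sbatch job, so nothing is left behind on the node.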