Hi all,

We have various NFS servers that contain the data our researchers want to process. These are mounted on our Slurm clusters at well-known paths, and the nodes also have fast local scratch disk at another well-known path. We do not have any distributed file systems in use. (Our Slurm clusters are basically just collections of heterogeneous nodes of differing types, not a traditional HPC setup by any means.)
In most cases the researchers can process the data directly off the NFS mounts without causing any issues, but in some cases this slows down the computation unacceptably. They could manually copy the data to the local drive using an allocation and srun commands, but I am wondering if there is a way to do this with sbatch. I tried chaining three jobs with --dependency:

wdennis@submit01 ~> sbatch transfer.sbatch
Submitted batch job 329572
wdennis@submit01 ~> sbatch --dependency=afterok:329572 test_job.sbatch
Submitted batch job 329573
wdennis@submit01 ~> sbatch --dependency=afterok:329573 rm_data.sbatch
Submitted batch job 329574
wdennis@submit01 ~> squeue
  JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
 329573       gpu wdennis_  wdennis PD  0:00     1 (Dependency)
 329574       gpu wdennis_  wdennis PD  0:00     1 (Dependency)
 329572       gpu wdennis_  wdennis  R  0:23     1 compute-gpu02

But the dependent jobs do not preserve the node allocated to the first job -- the transfer and removal ran on compute-gpu02, while the compute job landed on compute-gpu05:

JobID|JobName|User|Partition|NodeList|AllocCPUS|ReqMem|CPUTime|QOS|State|ExitCode|AllocTRES|
329572|wdennis_data_transfer|wdennis|gpu|compute-gpu02|1|2Gc|00:02:01|normal|COMPLETED|0:0|cpu=1,mem=2G,node=1|
329573|wdennis_compute_job|wdennis|gpu|compute-gpu05|1|128Gn|00:03:00|normal|COMPLETED|0:0|cpu=1,mem=128G,node=1,gres/gpu=1|
329574|wdennis_data_removal|wdennis|gpu|compute-gpu02|1|2Gc|00:00:01|normal|COMPLETED|0:0|cpu=1,mem=2G,node=1|

What is the best way to do something like "stage the data on a local path / run the computation using the local copy / remove the locally staged data when complete"?

Thanks!
Will
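To make the stage-in / compute / clean-up workflow I'm after concrete, here is a minimal single-job sketch of one way it could look. All paths and the compute command are hypothetical, and the /tmp fallbacks exist only so the sketch is self-contained; it is not meant as our actual setup:

```shell
#!/bin/bash
#SBATCH --job-name=staged_compute
#SBATCH --partition=gpu

# Hypothetical paths: NFS_DATA stands in for the well-known NFS mount,
# SCRATCH for the node-local fast disk. The /tmp defaults only keep this
# sketch runnable outside Slurm.
NFS_DATA="${NFS_DATA:-/tmp/demo_nfs_data}"
SCRATCH="${SCRATCH:-/tmp/demo_scratch.$$}"
mkdir -p "$NFS_DATA" "$SCRATCH"

# Remove the staged copy when the script exits, whether the compute step
# succeeds or fails. (If cleanup after scancel matters, a TERM trap would
# be needed as well, since an untrapped signal skips the EXIT trap.)
trap 'rm -rf "$SCRATCH"' EXIT

# Stage in: copy the input from NFS to the local disk.
cp -a "$NFS_DATA/." "$SCRATCH/"

# Compute against the local copy (placeholder for the real command).
ls "$SCRATCH" > /dev/null
```

Because the copy, the computation, and the cleanup all run inside one job, they necessarily execute on the same node, which sidesteps the node-matching problem of the three-job chain. If separate jobs are preferred, sbatch's -w/--nodelist option can pin the dependent jobs to a chosen node, and using --dependency=afterany (rather than afterok) on the removal job would let the cleanup run even if the compute job fails.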