Re: [slurm-users] ticking time bomb? launching too many jobs in parallel

Jarno van der Kolk Thu, 29 Aug 2019 11:32:36 -0700

On 8/29/19 12:48 PM, Goetz, Patrick G wrote:
> On 8/29/19 9:38 AM, Jarno van der Kolk wrote:
> > Here's an example on how to do so from the Compute Canada docs:
> > https://docs.computecanada.ca/wiki/GNU_Parallel#Running_on_Multiple_Nodes
> >
> 
> [name@server ~]$ parallel --jobs 32 --sshloginfile ./node_list_${SLURM_JOB_ID} --env MY_VARIABLE --workdir $PWD ./my_program
> 
> 
> To me it looks like you're circumventing the scheduler when you do this;
> maybe I'm missing something?
> 
> Also, where are these environment variables:
> 
>    SLURM_JOB_NODELIST, SLURM_JOB_ID
> 
> being set?
>


I guess you kind of are. The advantage of this over array jobs is that you can
provide a list of jobs instead of depending on SLURM_ARRAY_TASK_ID, while still
making only one submission to the scheduler. So instead of submitting hundreds
or even thousands of little jobs and waiting for the scheduler to accept them
all, you submit once and are done. In that sense, parallel functions as a
sub-scheduler, if you will.
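
To make the "submit once" idea concrete, here is a rough sketch of what such a
batch script might look like. It is not taken from the Compute Canada page: the
#SBATCH values, the task_list.txt file and the two function names are
placeholders I made up, and it assumes your site allows ssh to the nodes of
your own allocation. MY_VARIABLE and ./my_program come from the quoted command.

```shell
#!/bin/bash
#SBATCH --nodes=2               # assumed resource request, adjust to your site
#SBATCH --ntasks-per-node=32
#SBATCH --time=01:00:00

# Hypothetical file with one task argument per line.
TASK_LIST=./task_list.txt

# Expand the compact node list (e.g. node[01-02]) into one hostname per line.
write_node_list() {
    scontrol show hostnames "$SLURM_JOB_NODELIST" > "$1"
}

# Let GNU Parallel spread the tasks over the allocated nodes,
# running up to 32 of them at a time on each host.
run_tasks() {
    parallel --jobs 32 --sshloginfile "$1" --env MY_VARIABLE \
        --workdir "$PWD" ./my_program :::: "$TASK_LIST"
}

# Only do the real work inside a Slurm job, where SLURM_JOB_ID is set.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    node_file=./node_list_${SLURM_JOB_ID}
    write_node_list "$node_file"
    run_tasks "$node_file"
fi
```

You would submit this once with sbatch; Slurm still decides when and where the
allocation runs, and parallel only hands out work inside it.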

Those environment variables are set when the job starts.
See also 
https://slurm.schedmd.com/sbatch.html#SECTION_OUTPUT-ENVIRONMENT-VARIABLES
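
If you want to see this for yourself, a small check along these lines should
do. The function name and the placeholder text are just my own choices; the
two variable names are the ones you asked about.

```shell
# Print the two job variables, or a placeholder when run outside a Slurm job.
show_job_env() {
    echo "SLURM_JOB_ID=${SLURM_JOB_ID:-<not in a job>}"
    echo "SLURM_JOB_NODELIST=${SLURM_JOB_NODELIST:-<not in a job>}"
}

show_job_env
```

Run on a login node it prints the placeholders; put into a batch script or run
under srun, it prints the values Slurm assigned to that job.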

Regards,
Jarno

Jarno van der Kolk, PhD Phys.
Analyste principal en informatique scientifique | Senior Scientific Computing 
Specialist
Solutions TI | IT Solutions
Université d’Ottawa | University of Ottawa
