Re: [slurm-users] ticking time bomb? launching too many jobs in parallel

Mark Hahn Thu, 29 Aug 2019 20:32:11 -0700

>> Here's an example of how to do so from the Compute Canada docs:
>> https://docs.computecanada.ca/wiki/GNU_Parallel#Running_on_Multiple_Nodes
>>
>> [name@server ~]$ parallel --jobs 32 --sshloginfile ./node_list_${SLURM_JOB_ID} \
>>     --env MY_VARIABLE --workdir $PWD ./my_program
>
> To me it looks like you're circumventing the scheduler when you do this;
> maybe I'm missing something?
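The quoted command assumes `./node_list_${SLURM_JOB_ID}` already exists. Below is a minimal sketch of building it inside a batch job. `write_sshloginfile` and `launch_on_allocation` are hypothetical names, and the per-node CPU count is assumed to be identical on every node and passed in by hand. The sketch relies on `scontrol show hostnames` to expand Slurm's compressed node list, and on GNU parallel's `ncpus/host` sshlogin syntax, which tells parallel how many CPUs to assume for that host.

```shell
#!/usr/bin/env bash

# write_sshloginfile (hypothetical helper): read hostnames, one per line, on
# stdin and print one "NCPUS/host" line per host, the sshlogin format GNU
# parallel accepts in an --sshloginfile. parallel treats NCPUS as that host's
# CPU count, so with its default --jobs 100% it runs up to NCPUS jobs there.
write_sshloginfile() {
  local ncpus="$1" host
  # The "|| [ -n ... ]" test keeps a final line that lacks a trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    # Skip blank lines so parallel never sees an empty sshlogin.
    if [ -n "$host" ]; then
      printf '%s/%s\n' "$ncpus" "$host"
    fi
  done
}

# launch_on_allocation (hypothetical): meant to be called from an sbatch
# script. $1 is the CPU count to use on every node (assumed identical across
# the allocation); the remaining arguments are the command and its inputs.
launch_on_allocation() {
  local ncpus="$1"
  shift
  local nodefile="./node_list_${SLURM_JOB_ID}"
  # Expand Slurm's compressed node list (e.g. "node[01-04]") to one host per line.
  scontrol show hostnames "$SLURM_JOB_NODELIST" \
    | write_sshloginfile "$ncpus" > "$nodefile"
  parallel --sshloginfile "$nodefile" --env MY_VARIABLE --workdir "$PWD" "$@"
}
```

Under these assumptions, `launch_on_allocation 32 ./my_program ::: input_*.dat` (hypothetical input names) would run up to 32 copies of `./my_program` per allocated node. Encoding the count per host, rather than passing `--jobs 32`, leaves room for allocations where nodes get different CPU counts.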

our (ComputeCanada) setup includes pam_slurm_adopt, so if a user sshes to
a node on which they have resources, any processes get put into the job's
cgroup.  we don't really care how the user consumes the resources, as long
as it's only what's allocated to their jobs, doesn't interfere with other
users, and is hopefully reasonably efficient.  heck, we configure clusters
with host-based trust, so it's easy for users to ssh among nodes.
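One way to observe the adoption described above from the user side is to ssh to each node of a running job and check which cgroup the remote process lands in. This is a minimal sketch, assuming `pam_slurm_adopt` and Slurm's cgroup plugins are enabled and that the cgroup path contains a `job_<id>` component, as in the cgroup v1 layout `/slurm/uid_<uid>/job_<id>/step_extern`. On cgroup v2 systems the layout differs, so the pattern may need adjusting. `job_id_from_cgroup` and `check_adoption` are hypothetical helper names.

```shell
#!/usr/bin/env bash

# job_id_from_cgroup (hypothetical helper): read /proc/<pid>/cgroup lines on
# stdin and print the first Slurm job ID found in a "/job_<id>" path component.
# Prints nothing if no such component exists (process not in a job cgroup).
job_id_from_cgroup() {
  grep -oE '/job_[0-9]+' | head -n 1 | cut -d_ -f2
}

# check_adoption (hypothetical): run from inside an allocation. For each node
# in the job, ssh there and report which job's cgroup the remote command is in.
check_adoption() {
  local node got
  for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    # -n: don't let ssh consume this script's stdin.
    got="$(ssh -n "$node" cat /proc/self/cgroup | job_id_from_cgroup)"
    echo "$node: in job ${got:-none} (expected $SLURM_JOB_ID)"
  done
}
```

If adoption works as described, every node should report the current `$SLURM_JOB_ID`; a `none` would suggest the ssh session is running outside the job's cgroup.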

regards,
--
Mark Hahn | SHARCnet Sysadmin | h...@sharcnet.ca | http://www.sharcnet.ca
          | McMaster RHPCS    | h...@mcmaster.ca | 905 525 9140 x24687
          | Compute/Calcul Canada                | http://www.computecanada.ca
