On Mar 8, 2021, at 1:35 PM, slurm-users-requ...@lists.schedmd.com wrote:
> What's happening (my speculation, since I don't have perms to use --no-alloc) is that SLURM_JOBID is not set but SLURM_NODELIST may be set, so it's confusing ORTE. Could you list which SLURM env variables are set in the shell in which you're running the srun command?

Howard,

I believe you are correct. Once I set SLURM_JOBID, ORTE starts functioning again with the --no-alloc option.

Since you asked (and for completeness), I include below the list of environment variables that were different with/without --no-alloc, but my tests show that the job ID seems to be the magic one, as you predicted.

I guess I will manufacture an artificial job ID for our “--no-alloc” runs, but if anyone is aware of any dangers lurking in the shadows from that approach, I would be interested.

Thanks for the guidance ... impressive that you could identify the issue so quickly!

chris

----------------------------------------------------------
SLURM_JOB_CPUS_PER_NODE=1
SLURM_JOB_ID=25300
SLURM_JOBID=25300
SLURM_JOB_NUM_NODES=1
SLURM_JOB_PARTITION=psfehq
SLURM_JOB_QOS=normal
SLURM_CPUS_ON_NODE=1
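
For anyone who wants to try the artificial job ID idea, here is a rough sketch of what I have in mind. It only covers what was described above: list the SLURM_* variables in the current environment, and fill in SLURM_JOBID when it is missing. The helper names and the placeholder value "1" are made up for illustration, and I have not checked whether any other variable (for example SLURM_JOB_ID) also needs to be set.

```python
import os


def slurm_vars(env):
    """Return the SLURM_* entries of an environment mapping, sorted by name."""
    return {k: env[k] for k in sorted(env) if k.startswith("SLURM_")}


def with_placeholder_jobid(env, placeholder="1"):
    """Return a copy of env with SLURM_JOBID set if it was missing.

    The placeholder value is an assumption for illustration only; an
    existing SLURM_JOBID is left untouched.
    """
    patched = dict(env)
    patched.setdefault("SLURM_JOBID", placeholder)
    return patched


if __name__ == "__main__":
    # Show which SLURM variables the current shell exports.
    for name, value in slurm_vars(os.environ).items():
        print(f"{name}={value}")

    # Build the environment that a --no-alloc launch would be given.
    env = with_placeholder_jobid(os.environ)
    print("SLURM_JOBID for the launch:", env["SLURM_JOBID"])
```

The patched mapping could then be handed to subprocess.run(..., env=env) when starting srun, so the placeholder does not leak into the login shell.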