[slurm-users] Re: Run a command in Slurm with all streams and signals connected to the submitting command
Thanks Davide,

It's true that srun will create an allocation if you aren't inside a job, but if you are inside a job and you request more resources than it has, then srun will just fail. This is the key issue that I want to avoid.

On Sat, Apr 5, 2025 at 11:48 AM Davide DelVento wrote:

> Plain srun is probably the best bet, and if you really need the thing
> to be started from another Slurm job (rather than the login node) you will
> need to exploit the fact that:
>
>> If necessary, srun will first create a resource allocation in which to
>> run the parallel job.
>
> AFAIK, there is no option to force the "create a resource allocation" even
> if it's not necessary. But you may try to request something that is "above
> and beyond" what the current allocation provides, and that might solve your
> problem. Looking at the srun man page, I could speculate that --clusters
> or --cluster-constraint might help in that regard (but I am not sure).
>
> Have a nice weekend
>
> On Fri, Apr 4, 2025 at 6:27 AM Michael Milton via slurm-users
> <slurm-users@lists.schedmd.com> wrote:
>
>> I'm helping with a workflow manager that needs to submit Slurm jobs. For
>> logging and management reasons, the job (e.g. srun python) needs to be run
>> as though it were a regular subprocess (python):
>>
>> - stdin, stdout and stderr for the command should be connected to the
>>   process inside the job
>> - signals sent to the command should be sent to the job process
>> - we don't want to use the existing job allocation, if this is run
>>   from a Slurm job
>> - the command should only terminate when the job is finished, to
>>   avoid us needing to poll Slurm
>>
>> We've tried:
>>
>> - sbatch --wait, but then SIGTERM'ing the process doesn't kill the job
>> - salloc, but that requires a TTY to control it (?)
>> - salloc srun, which seems to mess with the terminal when it's killed,
>>   likely because salloc is "designed to be executed in the foreground"
>> - plain srun, which re-uses the existing Slurm allocation, and specifying
>>   resources like --mem will just request them from the current job rather
>>   than submitting a new one
>>
>> What is the best solution here?
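P.S. To make the failure mode concrete, here is a rough sketch (the memory sizes and the bare `python` payload are just placeholders):

    # From a login node, srun creates a fresh allocation, so this works:
    srun --mem=5G python

    # From inside a job submitted with (say) --mem=3G, the same command
    # instead tries to create a job *step* within the existing 3G
    # allocation, and fails because the step asks for more than the job has:
    srun --mem=5G python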
[slurm-users] Re: Run a command in Slurm with all streams and signals connected to the submitting command
Thanks Chris,

I can verify that unsetting all of these environment variables does allow you to `srun --mem 5G` within an `srun --mem 3G` (etc). I will see if this solves my problem.

Interestingly, just by running `unset SLURM_CPU_BIND SLURM_JOB_ID` I can get it working. SLURM_JOB_ID seems to be the variable that controls whether srun runs inside the same job or not, and unsetting SLURM_CPU_BIND is needed to avoid "CPU binding outside of job step allocation".

Cheers

On Sat, Apr 5, 2025 at 3:39 PM Chris Samuel via slurm-users
<slurm-users@lists.schedmd.com> wrote:

> On 4/4/25 5:23 am, Michael Milton via slurm-users wrote:
>
>> Plain srun re-uses the existing Slurm allocation, and specifying
>> resources like --mem will just request them from the current job rather
>> than submitting a new one
>
> srun does that as it sees all the various SLURM_* environment variables
> in the environment of the running job. My bet would be that if you
> eliminated them from the environment of the srun then you would get a
> new allocation.
>
> I've done similar things in the past to do an sbatch for a job that
> wants to run on very different hardware with:
>
>     env $(env | awk -F= '/^(SLURM|SBATCH)/ {print "-u",$1}' | paste -s -d\ ) sbatch [...]
>
> So it could be worth substituting srun for sbatch there and seeing if that
> helps.
>
> Best of luck!
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
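P.S. For the archive, this is the variant I'm now testing, based on Chris's approach but applied to srun (a sketch: the `python` payload and the --mem value are placeholders, and the exact set of variables that need stripping may vary between sites and Slurm versions):

    # Full scrub: hide every SLURM_*/SBATCH_* variable from srun so that it
    # behaves as if launched outside a job and creates a new allocation
    env $(env | awk -F= '/^(SLURM|SBATCH)/ {print "-u",$1}' | paste -s -d' ') \
        srun --mem=5G python

    # Minimal version that worked for me, run in a subshell so the outer
    # job's environment is left untouched
    ( unset SLURM_CPU_BIND SLURM_JOB_ID; srun --mem=5G python )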
[slurm-users] Run a command in Slurm with all streams and signals connected to the submitting command
I'm helping with a workflow manager that needs to submit Slurm jobs. For logging and management reasons, the job (e.g. srun python) needs to be run as though it were a regular subprocess (python):

- stdin, stdout and stderr for the command should be connected to the process inside the job
- signals sent to the command should be sent to the job process
- we don't want to use the existing job allocation, if this is run from a Slurm job
- the command should only terminate when the job is finished, to avoid us needing to poll Slurm

We've tried:

- sbatch --wait, but then SIGTERM'ing the process doesn't kill the job
- salloc, but that requires a TTY to control it (?)
- salloc srun, which seems to mess with the terminal when it's killed, likely because salloc is "designed to be executed in the foreground"
- plain srun, which re-uses the existing Slurm allocation, and specifying resources like --mem will just request them from the current job rather than submitting a new one

What is the best solution here? (See the sketch below for the kind of behaviour we're after.)
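To make the requirement concrete, this is roughly what plain srun already gives us when launched from a login node, and what we'd like to keep when submitting from inside another job (a sketch; `python`, the file names and the --mem value are placeholders):

    # stdin/stdout/stderr pass straight through srun to the process in the
    # job step, signals sent to srun are forwarded to the remote tasks,
    # and srun only returns once the step has finished
    echo "input for the job" | srun --mem=4G python > job.out 2> job.err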