[slurm-users] Re: Run a command in Slurm with all streams and signals connected to the submitting command

2025-04-04 Thread Michael Milton via slurm-users
Thanks Davide,

It's true that srun will create an allocation if you aren't inside a job,
but if you are inside a job and you request more resources than that job has,
srun will just fail rather than creating a new allocation. This is the key
issue I want to avoid.
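
For concreteness, this is roughly what I mean (the memory values and the
command are just examples):

    # from inside a job that was allocated, say, 3G of memory:
    srun --mem 5G hostname
    # -> fails, rather than creating a fresh allocation for the command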

On Sat, Apr 5, 2025 at 11:48 AM Davide DelVento wrote:

> The plain srun is probably the best bet, and if you really need the thing
> to be started from another slurm job (rather than the login node) you will
> need to exploit the fact that
>
> > If necessary, srun will first create a resource allocation in which to
> run the parallel job.
>
> AFAIK, there is no option to force the "create a resource allocation" even
> if it's not necessary. But you may try to request something that is "above
> and beyond" what the current allocation provides, and that might solve your
> problem.
> Looking at the srun man page, I could speculate that --clusters
> or --cluster-constraint might help in that regard (but I am not sure).
>
> Have a nice weekend
>
>
> On Fri, Apr 4, 2025 at 6:27 AM Michael Milton via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> I'm helping with a workflow manager that needs to submit Slurm jobs. For
>> logging and management reasons, the job (e.g. srun python) needs to be run
>> as though it were a regular subprocess (python):
>>
>>- stdin, stdout and stderr for the command should be connected to the
>>process inside the job
>>- signals sent to the command should be sent to the job process
>>- We don't want to use the existing job allocation if this is run
>>from a Slurm job
>>- The command should only terminate when the job is finished, to
>>avoid us needing to poll Slurm
>>
>> We've tried:
>>
>>- sbatch --wait, but then SIGTERM'ing the process doesn't kill the job
>>- salloc, but that requires a TTY process to control it (?)
>>- salloc srun seems to mess with the terminal when it's killed,
>>likely because of being "designed to be executed in the foreground"
>>- Plain srun re-uses the existing Slurm allocation, and specifying
>>resources like --mem will just request them from the current job rather
>>than submitting a new one
>>
>> What is the best solution here?
>>
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Run a command in Slurm with all streams and signals connected to the submitting command

2025-04-06 Thread Michael Milton via slurm-users
Thanks Chris,

I can verify that unsetting all these environment variables does allow you
to `srun --mem 5G` within an `srun --mem 3G` (etc). I will see if this
solves my problem.

Interestingly, just running `unset SLURM_CPU_BIND SLURM_JOB_ID` is enough to
get it working. SLURM_JOB_ID seems to be the variable that controls whether
srun treats itself as being inside the same job or not, and unsetting
SLURM_CPU_BIND is needed to avoid the "CPU binding outside of job step
allocation" error.

Cheers

On Sat, Apr 5, 2025 at 3:39 PM Chris Samuel via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> On 4/4/25 5:23 am, Michael Milton via slurm-users wrote:
>
> > Plain srun re-uses the existing Slurm allocation, and specifying
> > resources like --mem will just request them from the current job rather
> > than submitting a new one
>
> srun does that as it sees all the various SLURM_* environment variables
> in the environment of the running job. My bet would be that if you
> eliminated them from the environment of the srun then you would get a
> new allocation.
>
> I've done similar things in the past to do an sbatch for a job that
> wants to run on very different hardware with:
>
> env $(env | awk -F= '/^(SLURM|SBATCH)/ {print "-u",$1}' | paste -s -d\ )
> sbatch [...]
>
> So it could be worth substituting srun for sbatch there and seeing if that
> helps.
>
> Best of luck!
> Chris
> --
> Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Run a command in Slurm with all streams and signals connected to the submitting command

2025-04-04 Thread Michael Milton via slurm-users
I'm helping with a workflow manager that needs to submit Slurm jobs. For
logging and management reasons, the job (e.g. srun python) needs to be run
as though it were a regular subprocess (python):

   - stdin, stdout and stderr for the command should be connected to the
   process inside the job
   - signals sent to the command should be sent to the job process
   - We don't want to use the existing job allocation if this is run from
   a Slurm job
   - The command should only terminate when the job is finished, to avoid
   us needing to poll Slurm

We've tried the following (rough sketches of these invocations follow the list):

   - sbatch --wait, but then SIGTERM'ing the process doesn't kill the job
   - salloc, but that requires a TTY process to control it (?)
   - salloc srun seems to mess with the terminal when it's killed, likely
   because of being "designed to be executed in the foreground"
   - Plain srun re-uses the existing Slurm allocation, and specifying
   resources like --mem will just request them from the current job rather
   than submitting a new one
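
For reference, these are roughly the invocations we've been testing (the
script and program names are just placeholders):

    # blocks until the job finishes, but SIGTERM sent to the sbatch process
    # does not kill the job:
    sbatch --wait wrapper.sh

    # run from inside an existing Slurm job, this becomes a job step and
    # takes --mem from the current allocation instead of creating a new job:
    srun --mem 5G python workflow_task.py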

What is the best solution here?

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com