Typo on my part. Yes, it is SIGSTOP, which cannot be caught.
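Whether a signal can be caught is easy to check directly. The short Python sketch below tries to install a no-op handler for each signal mentioned in this thread. Python is only an illustrative choice; nothing in the thread uses it.

On Linux it should report that only SIGSTOP cannot be caught, while SIGTSTP, SIGCONT and SIGUSR1 can. A SIGCONT handler only runs after the process has already been resumed. It cannot prevent the resume itself.

```python
import signal


def can_install_handler(sig: signal.Signals) -> bool:
    """Return True if this process is allowed to install a handler for sig."""
    previous = signal.getsignal(sig)
    try:
        signal.signal(sig, lambda signum, frame: None)
    except (OSError, ValueError):
        # The kernel rejects handlers for SIGSTOP (and SIGKILL) with EINVAL.
        return False
    # Restore the previous disposition when Python knows it
    # (None means it was set outside Python, so it cannot be restored here).
    if previous is not None:
        signal.signal(sig, previous)
    return True


for sig in (signal.SIGSTOP, signal.SIGTSTP, signal.SIGCONT, signal.SIGUSR1):
    print(sig.name, "catchable" if can_install_handler(sig) else "cannot be caught")
```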

Maybe you could "pre-signal" any running job with SIGUSR1 before sending the suspend command, at least if you are suspending the job(s) manually. That signal could be caught and acted upon before the SIGSTOP is received.
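Here is a minimal sketch of the manual pre-signal-then-suspend sequence. It assumes the job is suspended by hand with `scontrol suspend`; Slurm's own preemption would not go through it. Several details are assumptions:

- The 30-second grace period is arbitrary.
- `--full` is used because, per the scancel man page, signals other than SIGKILL are not sent to the batch script by default. Check this against the installed Slurm version.
- The application must handle SIGUSR1, because the default action of SIGUSR1 is to terminate the process.
- The `run` and `sleep` parameters exist only so the sequence can be exercised without a cluster.

```python
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], object]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a Slurm command, raising CalledProcessError if it fails."""
    subprocess.run(list(cmd), check=True)


def presignal_then_suspend(job_id: str,
                           grace_seconds: float = 30.0,
                           run: Runner = run_checked,
                           sleep: Callable[[float], None] = time.sleep) -> None:
    # 1. Pre-signal: send SIGUSR1 to the job. --full also signals the batch
    #    script (assumed necessary; check the scancel man page for your version).
    #    The application must handle SIGUSR1, whose default action terminates it.
    run(["scancel", "--signal=USR1", "--full", job_id])
    # 2. Give the application time to release its licenses (assumed value).
    sleep(grace_seconds)
    # 3. Suspend; from here on the job's processes are stopped and cannot react.
    run(["scontrol", "suspend", job_id])
```

A scheduled or administrator-run script would call `presignal_then_suspend("<jobid>")` wherever it currently calls `scontrol suspend` directly.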

Brian


On 10/15/2019 10:52 PM, Oytun Peksel wrote:

Brian,

Thanks for your response. I am looking into that option. I am a bit confused about which signal is sent, though. I thought it was SIGSTOP, not SIGSTP. I also read that you can’t really catch or block SIGSTOP or SIGCONT, but I am not very good at sysadmin stuff anyway.

So in the end, these feel like dirty tricks to me. The select/* plugins should have mechanisms to run scripts and such before sending signals, but apparently there is no such mechanism.

So I will probably dig deeper into what you suggested.

Thanks



*Oytun Peksel*

oytun.pek...@semcon.com <mailto:oytun.pek...@semcon.com>

+46739205917

*From:* slurm-users <slurm-users-boun...@lists.schedmd.com> *On Behalf Of* Brian Andrus
*Sent:* 15 October 2019 20:58
*To:* slurm-users@lists.schedmd.com
*Subject:* Re: [slurm-users] Execute scripts on suspend and cancel

It seems that there are some details that would need to be addressed.

A suspend signal is nothing more than sending a SIGSTP (like hitting Ctrl-Z), so the application is still in memory, awaiting SIGCONT.

So what should happen when it continues and there are no more licenses? The proper place for what you are looking for is in the application itself. If it is given a SIGSTP, it could release the licenses and then check them out again when SIGCONT is received.
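SIGSTOP itself cannot be caught, so the in-application version of this idea has to react to an agreed pre-signal instead. The sketch below assumes SIGUSR1 as that pre-signal. `LicenseHolder.release()` and `acquire()` are hypothetical placeholders for the vendor's real license calls. The main block only simulates the two signals by sending them to its own process.

```python
import os
import signal
from typing import List


class LicenseHolder:
    """Tracks whether a license is checked out.

    release() and acquire() are hypothetical placeholders for the vendor's
    real license API; here they only record what happened.
    """

    def __init__(self) -> None:
        self.held = True
        self.events: List[str] = []

    def release(self) -> None:
        if self.held:
            self.held = False
            self.events.append("released")

    def acquire(self) -> None:
        if not self.held:
            # A real implementation must cope with "no license available"
            # here, e.g. by retrying before resuming work.
            self.held = True
            self.events.append("acquired")


def install_handlers(lic: LicenseHolder) -> None:
    # SIGSTOP cannot be caught, so release on the agreed pre-signal instead.
    signal.signal(signal.SIGUSR1, lambda signum, frame: lic.release())
    # A SIGCONT handler runs after the process has been resumed.
    signal.signal(signal.SIGCONT, lambda signum, frame: lic.acquire())


if __name__ == "__main__":
    holder = LicenseHolder()
    install_handlers(holder)
    os.kill(os.getpid(), signal.SIGUSR1)  # simulate the pre-signal
    os.kill(os.getpid(), signal.SIGCONT)  # simulate the resume
    print(holder.events)
```

The open question from this message is still there: `acquire()` is where the application must decide what to do if no license is available after resuming.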

If you are able to tell your app to release/request a license externally, you may want a wrapper that does the signal handling until they make it part of their app.
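Here is a sketch of such a wrapper. It assumes the vendor offers some external command to release and request a license; `license-tool release` and `license-tool request` below are hypothetical placeholders.

- The wrapper handles SIGUSR1 (release) and SIGCONT (request) itself.
- It starts the application with SIGUSR1 ignored. An ignored disposition survives `exec`, so a pre-signal that reaches every process in the job does not kill the application.
- `preexec_fn` is POSIX-only.

```python
import signal
import subprocess
import sys
from typing import Callable, Sequence


def _ignore_usr1() -> None:
    # Runs in the child just before exec. An ignored disposition survives exec,
    # so a SIGUSR1 that reaches the application does not terminate it.
    signal.signal(signal.SIGUSR1, signal.SIG_IGN)


def run_wrapped(app_cmd: Sequence[str],
                release: Callable[[], object],
                request: Callable[[], object]) -> int:
    """Run app_cmd, releasing the license on SIGUSR1 and requesting it on SIGCONT."""
    state = {"released": False}

    def on_usr1(signum, frame):
        if not state["released"]:
            try:
                release()
                state["released"] = True
            except Exception as exc:  # keep the wrapper alive on failure
                print(f"license release failed: {exc}", file=sys.stderr)

    def on_cont(signum, frame):
        if state["released"]:
            try:
                request()
                state["released"] = False
            except Exception as exc:
                print(f"license request failed: {exc}", file=sys.stderr)

    signal.signal(signal.SIGUSR1, on_usr1)
    signal.signal(signal.SIGCONT, on_cont)
    child = subprocess.Popen(list(app_cmd), preexec_fn=_ignore_usr1)
    # wait() is retried automatically after a handler runs (PEP 475).
    return child.wait()


def main() -> int:
    # Hypothetical usage: wrapper.py <application> [args...]
    # "license-tool" is a placeholder for the vendor's external license command.
    return run_wrapped(
        sys.argv[1:],
        release=lambda: subprocess.run(["license-tool", "release"], check=True),
        request=lambda: subprocess.run(["license-tool", "request"], check=True),
    )
```

A batch script would launch the application through this wrapper, for example via an entry point that calls `sys.exit(main())`. When the whole job is suspended, the wrapper is stopped along with the application. Its SIGCONT handler runs once both are resumed.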

Brian Andrus

On 10/14/2019 4:40 AM, Oytun Peksel wrote:

    It is quite weird if Slurm has no mechanism like this. I have
    been digging more into it, and someone suggested a workaround using
    mail notifications: you use a script instead of the mail
    application, catch the event, and then use sacct to see what is
    happening.
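Here is a sketch of the mail-notification workaround: a script configured as `MailProg` in slurm.conf in place of the mail program.

- **Subject format (assumption):** the subject slurmctld passes contains `Job_id=<number>`, for example `Slurm Job_id=1234 Name=foo Began, ...`. Verify this against your Slurm version.
- **sacct options:** `-j` (job), `-X` (allocation only, no steps), `-n` (no header), `-P` (parsable output) and `-o State`.
- **Hook:** `handle_state` is hypothetical. It is called right away because a requeued job soon shows as PENDING.

```python
import re
import subprocess
import sys
from typing import List, Optional, Sequence

# Assumption: the subject argument contains "Job_id=<number>".
JOB_ID_RE = re.compile(r"Job_id=(\d+)")


def job_id_from_args(argv: Sequence[str]) -> Optional[str]:
    """Return the first job ID found in the MailProg arguments, if any."""
    for arg in argv:
        match = JOB_ID_RE.search(arg)
        if match:
            return match.group(1)
    return None


def sacct_state_cmd(job_id: str) -> List[str]:
    """sacct command printing only the allocation's state, without a header."""
    return ["sacct", "-j", job_id, "-X", "-n", "-P", "-o", "State"]


def handle_state(job_id: str, state: str) -> None:
    # Hypothetical hook: release licenses, log, etc. Here it only logs to stderr.
    print(f"job {job_id}: {state}", file=sys.stderr)


def main(argv: Sequence[str]) -> int:
    job_id = job_id_from_args(argv)
    if job_id is None:
        return 0  # Not a job notification we understand; ignore it.
    # Query immediately: a requeued job soon switches back to PENDING.
    result = subprocess.run(sacct_state_cmd(job_id),
                            capture_output=True, text=True, check=False)
    handle_state(job_id, result.stdout.strip())
    return 0
```

Installed as `MailProg`, the script would end with `sys.exit(main(sys.argv[1:]))`.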

    Two problems with this:

    · There is no mail sent with suspend-mode preemption.

    · If you use requeue instead, there will be a mail event and you can
    catch it. sacct will flag the job as “preempted”, so you know it is
    being requeued. But then it changes to pending, so you really need
    to be quick to catch it. Also, there is no distinct flag for
    resuming.

    Does anyone have another method to execute scripts during preemption?





    *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> *On Behalf Of* Oytun Peksel
    *Sent:* 11 October 2019 09:10
    *To:* slurm-users@lists.schedmd.com
    *Subject:* [slurm-users] Execute scripts on suspend and cancel

    Hi,

    I was wondering whether there is an option in Slurm to execute custom
    scripts before the suspend signal is sent. What I need to do is tell
    an application to release its licenses before the suspend signal is
    sent during preemption. I think I went through all the
    documentation but could not find a mechanism like this.

    BR

    /Oytun


