Restarting slurmd should be fine, assuming the daemons come back before the communication timeouts expire.  I restart slurmd daemons all the time and haven't had any real problems.

-Paul Edmon-


On 7/27/2018 6:51 PM, Chris Harwell wrote:
It is possible, but double-check your config for timeouts first.
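
For anyone wanting to do that check, here is a minimal sketch that lists the timeout settings from the running configuration. It assumes `scontrol` is on the PATH and that `scontrol show config` prints one `Key = value` pair per line. The helper names `parse_timeouts` and `current_timeouts` are made up for illustration. Which value matters for a slurmd restart (most likely `SlurmdTimeout`) is an assumption to confirm against the slurm.conf man page for your version.

```python
import subprocess


def parse_timeouts(config_text):
    """Return {name: value} for each 'Key = value' line whose key mentions 'timeout'."""
    timeouts = {}
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        # Skip lines without '=' and keys unrelated to timeouts.
        if sep and "timeout" in key.lower():
            timeouts[key.strip()] = value.strip()
    return timeouts


def current_timeouts():
    """Query the live configuration (assumes scontrol is installed and on PATH)."""
    result = subprocess.run(
        ["scontrol", "show", "config"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_timeouts(result.stdout)
```

Calling `current_timeouts()` on a node that can reach the controller returns a dictionary of the timeout settings, which can be compared against how long a slurmd restart takes at your site.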

On Fri, Jul 27, 2018, 15:31 Prentice Bisbal <pbis...@pppl.gov> wrote:

    Slurm-users,

    I'm still learning Slurm, so I have what I think is a basic question.
    Can you restart slurmd on nodes where jobs are running, or will that
    kill the jobs? I ran into the same problem as described here:

    https://bugs.schedmd.com/show_bug.cgi?id=3535

    I believe the best way to fix this is to restart slurmd on all my
    nodes, but I've been unable to determine conclusively whether I can
    do that w/o killing running jobs. I've spent some time googling
    this, but couldn't find a definitive answer one way or the other. I
    prefer to not kill a bunch of user jobs on a Friday afternoon.

-- Prentice


--
Chris Harwell
