We are pleased to announce the availability of Slurm version 22.05.4.

This release includes fixes for two potential crashes in the backfill scheduler, alongside a number of other moderate-severity issues.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

* Changes in Slurm 22.05.4
==========================
 -- Fix return code from salloc when the job is revoked prior to executing user
    command.
 -- Fix minor memory leak when dealing with gres with multiple files.
 -- Fix printing for no_consume gres in scontrol show job.
 -- sinfo - Fix truncation of very large values when outputting memory.
 -- Fix multi-node step launch failure when nodes in the controller aren't in
    natural order. This can happen with inconsistent node naming (such as
    node15 and node052) or with dynamic nodes which can register in any order.
 -- job_container/tmpfs - Prevent reading the plugin config multiple times per
    step.
 -- Fix erroneous attempt at gres binding for gres without cores defined.
 -- Fix build to work with '--without-shared-libslurm' configure flag.
 -- Fix power_save mode when repeatedly configuring too fast.
 -- Fix sacct -I option.
 -- Prevent jobs from being scheduled on future nodes.
 -- Fix memory leak in slurmd happening on reconfigure when CPUSpecList used.
 -- Fix sacctmgr show event [min|max]cpus.
 -- Fix regression in 22.05.0rc1 where a prolog or epilog that redirected stdout
    to a file could get erroneously killed, resulting in job launch failure
    (for the prolog) and the node being drained.
 -- cgroup/v1 - Use a static variable to avoid potentially redundant checks for
    whether the system has swap.
 -- cgroup/v1 - Add check for swap when running OOM check after task
    termination.
 -- job_submit/lua - add --prefer support.
 -- cgroup/v1 - fix issue where sibling steps could incorrectly be accounted as
    OOM when the step memory limit was the same as the job allocation. Detect
    OOM events via memory.oom_control oom_kill when exposed by the kernel
    instead of subscribing to notifications with eventfd.
 -- Fix accounting of oom_kill events in cgroup/v2 and task/cgroup.
 -- Fix segfault when slurmd reports less than configured gres with links after
    a slurmctld restart.
 -- Fix TRES counts after node is deleted using scontrol.
 -- sched/backfill - properly handle multi-reservation HetJobs.
 -- sched/backfill - don't try to start HetJobs after system state change.
 -- openapi/v0.0.38 - add submission of job->prefer value.
 -- slurmdbd - become SlurmUser at the same point in logic as slurmctld to match
    plugins initialization behavior. This avoids a fatal error when starting
    slurmdbd as root and root cannot start the auth or accounting_storage
    plugins (for example, if root cannot read the jwt key).
 -- Fix memory leak when attempting to update a job's features with invalid
    features.
 -- Fix occasional slurmctld crash or hang in backfill due to invalid pointers.
 -- Fix segfault on Cray machines if cgroup cpuset is used in cgroup/v1.
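
For readers unfamiliar with the memory.oom_control change in the cgroup/v1 entry above, the following is a rough Python sketch of reading the oom_kill counter from that file's contents. It is not Slurm's implementation (Slurm is written in C); the function names are hypothetical, and the sketch assumes the usual "key value" line layout of the cgroup v1 memory.oom_control file, where the oom_kill line is only present on kernels that expose it.

```python
def parse_oom_control(text):
    """Parse "key value" lines from the contents of memory.oom_control.

    Assumption: each line holds a name and a non-negative integer,
    separated by whitespace. Lines in any other shape are ignored.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def oom_kill_count(text):
    """Return the oom_kill counter, or None if the kernel does not expose it.

    A None result is where a caller would have to fall back to another
    detection method, such as eventfd notifications.
    """
    return parse_oom_control(text).get("oom_kill")


# Hypothetical file contents, used for illustration only.
sample = "oom_kill_disable 0\nunder_oom 0\noom_kill 2\n"
print(oom_kill_count(sample))
```

Reading a per-cgroup counter like this lets each step be checked on its own, rather than relying on a notification that sibling steps might also receive.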
