Re: [slurm-users] Slurm version 22.05.6 is now available

Ole Holm Nielsen Thu, 10 Nov 2022 23:09:30 -0800

FYI: The Slurm download page is as usual: https://www.schedmd.com/downloads.php


/Ole


On 11/10/22 22:49, Marshall Garey wrote:

We are pleased to announce the availability of Slurm version 22.05.6.

This includes a fix to core selection for steps which could result in random task launch failures, alongside a number of other moderate severity issues.


- Marshall

--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support

* Changes in Slurm 22.05.6
==========================

 -- Fix a partition's DisableRootJobs=no from preventing root jobs from working.
 -- Fix the number of allocated cpus for an auto-adjustment case in which the
    job requests --ntasks-per-node and --mem (per-node) but the limit is
    MaxMemPerCPU.
 -- Fix POWER_DOWN_FORCE request leaving node in completing state.
 -- Do not count magnetic reservation queue records towards backfill limits.
 -- Clarify error message when --send-libs=yes or BcastParameters=send_libs
    fails to identify shared library files, and avoid creating an empty
    "<filename>_libs" directory on the target filesystem.
 -- Fix missing CoreSpec on dynamic nodes upon slurmctld restart.
 -- Fix node state reporting when using specialized cores.
 -- Fix number of CPUs allocated if --cpus-per-gpu used.
 -- Add flag ignore_prefer_validation to not validate --prefer on a job.
 -- Fix salloc/sbatch SLURM_TASKS_PER_NODE output environment variable when the
    number of tasks is not requested.
 -- Permit using wildcard magic cookies with X11 forwarding.
 -- cgroup/v2 - Add check for swap when running OOM check after task
    termination.
 -- Fix deadlock caused by race condition when disabling power save with a
    reconfigure.
 -- Fix memory leak in the dbd when container is sent to the database.
 -- openapi/dbv0.0.38 - correct dbv0.0.38_tres_info.
 -- Fix node SuspendTime, SuspendTimeout, ResumeTimeout being updated after
    altering partition node lists with scontrol.
 -- jobcomp/elasticsearch - fix data_t memory leak after serialization.
 -- Fix issue where '*' wasn't accepted in gpu/cpu bind.
 -- Fix SLURM_GPUS_ON_NODE for shared GPU gres (MPS, shards).
 -- Add SLURM_SHARDS_ON_NODE environment variable for shards.
 -- Fix srun error with overcommit.
 -- Fix bug in core selection for the default cyclic distribution of tasks
    across sockets, that resulted in random task launch failures.
 -- Fix core selection for steps requesting multiple tasks per core when
    allocation contains more cores than required for step.
 -- gpu/nvml - Fix MIG minor number generation when GPU minor number
    (/dev/nvidia[minor_number]) and index (as seen in nvidia-smi) do not match.
 -- Fix accrue time underflow errors after slurmctld reconfig or restart.
 -- Suppress errant errors from prolog_complete about being unable to locate
    "node:(null)".
 -- Fix issue where shards were selected from multiple gpus and failed to
    allocate.
 -- Fix step cpu count calculation when using --ntasks-per-gpu=.
 -- Fix overflow problems when validating array index parameters in slurmctld
    and prevent a potential condition causing slurmctld to crash.
 -- Remove dependency on json-c in slurmctld when running with power saving.
    Only the new "SLURM_RESUME_FILE" support relies on this, and it will be
    disabled if json-c support is unavailable instead.
