We are pleased to announce the availability of Slurm version 21.08.6.

This release includes a number of fixes made since the last maintenance release in December, among them an important fix for a regression seen when using the 'mpirun' command within a job script.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

* Changes in Slurm 21.08.6
==========================
 -- Handle typed shared GRES better in accounting.
 -- Fix plugin_name definitions in a number of plugins to improve logging.
 -- Close sbcast file transfers when job is cancelled.
 -- job_submit/lua - allow mail_type and mail_user fields to be modified.
 -- scrontab - fix handling of --gpus and --ntasks-per-gpu options.
 -- sched/backfill - fix job_queue_rec_t memory leak.
 -- Fix magnetic reservation logic in both main and backfill schedulers.
 -- job_container/tmpfs - fix memory leak when using InitScript.
 -- slurmrestd / openapi - fix memory leaks.
 -- Fix slurmctld segfault due to job array resv_list double free.
 -- Fix multi-reservation job testing logic.
 -- Fix slurmctld segfault due to insufficient job reservation parse validation.
 -- Fix main and backfill schedulers handling for already rejected job array.
 -- sched/backfill - restore resv_ptr after yielding locks.
 -- acct_gather_energy/xcc - appropriately close and destroy the IPMI context.
 -- Protect slurmstepd from making multiple calls to the cleanup logic.
 -- Prevent slurmstepd segfault at cleanup time in mpi_fini().
 -- Fix slurmctld sometimes hanging if shutdown while PrologSlurmctld or
    EpilogSlurmctld were running and PrologEpilogTimeout is set in slurm.conf.
 -- Fix affinity of the batch step if batch host is different than the first
    node in the allocation.
 -- slurmdbd - fix segfault after multiple failover/failback operations.
 -- Fix jobcomp filetxt job selection condition.
 -- Fix -f flag of sacct not being used.
 -- Select cores for job steps according to the socket distribution. Previously,
    sockets were always filled before selecting cores from the next socket.
 -- Keep node in Future state if epilog completes while in Future state.
 -- Fix erroneous --constraint behavior by preventing multiple sets of brackets.
 -- Make ResetAccrueTime update the job's accrue_time to now.
 -- Fix sattach initialization with configless mode.
 -- Revert packing limit checks affecting pmi2.
 -- sacct - fixed assertion failure when using -c option and a federation
    display.
 -- Fix issue that allowed steps to overallocate the job's memory.
 -- Fix the sanity check mode of AutoDetect so that it actually works.
 -- Fix deallocated nodes that didn't actually launch a job from waiting for
    EpilogSlurmctld to complete before clearing completing node's state.
 -- Job should be in a completing state if EpilogSlurmctld is running when
    being requeued.
 -- Fix job not being requeued properly if all node epilogs completed before
    EpilogSlurmctld finished.
 -- Keep job completing until EpilogSlurmctld is completed even when "downing"
    a node.
 -- Fix handling reboot with multiple job features.
 -- Fix nodes getting powered down when creating new partitions.
 -- Fix bad bit_realloc which potentially could lead to bad memory access.
 -- slurmctld - remove limit on the number of open files.
 -- Fix bug where a job_state file larger than 2GB was not saved, and no error
    message was reported.
 -- Fix various issues with no_consume gres.
 -- Fix regression in 21.08.0rc1 where job steps failed to launch on systems
    that reserved a CPU in a cgroup outside of Slurm (for example, on systems
    with WekaIO).
 -- Fix OverTimeLimit not being reset on scontrol reconfigure when it is
    removed from slurm.conf.
 -- serializer/yaml - use dynamic buffer to allow creation of YAML outputs
    larger than 1MiB.
 -- Fix minor memory leak affecting openapi users at process termination.
 -- Fix batch jobs not resolving the username when nss_slurm is enabled.
 -- slurmrestd - Avoid slurmrestd ignoring invalid HTTP method if the response
    serialized without error.
 -- openapi/dbv0.0.37 - Correct conditional that caused the diag output to
    give an internal server error status on success.
 -- Make --mem-bind=sort work with task_affinity.
 -- Fix sacctmgr to set MaxJobsAccruePer{User|Account} and MinPrioThres in
    sacctmgr add qos; modify already worked correctly.
 -- job_container/tmpfs - avoid printing extraneous error messages in Prolog
    and Epilog, and when the job completes.
 -- Fix step CPU memory allocation with --threads-per-core without --exact.
 -- Remove implicit --exact when --threads-per-core or --hint=nomultithread
    is used.
 -- Do not allow a step to request more threads per core than the
    allocation did.
 -- Remove implicit --exact when --cpus-per-task is used.
