We are pleased to announce the availability of Slurm version 20.11.5.

This release includes a number of moderate-severity bug fixes, alongside a new job_container/tmpfs plugin developed by NERSC that can be used to create per-job filesystem namespaces.

Initial documentation for this plugin is available at:
https://slurm.schedmd.com/job_container.conf.html

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

* Changes in Slurm 20.11.5
==========================
 -- Fix main scheduler bug where bf_hetjob_prio truncates
    SchedulerParameters.
 -- Fix sacct not displaying UserCPU, SystemCPU and TotalCPU for large times.
 -- scrontab - fix to return the correct index for a bad #SCRON option.
 -- scrontab - fix memory leak when invalid option found in #SCRON line.
 -- Add errno for when a user requests multiple partitions and they are using
    partition based associations.
 -- Fix issue where a job could run in a wrong partition when using
    EnforcePartLimits=any and partition based associations.
 -- Remove possible deadlock when adding associations/wckeys in multiple
    threads.
 -- When using PrologFlags=alloc make sure the correct Slurm version is set
    in the credential.
 -- When sending a job a warning signal make sure we always send SIGCONT
    beforehand.
 -- Fix issue where a batch job would continue running if a prolog failed on a
    node that wasn't the batch host and requeuing was disabled.
 -- Fix issue where sometimes salloc/srun wouldn't get a message about a prolog
    failure in the job's stdout.
 -- Requeue or kill job on a prolog failure when PrologFlags is not set.
 -- Fix race condition causing node reboots to get requeued before
    ResumeTimeout expires.
 -- Preserve node boot_req_time on reconfigure.
 -- Preserve node power_save_req_time on reconfigure.
 -- Fix node reboots being queued and issued multiple times and preventing the
    reboot from timing out.
 -- Fix debug message related to GrpTRESRunMin (AssocGrpCPURunMinutesLimit).
 -- Fix run_command to exit correctly if track_script kills the calling thread.
 -- Only requeue a job when the PrologSlurmctld returns nonzero.
 -- When a job is signaled with SIGKILL make sure we flush all
    prologs/setup scripts.
 -- Handle burst buffer scripts if the job is canceled while stage_in is
    happening.
 -- When shutting down the slurmctld make note to ignore error message when
    we have to kill a prolog/setup script we are tracking.
 -- scrontab - add support for the --open-mode option.
 -- acct_gather_profile/influxdb - avoid segfault on plugin shutdown if setup
    has not completed successfully.
 -- Reduce delay in starting salloc allocations when running with prologs.
 -- Fix issue passing open fd's with [send|recv]msg.
 -- Alter AllocNodes check to work if the allocating node's domain doesn't
    match the slurmctld's. This restores the pre-20.11 behavior.
 -- Fix slurmctld segfault if jobs from a prior version had the now-removed
    INVALID_DEPEND state flag set and were allowed to run in 20.11.
 -- Add job_container/tmpfs plugin to give a method to provide a private /tmp
    per job.
 -- Set the correct core affinity when using AutoDetect.
 -- Start relying on the conf again in xcpuinfo_mac_to_abs().
 -- Fix global_last_rollup assignment on job resizing.
 -- slurmrestd - hand over connection context on _on_message_complete().
 -- slurmrestd - mark "environment" as required for job submissions in schema.
 -- slurmrestd - Disable credential reuse on the same TCP connection. Pipelined
    HTTP connections will have to provide authentication with every request.
 -- Avoid data conversion error on NULL strings in data_get_string_converted().
 -- Handle situation where slurmctld is too slow processing
    REQUEST_COMPLETE_BATCH_SCRIPT and it gets resent from the slurmstepd.
