Hi Brian,

Thank you very much! We will try it.

Another thing we have noticed is a massive decrease in slurmctld performance.
We had to quadruple the VM’s memory and CPU cores compared to 24.11 so that
25.05 would run without freezing.
Does everyone see this, or did we misconfigure some settings of the new RPC
connection manager?

--
Grigory Shamov
Site Lead / HPC Specialist
University of Manitoba and DRI Alliance Canada


From: Brian Andrus via slurm-users <[email protected]>
Reply-To: Brian Andrus <[email protected]>
Date: Thursday, September 25, 2025 at 6:02 PM
To: "[email protected]" <[email protected]>
Subject: [slurm-users] Re: How to make TLS and PMIx v4 work together?

Caution! This message was sent from outside the University of Manitoba.


Grigory,

You likely need to add your CA to the nodes' trust stores and update them. On Ubuntu, you would:

  *   Put your CA public key file in /usr/local/share/ca-certificates/
  *   Run /usr/sbin/update-ca-certificates
This should then create a pem file in /etc/ssl/certs for that CA and you can 
then trust certs signed by it.

You will need to do that on all your systems that need to trust your CA.
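
As a rough sketch of the steps above (not a tested procedure): on Ubuntu, update-ca-certificates only picks up files ending in .crt, and on RHEL-family systems such as the Alma 8.10 hosts mentioned in this thread, the equivalent location and command are /etc/pki/ca-trust/source/anchors/ and update-ca-trust. The function name install_ca, the DRY_RUN variable, and the file name site-ca.crt are made up for this example. Whether Slurm's tls/s2n plugin consults the system trust store at all, rather than only the ca_cert_file given in TLSParameters, is an assumption worth checking against the Slurm TLS documentation.

```shell
#!/bin/sh
# Sketch: add a site CA to the system trust store.
# install_ca, DRY_RUN and the example file name are hypothetical.
# By default (DRY_RUN=1) the commands are only printed, not executed.

install_ca() {
    ca_file=$1   # CA certificate in PEM format, e.g. site-ca.crt
    family=$2    # "debian" (Ubuntu, Debian) or "rhel" (Alma, Rocky, RHEL)

    case "$family" in
        debian)
            dest=/usr/local/share/ca-certificates/
            update=/usr/sbin/update-ca-certificates
            ;;
        rhel)
            dest=/etc/pki/ca-trust/source/anchors/
            update="update-ca-trust extract"
            ;;
        *)
            echo "unknown distro family: $family" >&2
            return 1
            ;;
    esac

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Print what would be done.
        echo "cp $ca_file $dest"
        echo "$update"
    else
        # Needs root. $update is left unquoted on purpose so that
        # "update-ca-trust extract" splits into command and argument.
        cp "$ca_file" "$dest" && $update
    fi
}
```

Calling `install_ca site-ca.crt rhel` prints the two commands for an Alma node; setting DRY_RUN=0 and running as root would apply them. This would have to be repeated on every system that needs to trust the CA.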

Brian Andrus


On 9/25/2025 11:11 AM, Grigory Shamov via slurm-users wrote:

Forgot to add: s2n-tls comes from EPEL and is version 1.5.10.

On 2025-09-25, 11:56 AM, "Grigory Shamov via slurm-users"
<[email protected]> wrote:


Hi All,

We have updated SLURM to the current 25.05.x and tried to enable TLS on it. The
OS is Alma 8.10, with cgroups v1 and PMIx v4.

We see that srun fails for MPI jobs across the nodes, with TLS-related errors
when using PMIx (the default), but succeeds with srun --mpi=pmi2 or with mpirun.

TLSType = tls/s2n
TLSParameters = ca_cert_file= (has all the certs here under /etc/slurm/certs)

And the errors when using PMIx are:

[2025-09-25T11:04:43.894] error: con_close_on_poll_error: [n388:6818(fd:15)]
socket error encountered while polling: Connection reset by peer

[2025-09-25T11:04:50.102] [6451416.0] error: _negotiate: s2n_negotiate() failed 
S2N_ERR_CERT_UNTRUSTED[335544366]: Certificate is untrusted -> Error 
encountered in /builddir/build/BUILD/s2n-tls-1.5.10/tls/s2n_x509_validator.c:494

(a couple of these)

[2025-09-25T11:05:57.878] [6451416.0] error: tls_p_recv: s2n_recv() failed 
S2N_ERR_CLOSED[134217728]: connection is closed -> Error encountered in 
/builddir/build/BUILD/s2n-tls-1.5.10/utils/s2n_io.c:37

[2025-09-25T11:05:57.883] [6451416.0] error: tls_p_send: s2n_send() failed 
S2N_ERR_IO[67108864]: underlying I/O operation failed, check system errno -> 
Error encountered in /builddir/build/BUILD/s2n-tls-1.5.10/utils/s2n_io.c:28

(a couple of these)

[2025-09-25T11:05:59.076] error: wrap_on_data: 
[unix:/var/spool/slurmd/slurmd.socket(fd:17)] on_data returned rc: Unable to 
proxy slurmstepd message

[2025-09-25T11:05:59.076] [6451416.0] error: _stepd_send_recv_msg: slurmd was 
unable to proxy request message to its final destination

[2025-09-25T11:05:59.878] error: _slurmd_send_recv_msg: Failed to send/recv
slurmstepd message MESSAGE_TASK_EXIT using proxy_type PROXY_TO_NODE_SEND_RECV

[2025-09-25T11:07:36.335] [6451416.0] error: mpi/pmix_v4: pmixp_p2p_send: n388
[0]: pmixp_utils.c:469: send failed, rc=1001, exceeded the retry limit

[2025-09-25T11:07:36.335] [6451416.0] error: mpi/pmix_v4: _slurm_send: n388
[0]: pmixp_server.c:1586: Cannot send message to
/var/spool/slurmd/stepd.slurm.pmix.6451416.0, size = 27679, hostlist: (null)

(and a couple more PMIx errors). It looks like PMIx can no longer talk to its
peers?
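
One way to narrow down the S2N_ERR_CERT_UNTRUSTED errors above is to check, outside of Slurm, whether a given certificate actually chains to the CA file that ca_cert_file points at. A minimal sketch using openssl verify follows. The function name check_cert and both file arguments are hypothetical, and this only tests the certificate chain, not how Slurm or PMIx load the files at run time.

```shell
#!/bin/sh
# Sketch: report whether CERT_FILE validates against CA_FILE.
# check_cert and its arguments are hypothetical names for this example.
# Usage: check_cert CA_FILE CERT_FILE

check_cert() {
    ca_file=$1
    cert_file=$2
    # "openssl verify" exits non-zero when the chain cannot be built.
    if openssl verify -CAfile "$ca_file" "$cert_file" >/dev/null 2>&1; then
        echo "trusted: $cert_file"
    else
        echo "UNTRUSTED: $cert_file"
        return 1
    fi
}
```

Running this on each compute node against the certificate that node presents would show whether the failure is a missing or mismatched CA rather than something specific to PMIx.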





There was no specific configuration for the certgen plugin, because the SLURM
documentation seems to say it is optional(?).

I wonder what we are missing to get SLURM 25.05 working with TLS enabled and
PMIx. Any advice is appreciated! Thanks!

--
Grigory Shamov
Site Lead / HPC Specialist
University of Manitoba and DRI Alliance Canada


-- 
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
