Based on this bug report, SchedMD fixed an X11 forwarding issue in 25.05. Maybe this is related and the issue is not fixed after all?
https://support.schedmd.com/show_bug.cgi?id=22034#c6

And the purported fix:
https://github.com/SchedMD/slurm/commit/3842c368a439e22a37329e596994f52bda2d0f58

Regards
--
Mick Timony
Senior DevOps Engineer
LASER, Longwood, & O2 Cluster Admin
Harvard Medical School
--

________________________________
From: Markus Köberl via slurm-users <[email protected]>
Sent: Thursday, July 24, 2025 2:15 AM
To: [email protected] <[email protected]>; Patryk Bełzak <[email protected]>
Subject: [slurm-users] Re: X11 forwarding broken in 25.05.1

On Wednesday, 23 July 2025 12:19:42 CEST Patryk Bełzak via slurm-users wrote:
> Hi,
>
> we've recently upgraded our slurm from 24.11.3 to 25.05.1 and it seems that
> since the upgrade the ssh X11 forwarding is broken.
>
> Quick recap -
> * on Monday 14th I performed slurmdbd and slurmctld upgrades - X forwarding
>   was still working
> * on Tuesday 15th I performed slurmd upgrades - X forwarding stopped working
>
> The issue is very hard to pin down and it looks like it sits somewhere in
> slurm code. You can submit a job with --x11 and it starts correctly.
> Xauthority is created, you have all the magic cookies needed, but when you
> try to start any application, you get an error, related to permissions I
> guess, see for yourself:
>
> ```
> me@sand ~ ssh -X -Y ui
> [wcss] [email protected]:~ > srun -p lem-cpu-short -A kdm-staff --gres=storage:local:50G -c 12 --mem 12G -t 1:0:0 --x11 --pty /bin/bash
> [wcss] me@r17ch05b01 ~ > xauth list
> r17ch05b01.lem.kdm.wcss.pl/unix:91 MIT-MAGIC-COOKIE-1 d82a2efd
> [wcss] me@r17ch05b01 ~ > xterm
> xterm: Xt error: Can't open display: localhost:91.0
> [wcss] me@r17ch05b01 ~ > date && telnet -4 localhost 6091 || date
> Wed Jul 23 12:02:39 CEST 2025
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> Connection closed by foreign host.
> Wed Jul 23 12:02:41 CEST 2025
> ```
>
> As you can see, the connection to the port is being dropped/killed after a
> second or two. Now, it doesn't really matter which flags for ssh you pick
> (-X or -Y or both). X forwarding is working when you log in as a regular
> user outside of a slurm job. Also, if I do ssh localhost inside a job, then
> I can connect to the port assigned to $DISPLAY and the connection isn't
> dropped - but it doesn't work, since $DISPLAY and the cookies get messed up
> when you perform a triple jump with one hop within the same host.
>
> Our worker nodes are mostly on el9.5 AlmaLinux. Some are on el8.10 - and
> there you actually can do some X forwarding, but you must use both -X and
> -Y (which wasn't the case before the slurm upgrade). TLS is disabled in
> slurm.conf. I am 100% sure that both SSHD and Xorg are properly configured.
>
> Has anyone encountered a similar issue? Or any comment from the slurm dev
> team?
>
> Best regards
> Patryk
> --
> Wroclaw Centre for Networking and Supercomputing

I did create a bug report:
https://support.schedmd.com/show_bug.cgi?id=23190

I got the following response by email:

Currently, this bug is showing as unsupported in our system. Unsupported bugs are given a very low priority and most times the unsupported bugs are never reviewed by the support team as their focus is on sites with support contracts.

If you have a support contract you might raise the priority.

regards
Markus Köberl
--
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: [email protected]
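
For anyone who wants to repeat the `telnet -4 localhost 6091` check from the quoted session on other nodes, here is a minimal Python sketch of the same probe. It assumes the usual X11 TCP convention that display N listens on port 6000 + N, which matches the `localhost:91.0` / `6091` pair shown above. The function names `display_to_port` and `probe` are made up for this illustration and are not part of Slurm or any X11 tooling.

```python
import re
import socket
import time

# Assumption: standard X11 TCP convention, display N <-> port 6000 + N.
X11_BASE_PORT = 6000


def display_to_port(display: str) -> int:
    """Map a DISPLAY value such as 'localhost:91.0' to its TCP port (6091)."""
    match = re.fullmatch(r".*:(\d+)(?:\.\d+)?", display)
    if match is None:
        raise ValueError(f"unrecognised DISPLAY value: {display!r}")
    return X11_BASE_PORT + int(match.group(1))


def probe(host: str, port: int, hold_seconds: float = 5.0) -> str:
    """Connect, send nothing, and report whether the peer closes the
    connection within hold_seconds (the behaviour seen with telnet)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=hold_seconds) as sock:
            sock.settimeout(hold_seconds)
            try:
                data = sock.recv(1)
            except socket.timeout:
                # Nothing happened for hold_seconds: the listener kept us open.
                return f"still open after {hold_seconds:.1f}s"
            except ConnectionResetError:
                return f"reset by peer after {time.monotonic() - start:.1f}s"
            if data == b"":
                # Empty read means the other side closed the connection.
                return f"closed by peer after {time.monotonic() - start:.1f}s"
            return "peer sent data"
    except OSError as exc:
        return f"connect failed: {exc}"
```

Inside a job started with `--x11`, something like `probe("127.0.0.1", display_to_port(os.environ["DISPLAY"]))` (after `import os`) should report a close after a second or two on an affected node, and "still open" where the forwarding listener behaves normally. This only checks the TCP side, so it says nothing about the xauth cookies.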
--
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
