I’ve managed to track down the difference between the accounts which work and 
those which don’t – but I still don’t understand the mechanism.

The accounts which work all had their home directories used on an older system. 
 The ones which fail were only ever used on the new system.  The relevant 
difference seems to be the way their ssh keys are set up.  On the old system a 
standard ssh-keygen was run, creating ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub files 
and putting the pub file into authorized_keys.

On the new Warewulf-based system ssh-keygen was again run, but the default key 
file names were changed.  We now have ~/.ssh/cluster and ~/.ssh/cluster.pub and 
there is a ~/.ssh/config file which contains:

# Added by Warewulf  2019-12-10
Host pebble*
   IdentityFile ~/.ssh/cluster
   StrictHostKeyChecking=no

This all works, and I can ssh from the head node to the ‘pebble’ compute 
nodes without problems.  However, something in the code for the slurm x11 
forwarder seems to be specifically looking for id_rsa files (or ignoring the 
config file), since the forwarding fails if I don’t have these and works as 
soon as I do.
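
A quick way to tell the two kinds of account apart is to check for the default key pair directly. The following is only a sketch: `has_default_rsa_key` is a hypothetical helper name, and it assumes the default keys live at `.ssh/id_rsa` and `.ssh/id_rsa.pub` under the given home directory, as described above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the home directory passed as $1
# contains the default RSA key pair that the working accounts had.
has_default_rsa_key() {
    home_dir=$1
    [ -f "$home_dir/.ssh/id_rsa" ] && [ -f "$home_dir/.ssh/id_rsa.pub" ]
}

# Example: report the state for the current user.
if has_default_rsa_key "$HOME"; then
    echo "default id_rsa key pair present"
else
    echo "default id_rsa key pair missing"
fi
```

Running this as each affected user should show whether the accounts that fail are exactly the ones without id_rsa files.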

Any ideas where this might be happening, so I can either file a bug or change 
whatever setting this needs?

Simon.

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of William 
Brown
Sent: 24 January 2020 17:21
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Srun not setting DISPLAY with --x11 for one account

There are differences for X11 between Slurm versions so it may help to know 
which version you have.

I tried some of your commands on our slurm 19.05.3-2 cluster and, interestingly, 
in the session on the compute node I don't see the cookie for the login node.  
This was with MobaXterm:

[user@prdubrvm005 ~]$ xauth list
prdubrvm005.research.rcsi.com/unix:10  MIT-MAGIC-COOKIE-1  2efc5dd851736e3848193f65d038eca8
[user@prdubrvm005 ~]$ srun --pty  --x11  --preserve-env /bin/bash
[user@prdubrhpc1-02 ~]$ xauth list
prdubrhpc1-02.research.rcsi.com/unix:95  MIT-MAGIC-COOKIE-1  2efc5dd851736e3848193f65d038eca8
[user@prdubrhpc1-02 ~]$ echo $DISPLAY
localhost:95.0

Any per-user problem would make me suspect the user having a different shell, 
or something in their login script.  Can you make their .bashrc and 
.bash_profile just exit?  Or look for hidden configuration files for 
<something> in their home directory?

William



On Fri, 24 Jan 2020 at 16:05, Simon Andrews 
<simon.andr...@babraham.ac.uk> wrote:
I have a weird problem which I can’t get to the bottom of.

We have a cluster which allows users to start interactive sessions which 
forward any X11 sessions they generated on the head node.  This generally works 
fine, but on the account of one user it doesn’t work.  The X11 connection to 
the head node is fine, but it won’t transfer to the compute node.

The symptoms are shown below:

A good user gets this:

[good@headnode ~]$ xauth list
headnode.babraham.ac.uk/unix:12  MIT-MAGIC-COOKIE-1  f04a2bf9a921a3357e44373655add14a

[good@headnode ~]$ echo $DISPLAY
localhost:12.0

[good@headnode ~]$ srun --pty -p interactive --x11  --preserve-env /bin/bash

[good@compute ~]$ xauth list
headnode.babraham.ac.uk/unix:12  MIT-MAGIC-COOKIE-1  f04a2bf9a921a3357e44373655add14a
compute/unix:25  MIT-MAGIC-COOKIE-1  f04a2bf9a921a3357e44373655add14a

[good@compute ~]$ echo $DISPLAY
localhost:25.0

So the cookie is copied from the head node and forwarded and the DISPLAY 
variable is updated.

The bad user gets this:

[bad@headnode ~]$ xauth list
headnode.babraham.ac.uk/unix:10  MIT-MAGIC-COOKIE-1  c39a493a37132d308b37469d363d8692

[bad@headnode ~]$ echo $DISPLAY
localhost:10.0

[bad@headnode ~]$ srun --pty -p interactive --x11  --preserve-env /bin/bash

[bad@compute ~]$ xauth list
headnode.babraham.ac.uk/unix:10  MIT-MAGIC-COOKIE-1  c39a493a37132d308b37469d363d8692

[bad@compute ~]$ echo $DISPLAY
localhost:10.0

So the cookie isn’t copied and the DISPLAY isn’t updated.  I can’t see any 
errors in the logs and I can’t see anything different about this account.
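
The difference between the two transcripts can be turned into a scripted check. This is a sketch under stated assumptions: `x11_cookie_for_display` is a hypothetical helper, and it assumes that a successful forward always leaves an xauth entry of the form `<host>/unix:<N>` on the compute node, where `<N>` is the display number in `$DISPLAY`, as in the good transcript.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the xauth listing in $3 has an entry
# for host $1 whose display number matches the DISPLAY value in $2.
# The host match is deliberately loose: "compute" also matches
# "compute.some.domain" (and any other name starting with "compute").
x11_cookie_for_display() {
    host=$1
    display=$2
    xauth_output=$3
    # Reduce e.g. "localhost:25.0" to "25".
    number=${display#*:}
    number=${number%%.*}
    printf '%s\n' "$xauth_output" |
        grep -q "^${host}[^/]*/unix:${number}[[:space:]]"
}

# Possible use inside the srun session on the compute node:
#   x11_cookie_for_display "$(hostname -s)" "$DISPLAY" "$(xauth list)" \
#       && echo "forwarded" || echo "not forwarded"
```

With the values shown above, the good account would pass (an entry for compute/unix:25 exists) and the bad account would fail (only the head node entry for display 10 is present).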

If I do a straightforward ssh -Y from the head node to a compute node from the 
bad account then that works fine – it’s only whatever is specific about the way 
that srun forwards X which fails.

Any ideas or suggestions for debugging would be appreciated as I’m running out 
of things to try!

Simon.
The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT Registered 
Charity No. 1053902.
The information transmitted in this email is directed only to the addressee. If 
you received this in error, please contact the sender and delete this email 
from your system. The contents of this e-mail are the views of the sender and 
do not necessarily represent the views of the Babraham Institute. Full 
conditions at: http://www.babraham.ac.uk/terms