Hi Loris,
I know it has been some time, but I have one additional remark.
If you just use 'ssh -X' to log in to the nodes, you will have a plain ssh
session, which means none of Slurm's environment variables will be set.
So if your X11 jobs need those variables, you will have to use X11
forwarding through Slurm.
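A quick way to see the difference from inside the session is to check for Slurm's environment (a minimal sketch; SLURM_JOB_ID is just one representative variable):

```shell
#!/bin/sh
# Inside a plain 'ssh -X' session this variable is not set, while inside
# 'srun --x11 --pty bash' it is inherited from the job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    echo "Slurm environment present: job ${SLURM_JOB_ID}"
else
    echo "no Slurm environment (plain ssh session)"
fi
```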
Best
Marcus
On 3/29/19 7:45 PM, Marcus Wagner wrote:
Hi Loris,
Am 29.03.2019 um 14:01 schrieb Loris Bennett:
Hi Marcus,
Marcus Wagner <wag...@itc.rwth-aachen.de> writes:
Hi Loris,
On 3/25/19 1:42 PM, Loris Bennett wrote:
3. salloc works fine too without --x11; a subsequent srun with an X11
app works great
Doing 'salloc' followed by 'ssh -X' works for us too, which is
surprising to me.
This last option currently seems to me to be the best one for users,
being slightly less confusing than logging into the login node again
from the login node, which is our current workaround.
Still, it's all a bit odd.
I assume you use pam_slurm_adopt?
Yes.
Then it is clear why this works, and it has nothing to do with the X11
forwarding feature of Slurm. In this case it is plain ssh X11
forwarding.
OK, I see that, but if I don't need --x11 with salloc, what is it
for? Just to control on which nodes forwarding is done,
viz. --x11[=<all|first|last>]? What might be a use-case for not having
X11 forwarding for all the nodes, which is the default?
The default is (according to the manpage) 'batch', which means the
node where the batch script will be executed (the first node of the
allocation, I think).
I do not know what 'first' or 'last' are intended for.
In fact I do not have a use case for X11 forwarding to all nodes; I
might have to think a little bit more about that one.
Please keep in mind that processes started within an adopted ssh
session are in the job's cgroup (good), but are accounted in the
'extern' step of the job.
e.g.
* sbatch --wrap "sleep 10m"
* ssh to the compute node
* do some work on the compute node
After the job is done:
* sacct -j <jobid> -o JobID,JobName,MaxRSS,CPUTime,TotalCPU
JobID         JobName     MaxRSS     CPUTime    TotalCPU
------------  ----------  ---------  ---------  ----------
1053837       wrap                   00:01:42   02:00.159
1053837.bat+  batch       412K       00:01:43   00:00.158
1053837.ext+  extern      543880K    00:01:42   02:00.001
That's interesting, although is there any advantage/difference compared
with just doing
srun --x11 --pty bash
?
With
srun --x11 --pty bash
the accounting will be in the batch step of the job; that is the only
difference I'm aware of at the moment.
With LSF we used that kind of mechanism to start e.g. VTune directly
from within the job. Without the X11 forwarding feature of Slurm, you
would have to salloc some hosts and then ssh to the nodes with X11
forwarding enabled in order to start VTune.
So there is a little more for the user to do if you do not do X11
forwarding the Slurm way.
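The two workflows compared above could be sketched like this (a sketch, assuming a Slurm cluster; the 'run' guard and the hostname node001 are illustrative, and on a machine without Slurm the commands are only printed instead of executed):

```shell
#!/bin/sh
# Variant 1: Slurm-native X11 forwarding -- one step; CPU time lands in
# the batch step of the job.
# Variant 2: salloc + plain 'ssh -X' -- two steps; processes are adopted
# into the job's cgroup but accounted in the 'extern' step.
run() {
    if command -v srun >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Variant 1: start an interactive shell with Slurm X11 forwarding,
# then launch the X11 app (e.g. VTune) inside it.
run srun --x11 --pty bash

# Variant 2: allocate a node first, then ssh with ssh-level X11 forwarding.
run salloc -N 1
run ssh -X node001   # node001 is a placeholder hostname
```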
Best
Marcus
Cheers,
Loris
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de