That looks like the user's home directory doesn't exist on the node.
If you are not using a shared home for the nodes, review your onboarding
process to make sure the home directory actually gets created on every
node.
If you are using a shared home, do the same and also have the
node verify that the shared filesystems are mounted before it accepts jobs.
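A minimal sketch of the "check the mounts before allowing jobs" idea. The function name and the example paths are my own assumptions, not anything from this thread, and how you hook it in (health check, prolog, or a systemd dependency for slurmd) is left open. Note that with autofs the map root itself is normally always a mount point, so a check like this is better suited to statically mounted filesystems.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every path given is a mount point.
# Relies on mountpoint(1) from util-linux.
all_mounted() {
    local path
    for path in "$@"; do
        if ! mountpoint -q "$path"; then
            echo "not mounted: $path" >&2
            return 1
        fi
    done
    return 0
}

# Example call (the paths are assumptions, adjust to your site):
# all_mounted /users /scratch || exit 1
```

A non-zero return can then be used by whatever calls it to keep the node from taking jobs.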
-Brian Andrus
On 3/6/2023 1:15 AM, Niels Carl W. Hansen wrote:
Hi all
It seems there are still some issues with the autofs /
job_container/tmpfs functionality in Slurm 23.02.
If the required directories aren't mounted on the allocated node(s)
before the job starts, we get:
slurmstepd: error: couldn't chdir to `/users/lutest': No such file or
directory: going to /tmp instead
slurmstepd: error: couldn't chdir to `/users/lutest': No such file or
directory: going to /tmp instead
An easy workaround, however, is to include this line in the Slurm
prolog on the slurmd nodes:
/usr/bin/su - $SLURM_JOB_USER -c /usr/bin/true
but there might be a better way to solve the problem?
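One possible variant, as a sketch only: instead of starting a login shell with su, look up the user's home directory and touch it, which is normally enough to make autofs mount it. The function name is hypothetical, and I have not verified that this behaves the same as the su line with job_container/tmpfs, so treat it as an assumption to test.

```shell
#!/bin/bash
# Sketch of a prolog fragment. SLURM_JOB_USER is provided by Slurm in the
# prolog environment; trigger_home_automount is a hypothetical helper name.
trigger_home_automount() {
    local user="$1" home
    # Field 6 of the passwd entry is the home directory
    home=$(getent passwd "$user" | cut -d: -f6)
    [ -n "$home" ] || return 1
    # Accessing the path is what prompts autofs to mount it
    ls -d "$home/." >/dev/null 2>&1
}

if [ -n "${SLURM_JOB_USER:-}" ]; then
    trigger_home_automount "$SLURM_JOB_USER" \
        || echo "prolog: could not access home of $SLURM_JOB_USER" >&2
fi
```

The fragment only logs a failure and does not exit non-zero, so it will not by itself cause the prolog to fail.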
Best
Niels Carl
On 3/2/23 12:27 AM, Jason Ellul wrote:
Thanks so much Ole for the info and link,
Your documentation is extremely useful.
Prior to moving to 22.05 we had been using slurm-spank-private-tmpdir
with an epilog to clean up the folders on job completion, but we were
hoping to move to the inbuilt functionality to ensure future
compatibility and reduce complexity.
We will try 23.02 and, if that does not resolve our issue, consider moving
back to slurm-spank-private-tmpdir or auto_tmpdir.
Thanks again,
Jason
Jason Ellul
Head - Research Computing Facility
Office of Cancer Research
Peter MacCallum Cancer Center
*From: *slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf
of Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk>
*Date: *Wednesday, 1 March 2023 at 8:29 pm
*To: *slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
*Subject: *Re: [slurm-users] Cleanup of job_container/tmpfs
Hi Jason,
IMHO, the job_container/tmpfs is not working well in Slurm 22.05, but
there may be some significant improvements included in 23.02 (announced
yesterday). I've documented our experiences in the Wiki page
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#temporary-job-directories
This page contains links to bug reports against the job_container/tmpfs
plugin.
We're using the auto_tmpdir SPANK plugin with great success in Slurm
22.05.
Best regards,
Ole
On 01-03-2023 03:27, Jason Ellul wrote:
> We have recently moved to slurm 22.05.8 and have configured
> job_container/tmpfs to allow private tmp folders.
>
> job_container.conf contains:
>
> AutoBasePath=true
> BasePath=/slurm
>
> And in slurm.conf we have set
>
> JobContainerType=job_container/tmpfs
>
> I can see the folders being created and they are being used but when a
> job completes the root folder is not being cleaned up.
>
> Example of running job:
>
> [root@papr-res-compute204 ~]# ls -al /slurm/14292874
> total 32
> drwx------ 3 root root 34 Mar 1 13:16 .
> drwxr-xr-x 518 root root 16384 Mar 1 13:16 ..
> drwx------ 2 mzethoven root 6 Mar 1 13:16 .14292874
> -r--r--r-- 1 root root 0 Mar 1 13:16 .ns
>
> Example once job completes /slurm/<jobid> remains:
>
> [root@papr-res-compute204 ~]# ls -al /slurm/14292794
> total 32
> drwx------ 2 root root 6 Mar 1 09:33 .
> drwxr-xr-x 518 root root 16384 Mar 1 13:16 ..
>
> Is this to be expected or should the folder /slurm/<jobid> also be
> removed?
>
> Do I need to create an epilog script to remove the directory that
> is left?
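For the epilog question, a cautious sketch of what such a script could look like. It assumes the BasePath=/slurm setting quoted above and that SLURM_JOB_ID is set in the epilog environment; the function name is hypothetical. Whether the epilog runs after the plugin has finished its own cleanup is not something this thread confirms, so the sketch uses rmdir, which only removes a directory that is already empty.

```shell
#!/bin/bash
# Sketch of an epilog fragment. /slurm mirrors BasePath from the quoted
# job_container.conf; remove_leftover_job_dir is a hypothetical helper name.
remove_leftover_job_dir() {
    local base="$1" jobid="$2"
    # Refuse anything that is not a plain numeric job ID
    case "$jobid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    local dir="$base/$jobid"
    # Nothing left behind: nothing to do
    [ -d "$dir" ] || return 0
    # rmdir fails on non-empty directories, so contents are never deleted
    rmdir "$dir" 2>/dev/null
}

if [ -n "${SLURM_JOB_ID:-}" ]; then
    remove_leftover_job_dir /slurm "$SLURM_JOB_ID" || true
fi
```

A directory that still has contents is left alone, which seems the safer default for a first attempt.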