That was exactly the bit I was missing. Thank you very much, Magnus!

Best
Niels Carl



On 3/7/23 3:13 PM, Hagdorn, Magnus Karl Moritz wrote:
I just upgraded Slurm to 23.02 on our test cluster to try out the new
job_container/tmpfs stuff. I can confirm it works with autofs (hurrah!)
but you need to set the Shared=true option in the job_container.conf
file.
Cheers
magnus
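
For reference, a minimal sketch of what such a job_container.conf might look like. Only Shared=true comes from this thread; the BasePath value is a placeholder assumption and should point at node-local storage on your system. The plugin itself is enabled with JobContainerType=job_container/tmpfs in slurm.conf.

```
# job_container.conf (sketch; BasePath below is a placeholder, not from the thread)
AutoBasePath=true
BasePath=/var/tmp/slurm-containers
# Propagate mounts between the job namespace and the root namespace,
# which is what lets autofs mounts show up inside the job (new in 23.02)
Shared=true
```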

On Tue, 2023-03-07 at 09:19 +0100, Ole Holm Nielsen wrote:
Hi Brian,

Presumably the users' home directory is NFS automounted using autofs,
and therefore it doesn't exist when the job starts.

The job_container/tmpfs plugin ought to work correctly with autofs,
but maybe this is still broken in 23.02?

/Ole


On 3/6/23 21:06, Brian Andrus wrote:
That looks like the users' home directory doesn't exist on the node.

If you are not using a shared home for the nodes, your onboarding
process should be looked at to ensure it can handle any issues that
may arise.

If you are using a shared home, you should do the above and have the
node ensure the shared filesystems are mounted before allowing jobs.

-Brian Andrus

On 3/6/2023 1:15 AM, Niels Carl W. Hansen wrote:
Hi all

It seems there are still some issues with the autofs and
job_container/tmpfs functionality in Slurm 23.02.
If the required directories aren't mounted on the allocated node(s)
before the job starts, we get:

slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead
slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead

An easy workaround, however, is to include this line in the Slurm
prolog on the slurmd nodes:

/usr/bin/su - $SLURM_JOB_USER -c /usr/bin/true

But there might be a better way to solve the problem?
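
For anyone who wants the workaround as a complete prolog, here is a minimal sketch. It assumes the script is run by slurmd as the Prolog, where SLURM_JOB_USER is set; the helper name trigger_home_mount is made up for illustration. The only change from the one-liner is a guard for an empty user name.

```shell
#!/bin/sh
# Prolog sketch: trigger the autofs mount of the job owner's home
# directory before the job starts.
# trigger_home_mount is a hypothetical helper name, not part of Slurm.

trigger_home_mount() {
    user="$1"
    if [ -z "$user" ]; then
        # Nothing to do if no user name was passed in
        echo "prolog: SLURM_JOB_USER is empty, skipping home mount" >&2
        return 0
    fi
    # Starting a login shell as the user accesses the home directory,
    # which makes autofs mount it; the function returns su's exit status
    /usr/bin/su - "$user" -c /usr/bin/true
}

trigger_home_mount "${SLURM_JOB_USER:-}"
```

Note that a non-zero exit status from the Prolog drains the node, so if a failing su should not take the node out of service, the script would need to swallow that status explicitly.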

