Maybe this was a noob question, but I have just solved my problem.
I'll share my findings. I returned to my original settings
and reran the Ansible playbook, reconfiguring the SlurmdSpoolDir.
* https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir_1
It is presumably writable by root anyway, because root can write
everywhere (except on the Kerberos'ed NFS, at least), but my settings
give the user and group permissions only to the "slurm" user:
- name: "create SlurmdSpoolDir directory"
  ansible.builtin.file:
    path: "{{ slurmd_spool_dir }}"
    state: "directory"
    owner: "{{ slurm_user }}"
    group: "{{ slurm_user }}"
    mode: "0770"
* Setting permissions for the SlurmdSpoolDir is not really important,
because at each "slurmd" restart those permissions are reset
by "slurmd" to "0755". The ownership is not changed. So, as a result:
drwxr-xr-x 2 slurm slurm 74 Aug 16 20:57 slurmd_spool
* The missing part was the read and execute (search) permission for
"other" on the SlurmdSpoolDir's parent directory. I had to set "775"
instead of "770" for the parent dir, which in my case is
"/opt/slurm_state_dir":
drwxrwxr-x 3 slurm slurm 26 Aug 11 19:49 slurm_state_dir
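
A minimal sketch of a matching Ansible task for the parent directory could look like the one below. The variable name "slurm_state_dir" is an assumption made for illustration (it does not appear in the playbook excerpt above) and stands for "/opt/slurm_state_dir". The "other" bits matter presumably because the batch script is executed as the submitting user, who must be able to traverse every directory on the path to slurm_script.

```yaml
# Sketch only: "slurm_state_dir" is a hypothetical variable standing in
# for the parent of SlurmdSpoolDir ("/opt/slurm_state_dir" in this thread).
- name: "create parent directory of SlurmdSpoolDir"
  ansible.builtin.file:
    path: "{{ slurm_state_dir }}"
    state: "directory"
    owner: "{{ slurm_user }}"
    group: "{{ slurm_user }}"
    # r-x for "other" lets job users traverse into slurmd_spool;
    # "0770" here was what caused the execve() "Permission denied".
    mode: "0775"
```
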
Kind regards
--
Kamil Wilczek
On 16.08.2022 at 18:00, Kamil Wilczek wrote:
Dear Slurm Users,
recently, I have started a new instance of my cluster with Slurm 22.05.2
(built from source). Everything seems to be configured properly and
working fine except "sbatch". The error is quite self-explanatory and
I thought it would be quite easy to fix directory permissions.
slurmstepd: error: execve():
/opt/slurm_state_dir/slurmd_spool/job00136/slurm_script: Permission denied
I read here
(https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir_1) that
the directory should be writable by root. I did that, but it did not
help. I tried several other combinations of permissions, with no
improvement.
Currently:
# ls -l /opt/
drwxrwx--- 3 slurm root 26 Aug 11 19:49 slurm_state_dir
# tree -pug /opt/slurm_state_dir
/opt/slurm_state_dir
└── [drwxrwx--- root slurm ] slurmd_spool
    ├── [-rw------- root root ] cred_state
    ├── [-rw------- root root ] cred_state.old
    └── [-rw-r--r-- root root ] hwloc_topo_whole.xml
Additionally, when I change the mode of the slurmd_spool directory,
for example to 770, restarting the slurmd service changes it
back to 755, irrespective of the user/group.
Could someone tell me what the correct settings should be?
I did not have such problems with 19.05.
Kind Regards