Maybe this was a noob question; I've just solved my problem, so
I'll share my findings. I returned to my original settings
and reran the Ansible playbook, reconfiguring the SlurmdSpoolDir.

* https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir_1
  The docs say this directory should be writable by root, and it
  probably is, since root can write almost anywhere (except, e.g., on
  a Kerberized NFS share). However, my settings grant user and group
  permissions only to the "slurm" user:

  - name: "create SlurmdSpoolDir directory"
    ansible.builtin.file:
      path: "{{ slurmd_spool_dir }}"
      state: "directory"
      owner: "{{ slurm_user }}"
      group: "{{ slurm_user }}"
      mode: "0770"

* Setting the mode of the SlurmdSpoolDir does not really matter,
  because "slurmd" resets it to "0755" every time it restarts.
  The ownership is left unchanged. So the result is:

  drwxr-xr-x 2 slurm slurm 74 Aug 16 20:57 slurmd_spool

* The missing piece was read and execute (search) permission for
  "other" on the SlurmdSpoolDir's parent directory, which in my case
  is "/opt/slurm_state_dir". I had to set its mode to "775" instead
  of "770":

  drwxrwxr-x  3 slurm   slurm   26 Aug 11 19:49 slurm_state_dir
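
For reference, here is a minimal sketch of how the fix could look in
the same Ansible role. It assumes the "slurm_user" and
"slurmd_spool_dir" variables from the task above, and that the parent
directory is simply the dirname of the spool path (no trailing slash).
The parent gets "0775" because the execute (search) bit is what lets
other users look up paths beneath it. Presumably slurmstepd runs the
batch script as the job's user, not as "slurm". The spool directory is
set to "0755" to match what slurmd resets it to, so the playbook does
not report a change after every slurmd restart.

```yaml
# Sketch only: assumes slurm_user and slurmd_spool_dir are defined
# elsewhere in the role, and that slurmd_spool_dir has no trailing slash.
- name: "create SlurmdSpoolDir parent directory"
  ansible.builtin.file:
    path: "{{ slurmd_spool_dir | dirname }}"
    state: "directory"
    owner: "{{ slurm_user }}"
    group: "{{ slurm_user }}"
    mode: "0775"  # r-x for "other": job users must traverse this dir

- name: "create SlurmdSpoolDir directory"
  ansible.builtin.file:
    path: "{{ slurmd_spool_dir }}"
    state: "directory"
    owner: "{{ slurm_user }}"
    group: "{{ slurm_user }}"
    mode: "0755"  # same mode slurmd sets on restart, avoids drift
```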

Kind regards
--
Kamil Wilczek


On 16.08.2022 at 18:00, Kamil Wilczek wrote:
Dear Slurm Users,

Recently, I started a new instance of my cluster with Slurm 22.05.2
(built from source). Everything seems to be configured properly and
working fine except "sbatch". The error is quite self-explanatory, and
I thought fixing the directory permissions would be easy:

slurmstepd: error: execve(): /opt/slurm_state_dir/slurmd_spool/job00136/slurm_script: Permission denied

I read here (https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir_1) that the directory should
be writable by root. I made it so, but it did not help. I also tried
several other combinations of permissions, with no improvement.

Currently:

# ls -l /opt/
drwxrwx---  3 slurm  root   26 Aug 11 19:49 slurm_state_dir

# tree -pug /opt/slurm_state_dir
/opt/slurm_state_dir
└── [drwxrwx--- root     slurm   ]  slurmd_spool
     ├── [-rw------- root     root    ]  cred_state
     ├── [-rw------- root     root    ]  cred_state.old
     └── [-rw-r--r-- root     root    ]  hwloc_topo_whole.xml

Additionally, when I change the mode of the slurmd_spool directory,
for example to 770, restarting the slurmd service resets it
to 755, regardless of the directory's user/group.

Could someone tell me what the correct settings should be?
I did not have such problems with 19.05.

Kind Regards
