Hi all, I am currently testing Slurm (slurm-wlm 17.11.2 on a freshly installed and updated Ubuntu Server LTS). I managed to make it work in a very simple configuration with one master node and two compute nodes. All three nodes have the same users (namely root, slurm and test). The slurm user runs both slurmctld and slurmd on the corresponding nodes (i.e. SlurmUser=slurm and SlurmdUser=slurm), and test is the only user that can log in.
Commands such as `salloc` and `srun` work perfectly, but `sbatch` fails. In `squeue`, the job shows "(launch failed requeued held)". When I check the log on the corresponding compute node, I see "error: chown(/var/spool/slurmd/d/jobxxxxx): Operation not permitted". The line before it reads "Launching batch job xx for UID 1000" (test), or UID 0 (root) if I run `sudo sbatch`.
The batch file looks like this:

    #!/bin/bash
    #SBATCH -J myjob
    hostname

I suspect the problem is that `srun` and `salloc` tasks are run as SlurmdUser (slurm; `srun whoami` returns slurm), which owns /var/spool/slurmd, whereas `sbatch` tasks are run as the user issuing the command (test). Should I chmod /var/spool/slurmd so that any user can write there, or do I have a configuration problem? I feel like I am missing something critical here.
Thanks a lot.
Daniel