Damn,
I almost always forget that most of the submission part is done on the
master :/
Best
Marcus
On 10/8/19 11:45 AM, Eddy Swan wrote:
Hi Sean,
Thank you so much for your additional information.
The issue is indeed due to a missing user on the head node.
After I configured the LDAP client on slurm-master, the srun command is now
working with an LDAP account.
Best regards,
Eddy Swan
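For reference, configuring an LDAP client on a CentOS 7 head node typically
comes down to something like the lines below; sssd as the client and the
server URI and base DN here are placeholders, not details from this thread:

$ sudo yum install -y sssd sssd-ldap
$ sudo authconfig --enableldap --enableldapauth \
    --ldapserver=ldap://ldap.example.com \
    --ldapbasedn="dc=example,dc=com" --update
$ sudo systemctl enable --now sssd
$ getent passwd turing    # should now resolve on slurm-master as well

The last command is the quick sanity check: if it prints a passwd entry on
slurm-master, the getpwuid_r lookup in slurmctld should succeed as well.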
On Tue, Oct 8, 2019 at 4:15 PM Sean Crosby wrote:
Looking at the SLURM code, it looks like it is failing on a call to
getpwuid_r on the ctld.
What is the output of the following (on slurm-master):
getent passwd turing
getent passwd 1000
Sean
--
Sean Crosby | Senior DevOps/HPC Engineer and HPC Team Lead
Research Platform Services | Business Services
CoEPP Research Compu
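For comparison, on a host where the account resolves, both lookups return the
same passwd entry; on a host where it does not, they print nothing and exit
non-zero. The entry below is illustrative only, with the home directory,
shell, and GECOS field assumed:

$ getent passwd turing
turing:x:1000:1000:turing:/home/turing:/bin/bash
$ getent passwd 1000
turing:x:1000:1000:turing:/home/turing:/bin/bash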
Hi Marcus,
I did not restart munge previously.
So I restarted munge followed by slurmd, but the issue still persists.
I ran the following test from piglet-17 to verify the munge installation,
and it looks good.
$ munge -n | unmunge
STATUS: Success (0)
ENCODE_HOST: piglet-17.sg.cor
Hmm, that is strange. I asked because of the errors below:
On 10/7/19 9:36 AM, Eddy Swan wrote:
[2019-10-07T13:38:49.260] error: slurm_cred_create: getpwuid failed
for uid=1000
[2019-10-07T13:38:49.260] error: slurm_cred_create error
and "id" uses the same call (ltrace excerpt):
getpwuid(0x9
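To reproduce that check, something along these lines on slurm-master shows
whether the lookup succeeds (ltrace needs to be installed; uid 1000 is the
one from this thread):

$ ltrace -e getpwuid id 1000    # trace the libc lookup that "id" makes
$ getent passwd 1000            # same NSS path slurmctld uses via getpwuid_r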
Hi Marcus,
piglet-17 as submit host:
$ id 1000
uid=1000(turing) gid=1000(turing)
groups=1000(turing),10(wheel),991(vboxusers)
piglet-18:
$ id 1000
uid=1000(turing) gid=1000(turing)
groups=1000(turing),10(wheel),992(vboxusers)
Uid 1000 is a local user on each node (piglet-17~19).
I also tried to
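A quick way to spot this kind of mismatch is to run the same lookup on every
host involved, including the master. The hostnames below are the ones used in
this thread, and passwordless ssh between the hosts is assumed:

$ for h in slurm-master piglet-17 piglet-18 piglet-19; do echo "== $h"; ssh "$h" id 1000; done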
Hi Eddy,
what is the result of "id 1000" on the submit host and on piglet-18?
Best
Marcus
On 10/7/19 8:07 AM, Eddy Swan wrote:
Hi All,
I am currently testing Slurm version 19.05.3-2 on CentOS 7 with a
one-master, three-node configuration.
I used the same configuration that works on version 17.02.7, but for some
reason it does not work on 19.05.3-2.
$ srun hostname
srun: error: Unable to create step for job 19: Error gener
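When srun fails like this, the slurmctld log on the master usually shows the
underlying cause. The path below is a common default and may differ depending
on SlurmctldLogFile in slurm.conf:

$ sudo grep -Ei 'cred|getpwuid' /var/log/slurm/slurmctld.log | tail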