Re: [slurm-users] srun: Error generating job credential

2019-10-08 Thread Marcus Wagner
Damn, I almost always forget that most of the submission part is done on the master :/ Best Marcus On 10/8/19 11:45 AM, Eddy Swan wrote: Hi Sean, Thank you so much for your additional information. The issue is indeed due to a missing user on the head node. After I configured the LDAP client on s

Re: [slurm-users] srun: Error generating job credential

2019-10-08 Thread Eddy Swan
Hi Sean, Thank you so much for your additional information. The issue is indeed due to a missing user on the head node. After I configured the LDAP client on slurm-master, the srun command now works with the LDAP account. Best regards, Eddy Swan On Tue, Oct 8, 2019 at 4:15 PM Sean Crosby wrote: > Look

Re: [slurm-users] srun: Error generating job credential

2019-10-08 Thread Sean Crosby
Looking at the SLURM code, it looks like it is failing in a call to getpwuid_r on the ctld. What is the output of these two commands (on slurm-master): "getent passwd turing" and "getent passwd 1000"? Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Platform Services | Business Services CoEPP Research Compu
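
The check Sean suggests can be wrapped in a small script. This is only a minimal sketch: check_user is a hypothetical helper name and not part of Slurm, and it assumes getent is available on the host (as it is on CentOS 7). It reports whether a user name or numeric UID resolves through the host's passwd lookup, which is what the controller appears to need before it can create a job credential.

```shell
#!/bin/sh
# check_user is a hypothetical helper (not a Slurm tool).
# It reports whether a user name or numeric UID resolves via getent
# on the host where it is run.
check_user() {
    if entry=$(getent passwd "$1"); then
        # The first colon-separated field of a passwd entry is the login name.
        echo "OK: $1 resolves to ${entry%%:*}"
    else
        echo "MISSING: $1 does not resolve on $(uname -n)"
        return 1
    fi
}

# Intended use on slurm-master, with the name and UID from this thread:
#   check_user turing
#   check_user 1000
```

If either call prints MISSING on the controller while the same call succeeds on the compute nodes, the account is only known locally on the nodes, which matches the cause reported later in this thread.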

Re: [slurm-users] srun: Error generating job credential

2019-10-07 Thread Eddy Swan
Hi Marcus, I had not restarted munge previously. So I restarted munge, followed by slurmd, but the issue still persists. I ran the following test from piglet-17 to verify the munge installation, and it looks good. $ munge -n | unmunge STATUS: Success (0) ENCODE_HOST: piglet-17.sg.cor

Re: [slurm-users] srun: Error generating job credential

2019-10-07 Thread Marcus Wagner
Hmm, that is strange. I asked because of the errors below: On 10/7/19 9:36 AM, Eddy Swan wrote: [2019-10-07T13:38:49.260] error: slurm_cred_create: getpwuid failed for uid=1000 [2019-10-07T13:38:49.260] error: slurm_cred_create error and "id" uses the same call (ltrace excerpt): getpwuid(0x9

Re: [slurm-users] srun: Error generating job credential

2019-10-07 Thread Eddy Swan
Hi Marcus, piglet-17 as submit host: $ id 1000 uid=1000(turing) gid=1000(turing) groups=1000(turing),10(wheel),991(vboxusers) piglet-18: $ id 1000 uid=1000(turing) gid=1000(turing) groups=1000(turing),10(wheel),992(vboxusers) UID 1000 is a local user on each node (piglet-17~19). I also tried to

Re: [slurm-users] srun: Error generating job credential

2019-10-06 Thread Marcus Wagner
Hi Eddy, what is the result of "id 1000" on the submit host and on piglet-18? Best Marcus On 10/7/19 8:07 AM, Eddy Swan wrote: Hi All, I am currently testing slurm version 19.05.3-2 on CentOS 7 with one master and 3 nodes configuration. I used the same configuration that works on version 17.0

[slurm-users] srun: Error generating job credential

2019-10-06 Thread Eddy Swan
Hi All, I am currently testing slurm version 19.05.3-2 on CentOS 7 with a configuration of one master and 3 nodes. I used the same configuration that works on version 17.02.7, but for some reason it does not work on 19.05.3-2. $ srun hostname srun: error: Unable to create step for job 19: Error gener