I think I found the problem, but I don't have any idea how to
solve it.
Scenario 1: pbis was never installed (adoption succeeds)
login server$ srun -c1 --pty bash
compA$ ssh slurm-node
slurm-node$ nproc
1
auth.log on slurm-node:
Mar 7 08:54:29 slurm-node pam_slurm_adopt[7329]: Connection by user temp: user has only one job 5
Mar 7 08:54:29 slurm-node pam_slurm_adopt[7329]: Process 7329 adopted into job 5
Mar 7 08:54:29 slurm-node sshd[7329]: Accepted password for temp from IP_HERE port 57052 ssh2
Mar 7 08:54:29 slurm-node sshd[7329]: pam_unix(sshd:session): session opened for user temp by (uid=0)
Note the process number (7329): it is the same in every line.
Scenario 2: pbis installed (adoption fails)
login server$ srun -c1 --pty bash
compA$ ssh slurm-node
slurm-node$ nproc
2
* Two is the total number of CPUs on slurm-node.
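To check confinement from inside the ssh session without relying on nproc alone, this sketch compares the CPUs the current process may run on with the CPUs in the machine. GNU nproc prints the first of these by default (unless OMP_NUM_THREADS or OMP_THREAD_LIMIT is set), so a shell inside the 1-CPU job should report 1 of 2, and an unconfined shell 2 of 2. It assumes the job's CPU limit is enforced through the affinity mask or cpuset (for example by Slurm's task/cgroup plugin), which is what the nproc results above suggest; cpu_visibility is a hypothetical helper name.

```python
import os


def cpu_visibility() -> tuple[int, int]:
    """Return (CPUs this process may run on, CPUs in the machine). Linux only."""
    # GNU nproc's default answer is derived from this same affinity mask.
    available = len(os.sched_getaffinity(0))
    total = os.cpu_count() or available  # cpu_count() can return None
    return available, total


if __name__ == "__main__":
    available, total = cpu_visibility()
    if available < total:
        print(f"{available} of {total} CPUs visible: shell looks confined")
    else:
        # Also true for a job that was allocated every CPU on the node.
        print(f"all {total} CPUs visible: shell is not confined (or the job owns every CPU)")
```

The comparison only means something when the job was allocated fewer CPUs than the node has, as with srun -c1 on a 2-CPU node here.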
auth.log on slurm-node:
Mar 7 09:00:52 slurm-node pam_slurm_adopt[1595]: Connection by user temp: user has only one job 8
Mar 7 09:00:53 slurm-node pam_slurm_adopt[1595]: Process 1595 adopted into job 8
Mar 7 09:00:53 slurm-node sshd[1593]: Accepted keyboard-interactive/pam for temp from IP_HERE port 33218 ssh2
Mar 7 09:00:53 slurm-node sshd[1593]: pam_unix(sshd:session): session opened for user temp by (uid=0)
Here the process number changes! The adoption applies to one process
(1595), but the SSH session we eventually get runs under a different
process (1593) and outside the job's context.
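The PID comparison above can also be done mechanically. This minimal sketch pairs each pam_slurm_adopt "adopted into job" message with the next sshd "Accepted" message and reports both PIDs. The regular expressions are assumptions based only on the auth.log excerpts quoted in this message, with one syslog message per line; compare_pids and SAMPLE are hypothetical names.

```python
import re

# Message formats taken from the auth.log excerpts (one syslog message per line).
ADOPTED = re.compile(r"pam_slurm_adopt\[\d+\]: Process (\d+) adopted into job (\d+)")
ACCEPTED = re.compile(r"sshd\[(\d+)\]: Accepted \S+ for (\S+) from")


def compare_pids(log_text: str) -> list[tuple[str, int, int, int]]:
    """Pair each adoption with the next sshd 'Accepted' message.

    Returns (user, job, adopted_pid, sshd_pid) tuples. A mismatch means the
    process pam_slurm_adopt moved into the job is not the sshd process that
    accepted the login.
    """
    results = []
    pending = None  # (job, adopted_pid) still waiting for its sshd line
    for line in log_text.splitlines():
        if m := ADOPTED.search(line):
            pending = (int(m.group(2)), int(m.group(1)))
        elif pending and (m := ACCEPTED.search(line)):
            job, adopted = pending
            results.append((m.group(2), job, adopted, int(m.group(1))))
            pending = None
    return results


SAMPLE = """\
Mar 7 08:54:29 slurm-node pam_slurm_adopt[7329]: Process 7329 adopted into job 5
Mar 7 08:54:29 slurm-node sshd[7329]: Accepted password for temp from IP_HERE port 57052 ssh2
Mar 7 09:00:53 slurm-node pam_slurm_adopt[1595]: Process 1595 adopted into job 8
Mar 7 09:00:53 slurm-node sshd[1593]: Accepted keyboard-interactive/pam for temp from IP_HERE port 33218 ssh2
"""

if __name__ == "__main__":
    for user, job, adopted, sshd in compare_pids(SAMPLE):
        verdict = "same" if adopted == sshd else "DIFFERENT"
        print(f"{user} job {job}: adopted pid {adopted}, sshd pid {sshd} ({verdict})")
```

Run as a script, it prints "temp job 5: adopted pid 7329, sshd pid 7329 (same)" and "temp job 8: adopted pid 1595, sshd pid 1593 (DIFFERENT)". Feeding it the real auth.log text instead of SAMPLE would flag every login where the two PIDs differ.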
ps -ef | grep 1595
(no output)
ps -ef | grep 1593
root 1593 1093 0 14:57 ? 00:00:00 sshd: temp [priv]
temp 1627 1593 0 14:57 ? 00:00:00 sshd: temp@pts/2
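The ps output shows which processes own the session, but not which cgroup they ended up in. One way to check, sketched here under assumptions, is to read /proc/<pid>/cgroup for the sshd processes (1593 and 1627 above) and look for a job_<id> component in each path. The assumed layout (for example /slurm/uid_1000/job_8/step_extern under cgroup v1) depends on the Slurm version and the proctrack/task plugin configuration; the function names, the script name and SAMPLE_CGROUP are hypothetical.

```python
from __future__ import annotations

import re
import sys
from pathlib import Path

# Assumed Slurm cgroup path component, e.g. /slurm/uid_1000/job_8/step_extern.
JOB_RE = re.compile(r"/job_(\d+)(?:/|$)")


def slurm_jobs_in_cgroup(cgroup_text: str) -> dict[str, int | None]:
    """Map each hierarchy's controller list to the Slurm job id in its path, or None."""
    result: dict[str, int | None] = {}
    for line in cgroup_text.splitlines():
        if not line:
            continue
        # /proc/<pid>/cgroup lines look like "hierarchy-id:controllers:path".
        _hierarchy_id, controllers, path = line.split(":", 2)
        match = JOB_RE.search(path)
        # cgroup v2 has an empty controller list; label it "unified".
        result[controllers or "unified"] = int(match.group(1)) if match else None
    return result


def check_pid(pid: int) -> dict[str, int | None]:
    """Read the cgroup membership of a live process (Linux only)."""
    return slurm_jobs_in_cgroup(Path(f"/proc/{pid}/cgroup").read_text())


# Hypothetical sample: in job 8's cpuset, but not in a Slurm memory cgroup.
SAMPLE_CGROUP = "4:cpuset:/slurm/uid_1000/job_8/step_extern\n3:memory:/user.slice\n"

if __name__ == "__main__":
    # Usage: python3 check_cgroup.py 1593 1627   (script name is hypothetical)
    for arg in sys.argv[1:]:
        print(arg, check_pid(int(arg)))
```

An adopted process should show the job id at least under cpuset; None for every controller would be consistent with nproc reporting all 2 CPUs in scenario 2.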
Notes:
Scenario 2 did not change when I tried:
a. stopping the pbis service (lwsmd)
b. restoring all pam.d files to their scenario 1 state
c. running sudo apt purge pbis-open and rebooting
My conclusion is that pbis changed something in the way Linux PAM
works, but I can't figure out where.
If anyone has an idea, I'll be glad to hear it.
On 2/24/19 9:22 AM, נדב טולדו wrote:
Thanks to both of you, I will try and let you know.
From: Prentice Bisbal
Sent: Fri, Feb 22, 2019 6:16 PM IST
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] pam_slurm_adopt with pbis-open pam modules
On 2/22/19 12:54 AM, Chris Samuel wrote:
On Thursday, 21 February 2019 8:20:36 AM PST נדב טולדו wrote:
Yeah, I have. Before I installed pbis and introduced lsass.so, the Slurm module
worked well. Is there any way to debug this?
I can see in syslog that the Slurm module adopts the process into the job context,
but then I somehow end up outside that context, with access to all
resources.
Yes, check the documentation and review your PAM configuration. As I
mentioned it sounds like you've got things in the wrong order there.
https://slurm.schedmd.com/pam_slurm_adopt.html#PAM_CONFIG
I second this. PAM is extremely sensitive to the module order
by design.
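To review the order mentioned above, it can help to flatten the sshd PAM configuration into the sequence of modules PAM walks for one stack, including any files pulled in. This sketch assumes the usual /etc/pam.d layout, expands Debian/Ubuntu "@include" lines and "include"/"substack" controls, and reports only the written order; it does not simulate jump controls such as [success=1]. stack_order and read_pam_file are hypothetical names. Printing the account and session stacks would show where lsass.so sits relative to pam_slurm_adopt.so.

```python
import re
from pathlib import Path
from typing import Callable

# Optional "-" prefix, type, control (word or [bracketed list]), module.
LINE_RE = re.compile(r"-?(\w+)\s+(\[[^\]]*\]|\S+)\s+(\S+)")


def read_pam_file(name: str) -> str:
    """Default reader: load /etc/pam.d/<name>."""
    return (Path("/etc/pam.d") / name).read_text()


def stack_order(name: str, stack: str = "account",
                reader: Callable[[str], str] = read_pam_file) -> list[str]:
    """List the modules of one PAM stack in written order, expanding includes."""
    order: list[str] = []
    for raw in reader(name).splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if line.startswith("@include"):  # Debian/Ubuntu: whole file, filtered below
            order.extend(stack_order(line.split()[1], stack, reader))
            continue
        m = LINE_RE.match(line)
        if not m or m.group(1) != stack:
            continue
        if m.group(2) in ("include", "substack"):  # same stack from another file
            order.extend(stack_order(m.group(3), stack, reader))
        else:
            order.append(m.group(3))
    return order


if __name__ == "__main__":
    for stack in ("account", "session"):
        print(stack, "->", " | ".join(stack_order("sshd", stack)))
```

The reader parameter lets the same function run on saved copies of pam.d files, for example to compare a pre-pbis snapshot with the current one.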
Also, to debug, most PAM modules have a debug option you can
use to enable the logging of debug messages. If you check the
man page for any PAM module, you'll see its debug options.
For pam_slurm_adopt, see https://slurm.schedmd.com/pam_slurm_adopt.html.
It looks like you can set a log_level option:
  log_level
    See SlurmdDebug in slurm.conf for available options.
    The default log_level is info.
So to set the debugging level for pam_slurm_adopt all the
way up, you'd do something like this in your PAM file:
account sufficient pam_slurm_adopt.so log_level=debug5
If you can't tell what's going on just from that, I would see
how to enable debugging for all the PAM modules in the rest of
the stack, to get a better picture of what's going on
throughout the whole authentication process. When you're done,
don't forget to turn off logging so you don't fill your log
files with unnecessary noise.
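Turning debugging on across a whole PAM file, and back off afterwards, is easy to script against a copy of that file. This is a minimal sketch under two assumptions: every module accepts a bare debug argument (check each module's man page before relying on it), and pam_slurm_adopt takes log_level=debug5 instead, as described above. set_debug and debug_arg are hypothetical names.

```python
import re

# Optional "-" prefix, type, control (word or [bracketed list]), then module and args.
MODULE_LINE = re.compile(r"^(\s*-?\w+\s+(?:\[[^\]]*\]|\S+)\s+)(\S+\.so)(.*)$")


def debug_arg(module: str) -> str:
    """pam_slurm_adopt uses log_level=; other modules are assumed to accept 'debug'."""
    return "log_level=debug5" if "pam_slurm_adopt" in module else "debug"


def set_debug(pam_text: str, enable: bool) -> str:
    """Add (enable=True) or remove (enable=False) the debug argument on every module line."""
    out = []
    for line in pam_text.splitlines():
        m = MODULE_LINE.match(line)
        if not m:
            out.append(line)  # comments, blank lines and @include lines are left alone
            continue
        head, module, args = m.groups()
        arg = debug_arg(module)
        words = args.split()
        if enable and arg not in words:
            words.append(arg)
        elif not enable:
            words = [w for w in words if w != arg]
        out.append(head + module + "".join(" " + w for w in words))
    return "\n".join(out) + ("\n" if pam_text.endswith("\n") else "")
```

Calling set_debug(text, False) removes exactly the arguments that set_debug(text, True) added, though runs of spaces between module arguments are collapsed to one.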
--
Prentice