I think I found the problem, but I don't have any idea how to solve it.


Scenario 1: PBIS was never installed (adoption succeeded)


login server$ srun -c1 --pty bash


compA$ ssh slurm-node

slurm-node$ nproc

1


auth.log on slurm-node:


Mar  7 08:54:29 slurm-node pam_slurm_adopt[7329]: Connection by user temp: user has only one job 5
Mar  7 08:54:29 slurm-node pam_slurm_adopt[7329]: Process 7329 adopted into job 5
Mar  7 08:54:29 slurm-node sshd[7329]: Accepted password for temp from IP_HERE port 57052 ssh2
Mar  7 08:54:29 slurm-node sshd[7329]: pam_unix(sshd:session): session opened for user temp by (uid=0)


Note the process number (7329): it is consistent across all four lines.


Scenario 2: PBIS installed (adoption failed)


login server$ srun -c1 --pty bash


compA$ ssh slurm-node

slurm-node$ nproc

2


*2 is the total number of CPUs on slurm-node.
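nproc reflects the CPU affinity of the shell, so a slightly more direct check is to also look at which cgroup the SSH shell landed in. Below is a minimal Python sketch (Linux only) that prints both. The test for a `/job_` path component is an assumption based on the usual task/cgroup layout (something like `/slurm/uid_<uid>/job_<id>/step_extern`) and may need adjusting for other Slurm versions or cgroup setups:

```python
import os


def allowed_cpus():
    """CPUs the calling process may run on (its scheduler affinity mask)."""
    return os.sched_getaffinity(0)


def cgroup_lines(pid="self"):
    """Non-empty lines of /proc/<pid>/cgroup, e.g. '4:cpuset:/slurm/...'."""
    with open(f"/proc/{pid}/cgroup") as f:
        return [line.strip() for line in f if line.strip()]


def looks_like_slurm_job(lines):
    # Assumption: Slurm's task/cgroup plugin places job processes under a
    # path containing a "job_<id>" component; adapt this to your layout.
    return any("/job_" in line for line in lines)


if __name__ == "__main__":
    print("allowed CPUs:", sorted(allowed_cpus()))
    lines = cgroup_lines()
    print("\n".join(lines))
    print("inside a Slurm job cgroup:", looks_like_slurm_job(lines))
```

Run inside the SSH session on slurm-node: in scenario 1 one would expect a single CPU and a job path, in scenario 2 both CPUs and no job path.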


auth.log on slurm-node:


Mar  7 09:00:52 slurm-node pam_slurm_adopt[1595]: Connection by user temp: user has only one job 8
Mar  7 09:00:53 slurm-node pam_slurm_adopt[1595]: Process 1595 adopted into job 8
Mar  7 09:00:53 slurm-node sshd[1593]: Accepted keyboard-interactive/pam for temp from IP_HERE port 33218 ssh2
Mar  7 09:00:53 slurm-node sshd[1593]: pam_unix(sshd:session): session opened for user temp by (uid=0)


Here the process number changed! Process 1595 is adopted into the job, but the SSH session we eventually get is handled by a different process (sshd 1593), so it runs with a different process number and outside the job context.
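The PID comparison above can be automated over auth.log. A rough Python sketch that pairs each "Process N adopted into job J" line with the next sshd "Accepted" line and flags a mismatch; pairing by order is a simplifying assumption that only holds when logins don't interleave in the log:

```python
import re

# Patterns for the two kinds of line compared above (format from the auth.log excerpts)
ADOPT_RE = re.compile(r"pam_slurm_adopt\[\d+\]: Process (\d+) adopted into job (\d+)")
SSHD_RE = re.compile(r"sshd\[(\d+)\]: Accepted (\S+) for (\S+)")


def check_adoption(lines):
    """Return (job, adopted_pid, sshd_pid, auth_method, pids_match) per login.

    Each adoption is paired with the next sshd 'Accepted' line, which
    assumes logins do not interleave in the log.
    """
    results = []
    pending = None  # (adopted_pid, job_id) waiting for its sshd line
    for line in lines:
        m = ADOPT_RE.search(line)
        if m:
            pending = (int(m.group(1)), int(m.group(2)))
            continue
        m = SSHD_RE.search(line)
        if m and pending:
            adopted_pid, job = pending
            sshd_pid = int(m.group(1))
            results.append(
                (job, adopted_pid, sshd_pid, m.group(2), adopted_pid == sshd_pid)
            )
            pending = None
    return results
```

On slurm-node this could be called as `check_adoption(open("/var/log/auth.log"))`. The auth method is kept in each row because it also differs between the two scenarios (`password` vs `keyboard-interactive/pam`).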


ps -ef |grep 1595

no output


ps -ef |grep 1593

root      1593  1093  0 14:57 ?        00:00:00 sshd: temp [priv]
temp    1627  1593  0 14:57 ?        00:00:00 sshd: temp@pts/2
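To see where the surviving sshd processes (1593 and its child 1627) ended up, one option is to walk a PID's parent chain through /proc and print each ancestor's cgroup. A Linux-only Python sketch; it only shows the cpuset line (cgroup v1) or the unified `0::` line (cgroup v2), which is an assumption about which controller is relevant here:

```python
import os
import sys


def parent_pid(pid):
    """Read the parent PID from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is wrapped in parentheses and may contain spaces,
    # so split only after the last ')': the next fields are state, then ppid.
    fields = data[data.rindex(")") + 2:].split()
    return int(fields[1])


def ancestry(pid):
    """Return [pid, parent, grandparent, ...] until the reported parent is 0."""
    chain = []
    while pid > 0:
        chain.append(pid)
        pid = parent_pid(pid)
    return chain


def cgroup_of(pid):
    """Return the cpuset (cgroup v1) and unified (cgroup v2) lines for a PID."""
    with open(f"/proc/{pid}/cgroup") as f:
        return [l.strip() for l in f if ":cpuset:" in l or l.startswith("0::")]


if __name__ == "__main__":
    start = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for p in ancestry(start):
        with open(f"/proc/{p}/comm") as f:
            name = f.read().strip()
        print(p, name, cgroup_of(p))
```

For example, running it as root with `1627` as the argument would list 1627, 1593, 1093 and so on, each with the cgroup it is in.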


Notes:

Scenario 2 didn't change when I tried:

a. stopping the PBIS service (lwsmd)

b. restoring all pam.d files to their scenario 1 state

c. running sudo apt purge pbis-open and rebooting


My conclusion is that PBIS changed something in the way Linux PAM works, but I can't figure out where.



If anyone has an idea, I'd be glad to hear it.



On 2/24/19 9:22 AM, נדב טולדו wrote:
Thanks to both of you, I will try and let you know.

From: Prentice Bisbal
Sent: Fri, Feb 22, 2019 6:16 PM IST
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] pam_slurm_adopt with pbis-open pam modules


On 2/22/19 12:54 AM, Chris Samuel wrote:
On Thursday, 21 February 2019 8:20:36 AM PST נדב טולדו wrote:

Yeah, I have. Before I installed PBIS and introduced lsass.so, the Slurm module
worked well. Is there any way to debug?

I am seeing in syslog that the Slurm module is adopting into the job context,
but then I somehow end up out of that context and have access to all
resources.
Yes, check the documentation and review your PAM configuration. As I
mentioned, it sounds like you've got things in the wrong order there.

https://slurm.schedmd.com/pam_slurm_adopt.html#PAM_CONFIG

I second this. PAM is extremely sensitive to the module order by design.
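Since order is what matters, it can help to print the effective stack that sshd walks, with included files expanded, and compare it against the example in the documentation. A rough Python sketch; it understands Debian-style `@include` lines and the `include`/`substack` controls, but not every corner of PAM syntax (for example escaped characters inside bracketed controls):

```python
import os


def pam_stack(text, want="account", pam_dir="/etc/pam.d"):
    """Return (control, module) pairs for one PAM type, in evaluation order.

    Debian-style '@include name' lines and 'include'/'substack' controls are
    expanded by reading the named file from pam_dir (missing files are skipped).
    """
    stack = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        if parts[0] == "@include":
            stack.extend(_expand(parts[1], want, pam_dir))
            continue
        if len(parts) < 3:
            continue
        ptype = parts[0].lstrip("-")  # leading '-': skip silently if module is missing
        if ptype != want:
            continue
        if parts[1].startswith("["):  # bracketed control, e.g. [success=1 default=ignore]
            end = next(i for i in range(1, len(parts)) if parts[i].endswith("]"))
            control, module = " ".join(parts[1:end + 1]), parts[end + 1]
        else:
            control, module = parts[1], parts[2]
        if control in ("include", "substack"):
            stack.extend(_expand(module, want, pam_dir))
        else:
            stack.append((control, module))
    return stack


def _expand(name, want, pam_dir):
    """Parse an included PAM file, or return [] if it does not exist."""
    path = os.path.join(pam_dir, name)
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return pam_stack(f.read(), want, pam_dir)
```

For example, `pam_stack(open("/etc/pam.d/sshd").read(), "session")` lists the session modules in order; running it for both "account" and "session" before and after installing PBIS would show whether anything was inserted around pam_slurm_adopt.so.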

Also, to debug, most PAM modules have a debug option you can use to enable the logging of debug messages. If you check the man page for any PAM module, you'll see its debug options. For pam_slurm_adopt, see https://slurm.schedmd.com/pam_slurm_adopt.html. It looks like you can set a log_level setting:

log_level
See SlurmdDebug in slurm.conf for available options. The default log_level is info.


So to set the debugging level for pam_slurm_adopt all the way up, you'd do something like this in your PAM file:

account sufficient pam_slurm_adopt.so log_level=debug5

If you can't tell what's going on just from that, I would see how to enable debugging for all the PAM modules in the rest of the stack, to get a better picture of what's going on throughout the whole authentication process. When you're done, don't forget to turn off debug logging so you don't fill your log files with unnecessary noise.


--

Prentice


