Re: [slurm-users] Slurm missing non primary group memberships

Janne Blomqvist Tue, 20 Nov 2018 03:15:42 -0800

On 10/11/2018 13.17, Douglas Jacobsen wrote:

We've had issues getting sssd to work reliably on compute nodes (at least at scale). The reason is not fully understood, but basically if the connection times out with sssd, it'll blacklist the server for 60s, which then causes those kinds of issues.

In our experience sssd doesn't work reliably in large environments if user/group enumeration is enabled (the "enumerate" config option).
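
For reference, a minimal sketch of what turning enumeration off looks like in sssd.conf. The domain name "example.org" is a placeholder, not something from this thread; use whatever your existing [domain/...] section is called. As far as I know the option already defaults to false, so this mostly matters if someone has switched it on.

```ini
[domain/example.org]
# "example.org" is a placeholder domain name (assumption for illustration).
# With enumeration off, sssd resolves users and groups on demand
# instead of trying to download the full list from the directory.
enumerate = false
```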


Slurm used to require enumeration, but in

https://github.com/SchedMD/slurm/commit/48a4cdf8d9433b5655a26581768200e7a696ce87

I reworked the logic so that it should only be required in some special weird cases. That patch was several years ago, so hopefully whatever bugs were caused by it have been ironed out by now (*knocking on wood*).
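
To illustrate the general difference between resolving groups with and without enumeration, here is a rough Python sketch. It is not a reproduction of the commit above, and the function names are made up for this example. One function asks NSS directly for a single user's groups; the other walks every group that NSS is willing to enumerate. On a node where sssd has enumeration disabled, the second approach will typically miss directory groups, which shows up as missing non-primary group memberships.

```python
import grp
import os
import pwd


def groups_by_lookup(user):
    """Ask NSS directly for one user's groups; needs no enumeration."""
    primary_gid = pwd.getpwnam(user).pw_gid
    return set(os.getgrouplist(user, primary_gid))


def groups_by_enumeration(user):
    """Walk every group NSS can enumerate and keep those listing the user."""
    primary_gid = pwd.getpwnam(user).pw_gid
    gids = {g.gr_gid for g in grp.getgrall() if user in g.gr_mem}
    gids.add(primary_gid)
    return gids


if __name__ == "__main__":
    # Compare both approaches for the user running the script.
    user = pwd.getpwuid(os.getuid()).pw_name
    direct = groups_by_lookup(user)
    enumerated = groups_by_enumeration(user)
    print("direct lookup:", sorted(direct))
    print("enumeration:  ", sorted(enumerated))
    print("only via direct lookup:", sorted(direct - enumerated))
```

If the last line prints a non-empty list on a compute node, something that relies on enumeration would not see those groups there.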

Setting LaunchParameters=send_gids will sidestep this issue by doing the lookups exclusively on the controller node, where more frequent connections can prevent time decay disconnections and reduce the likelihood of cache misses.

This is probably a good idea, particularly if one has large parallel jobs; otherwise the nodes could DoS the AD/LDAP servers when launching if the cache is cold.
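
As a sketch, the setting mentioned above goes into slurm.conf like this. If you already have a LaunchParameters line, the values are comma-separated, so append send_gids to the existing list rather than adding a second line.

```
# slurm.conf
# Resolve group IDs on the controller and send them along with the job launch,
# so compute nodes do not each have to query AD/LDAP.
LaunchParameters=send_gids
```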



--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqv...@aalto.fi
