Hi, I need help understanding what kind of processes happen behind the scenes of a 389ds server in my case, so that I can come up with better approaches for troubleshooting further problems.
In our environment we run FreeIPA instances in Docker, but neither the FreeIPA part nor the Docker part actually matters much here, because the ns-slapd service is the main actor in the "problem" I faced.

Prerequisites
-------------
Host: 4 CPU, 8 GB RAM
FreeIPA v4.9.8
ns-slapd -v: 389 Project 389-Directory/1.4.3.28 B2022.074.1358
-------------

Most of the time the host does not experience any notable load. RAM usage (meaning "used") can grow linearly over time, but that by itself makes no odds. At some point (now I can say with some certainty it happens as buff/cache grows) the host begins to experience heavy disk write load, which I notice through CPU iowait alerts.

CPU iowait: https://imgur.com/ifXPX5N
Disk IO and latency: https://imgur.com/HyT0cNR

The increase in load is not due to external causes (there was no increase in LDAP queries). What actually increased was "page out" operations: https://imgur.com/OxofSD9

The process exporter also showed increased major page faults, which served as an indicator of memory problems (memory shortage, though? there was plenty of RAM): https://imgur.com/xt2p933

Memory status during the problem
--------------------------------
Tasks: 175 total, 1 running, 174 sleeping, 0 stopped, 0 zombie
%CPU(s): 0.7 us, 0.7 sy, 0.0 ni, 29.1 id, 69.4 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7957.3 total, 414.3 free, 2332.7 used, 5210.3 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used.
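As a side note on the major-fault graphs: the counter the process exporter reports can also be read directly from /proc. Below is a minimal sketch of how I could sample it by hand, assuming Linux proc(5) semantics; the PID argument and the 10-second interval are illustrative, not anything from the exporter itself.

```shell
#!/bin/sh
# Sample the major page fault counter of one process twice and print the delta.
# majflt is field 12 of /proc/<pid>/stat (see proc(5)); field counting assumes
# the process comm contains no spaces, which holds for ns-slapd.
pid=${1:-$$}        # pass the ns-slapd PID, e.g. $(pidof ns-slapd); defaults to this shell for demo
interval=${2:-10}   # seconds between samples

majflt() { awk '{print $12}' "/proc/$1/stat"; }

a=$(majflt "$pid")
sleep "$interval"
b=$(majflt "$pid")
echo "pid $pid: $((b - a)) major faults in ${interval}s"
```

A steadily nonzero delta for ns-slapd while free RAM is still available is exactly the "plenty of RAM, yet major faults" symptom described above.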
5300.7 avail Mem

I tried some other instruments hoping they would indicate that something was wrong. I did not succeed, but I will still add their output here for the full picture.

Here is output of ext4slower (https://github.com/iovisor/bcc/blob/master/tools/ext4slower_example.txt), showing that fsync syscalls by ns-slapd take too long:

TIME     COMM     PID  T BYTES OFF_KB LAT(ms) FILENAME
07:00:02 ns-slapd 4017 S 0     0      1455.28 log.0000006661
07:00:10 ns-slapd 4017 S 0     0      1133.62 log.0000006661
07:00:11 ns-slapd 4017 S 0     0      1248.36 log.0000006661
07:01:02 ns-slapd 4017 S 0     0      736.46  log.0000006661
07:01:05 ns-slapd 4017 S 0     0      1541.29 access
07:01:07 ns-slapd 4017 S 0     0      1220.32 log.0000006661
07:01:08 ns-slapd 4017 S 0     0      832.87  log.0000006661
07:01:08 ns-slapd 4017 S 0     0      93.07   entryusn.db
07:01:09 ns-slapd 4017 S 0     0      418.88  description.db
07:01:10 ns-slapd 4017 S 0     0      822.64  nsuniqueid.db
07:01:10 ns-slapd 4017 S 0     0      417.25  displayname.db
07:01:10 ns-slapd 4017 S 0     0      1493.60 log.0000006661
07:01:11 ns-slapd 4017 S 0     0      617.68  memberHost.db
07:01:12 ns-slapd 4017 S 0     0      1684.12 log.0000006661
07:01:12 ns-slapd 4017 S 0     0      434.58  parentid.db
07:01:12 ns-slapd 4017 S 0     0      601.09  accessRuleType.db
07:01:13 ns-slapd 4017 S 0     0      1051.10 memberUser.db
07:01:14 ns-slapd 4017 S 0     0      1056.28 ipakrbprincipalalias.db
07:01:14 ns-slapd 4017 S 0     0      1803.00 log.0000006661

And here is output of cachetop (https://github.com/iovisor/bcc/blob/master/tools/cachetop_example.txt). I am honestly not sure how to interpret these results; maybe they will be helpful:

07:15:30 Buffers MB: 97 / Cached MB: 4212 / Sort: HITS / Order: descending
PID   UID      CMD      HITS MISSES DIRTIES READ_HIT% WRITE_HIT%
4017  UNKNOWN( ns-slapd 7259 7177   7177    0.6%      0.0%

In my attempts to find a solution I first restarted the FreeIPA container, which had no effect. In the end, I stopped the container, dropped all memory caches (sync; echo 3 > /proc/sys/vm/drop_caches), and started the container again; only after that did the excessive paging out (and disk IO) go away.
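For what it is worth, the log.0000006661 files dominating the ext4slower output are Berkeley DB transaction logs, so the slow fsyncs point at the database checkpointing/txn-log path rather than at LDAP traffic itself. A hedged sketch of how the relevant backend settings and counters could be inspected; the attribute names are the standard 389ds ldbm ones, but the bind DN is an assumption for your instance:

```shell
# Inspect the BDB checkpoint / transaction-log tuning of the ldbm backend.
# Adjust the bind DN (and supply the password) for your own instance.
ldapsearch -D "cn=Directory Manager" -W -o ldif-wrap=no \
  -b "cn=config,cn=ldbm database,cn=plugins,cn=config" -s base \
  nsslapd-db-checkpoint-interval nsslapd-db-durable-transaction \
  nsslapd-db-logbuf-size nsslapd-dbcachesize

# Database-level cache counters under cn=monitor: dbcachepageout climbing
# while dbcachehitratio drops would mirror the "page out" graphs above.
ldapsearch -D "cn=Directory Manager" -W -o ldif-wrap=no \
  -b "cn=monitor,cn=ldbm database,cn=plugins,cn=config" -s base \
  dbcachehits dbcachetries dbcachehitratio dbcachepagein dbcachepageout
```

These are read-only queries against a live server, so they need an instance to run against; I am including them mainly as a starting point for the monitoring questions below.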
So my questions are:

1) What processes behind the scenes of a 389ds server could cause such behavior?
2) Is it (kind of) normal?
3) What actions would you suggest to mitigate such behavior? Would the LDAP metrics in cn=monitor have helped to determine that something was wrong? (Unfortunately they are not currently monitored, but they will be.)
4) Generally, what else would you suggest monitoring? Maybe some metrics that would be of help but are not very obvious.

_______________________________________________
FreeIPA-users mailing list -- [email protected]
List Archives: https://lists.fedorahosted.org/archives/list/[email protected]
