What log level are you running? I'm also seeing random sssd halts, but without 
any dump. It looks like sssd is restarted by systemd every 15 minutes. It 
usually restarts within 0.1 seconds, but occasionally it doesn't come back until 
the next 15-minute cycle, which causes logins and cron jobs to fail. As soon as 
I turned the log level up to 9, the system with the worst failure count settled 
down and ran with no problems. Engineer's paradox.
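
For comparing notes between hosts, here is a rough sketch for tallying which 
children were killed by the watchdog and which simply exited. It assumes the 
svc_child_info message format shown in the quoted sssd.log excerpt below; the 
function name and the sample text are only illustrative, and the duplicate 
check would miscount if a PID were reused within one log.

```python
import re
from collections import Counter

# Pattern for the svc_child_info messages seen in the quoted sssd.log excerpt.
CHILD_EVENT = re.compile(
    r"Child \[(?P<pid>\d+)\] \('(?P<name>[^']+)':'[^']*'\) "
    r"(?:was terminated by own WATCHDOG|exited with code \[(?P<code>\d+)\])"
)


def summarize_child_events(log_text):
    """Count watchdog terminations and exits per SSSD child service."""
    # Strip mail quoting markers, then undo the mail client's line wrapping.
    unquoted = re.sub(r"^>+ ?", "", log_text, flags=re.MULTILINE)
    flat = " ".join(unquoted.split())

    seen = set()
    counts = Counter()
    for match in CHILD_EVENT.finditer(flat):
        kind = "watchdog" if match.group("code") is None else "exit"
        key = (match.group("pid"), kind)
        if key in seen:  # backtrace dumps repeat earlier messages
            continue
        seen.add(key)
        counts[(match.group("name"), kind)] += 1
    return counts


# Illustrative input copied from the quoted excerpt (mail wrapping included).
sample = """\
>(2022-07-21  7:11:01): [sssd] [svc_child_info] (0x0020):
>Child [991] ('pac':'pac') was terminated by own WATCHDOG
>(2022-07-21  7:11:14): [sssd] [svc_child_info] (0x0040):
>Child [9744] ('nss':'nss') exited with code [3]
>    *  (2022-07-21  7:11:14): [sssd] [svc_child_info]
>(0x0040): Child [9744] ('nss':'nss') exited with code [3]
"""

for (name, kind), n in sorted(summarize_child_events(sample).items()):
    print(name, kind, n)
```

Running the same function over the full /var/log/sssd/sssd.log on each host 
should show whether the watchdog kills cluster on one service or are spread 
across all of them.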

On July 21, 2022 2:41:42 AM EDT, lejeczek via FreeIPA-users 
<[email protected]> wrote:
>Hi guys.
>
>One of the masters recently started finding SSSD dead, and the log 
>says the killer is the WATCHDOG, but I'm not sure about that.
> From sssd.log:
>...
>********************** BACKTRACE DUMP ENDS HERE 
>*********************************
>
>(2022-07-21  7:11:01): [sssd] [svc_child_info] (0x0020): 
>Child [991] ('pac':'pac') was terminated by own WATCHDOG
>    *  ... skipping repetitive backtrace ...
>(2022-07-21  7:11:14): [sssd] [svc_child_info] (0x0020): 
>Child [984] ('abba.xx.priv.yy':'%BE_abba.xx.priv.yy') was 
>terminated by own WATCHDOG
>    *  ... skipping repetitive backtrace ...
>(2022-07-21  7:11:14): [sssd] [svc_child_info] (0x0040): 
>Child [9744] ('nss':'nss') exited with code [3]
>********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE 
>FOLLOWING BACKTRACE:
>    *  (2022-07-21  7:11:14): [sssd] 
>[sbus_dispatch_reconnect] (0x0400): Connection lost. 
>Terminating active requests.
>    *  (2022-07-21  7:11:14): [sssd] 
>[sbus_dispatch_reconnect] (0x4000): Remote client terminated 
>the connection. Releasing data...
>    *  (2022-07-21  7:11:14): [sssd] [sbus_connection_free] 
>(0x4000): Connection 0x5576314d9180 will be freed during 
>next loop!
>    *  (2022-07-21  7:11:14): [sssd] [mt_svc_restart] 
>(0x0400): Scheduling service abba.xx.priv.yy for restart 1
>    *  (2022-07-21  7:11:14): [sssd] [get_provider_config] 
>(0x0100): Formed command '/usr/libexec/sssd/sssd_be --domain 
>abba.xx.priv.yy --uid 0 --gid 0 --logger=files' for provider 
>'%BE_abba.xx.priv.yy'
>    *  (2022-07-21  7:11:14): [sssd] [start_service] 
>(0x0100): Queueing service abba.xx.priv.yy for startup
>    *  (2022-07-21  7:11:14): [sssd] [mt_svc_exit_handler] 
>(0x1000): SIGCHLD handler of service nss called
>    *  (2022-07-21  7:11:14): [sssd] [svc_child_info] 
>(0x0040): Child [9744] ('nss':'nss') exited with code [3]
>********************** BACKTRACE DUMP ENDS HERE 
>*********************************
>
>(2022-07-21  7:11:14): [sssd] [svc_child_info] (0x0040): 
>Child [9758] ('pac':'pac') exited with code [3]
>    *  ... skipping repetitive backtrace ...
>(2022-07-21  7:11:16): [sssd] [svc_child_info] (0x0040): 
>Child [9876] ('nss':'nss') exited with code [3]
>    *  ... skipping repetitive backtrace ...
>(2022-07-21  7:11:16): [sssd] [svc_child_info] (0x0040): 
>Child [9877] ('pac':'pac') exited with code [3]
>    *  ... skipping repetitive backtrace ...
>(2022-07-21  7:11:20): [sssd] [svc_child_info] (0x0040): 
>Child [9903] ('nss':'nss') exited with code [3]
>    *  ... skipping repetitive backtrace ...
>(2022-07-21  7:11:20): [sssd] [monitor_restart_service] 
>(0x0010): Process [nss], definitely stopped!
>(2022-07-21  7:11:20): [sssd] [monitor_quit] (0x3f7c0): 
>Returned with: 1
>(2022-07-21  7:11:20): [sssd] [monitor_quit] (0x3f7c0): 
>Terminating [pac][9904]
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Child [pac] terminated with a signal
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Terminating [abba.xx.priv.yy][9875]
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Child [abba.xx.priv.yy] exited gracefully
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Terminating [sudo][990]
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Child [sudo] exited gracefully
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Terminating [ssh][989]
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Child [ssh] exited gracefully
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Terminating [ifp][988]
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Child [ifp] exited gracefully
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Terminating [pam][987]
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Child [pam] exited gracefully
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Terminating [implicit_files][983]
>(2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): 
>Child [implicit_files] exited gracefully
>
>This "death" happens randomly, or so it seems to me at least. It can be 
>just after a reboot or after several hours of uptime.
>There is more in the log files under /var/log/sssd, but before I 
>clutter emails with more log snippets I was hoping some 
>expert could share some thoughts.
>
>many thanks, L.
>_______________________________________________
>FreeIPA-users mailing list -- [email protected]
>To unsubscribe send an email to
>[email protected]
>Fedora Code of Conduct:
>https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>List Archives:
>https://lists.fedorahosted.org/archives/list/[email protected]
>Do not reply to spam on the list, report it:
>https://pagure.io/fedora-infrastructure

-- 
Computers amplify human error
Super computers are really cool
