This is not caused by systemctl daemon-reexec. When the systemd package
is upgraded (apparently during apt-daily-upgrade.service), some services
are restarted (see /var/lib/dpkg/info/systemd.postinst), including
systemd-networkd.service.

systemd-networkd does not generally preserve link configuration across
restarts. You can try setting KeepConfiguration[1] in your network
configuration, but note that it does not cover all configuration options.

[1]
https://www.freedesktop.org/software/systemd/man/255/systemd.network.html#KeepConfiguration=
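As a minimal sketch of the KeepConfiguration approach, the Python snippet below renders a systemd.network drop-in and writes it next to an existing .network file. It assumes the netplan-generated file is named `10-netplan-ens192.network` (the interface name `ens192` is hypothetical), that a drop-in directory under /etc/systemd/network is picked up for that file, and that the helper names are illustrative only. The accepted values come from the systemd 255 man page in [1]. Check there which one fits your setup.

```python
from pathlib import Path

# Values documented for KeepConfiguration= in systemd 255 (systemd.network(5)):
# "yes" enables all of the modes below, "static" keeps static addresses/routes
# when networkd starts, "dhcp-on-stop" keeps them when networkd stops, and
# "dhcp" never drops DHCP-provided ones. The option also accepts other boolean
# spellings; this sketch only allows the names listed here.
ALLOWED_VALUES = ("yes", "no", "static", "dhcp-on-stop", "dhcp")


def render_keep_configuration(value: str) -> str:
    """Return the contents of a drop-in that sets KeepConfiguration=."""
    if value not in ALLOWED_VALUES:
        raise ValueError(f"unsupported KeepConfiguration value: {value!r}")
    return f"[Network]\nKeepConfiguration={value}\n"


def write_drop_in(network_file: str, value: str,
                  base_dir: str = "/etc/systemd/network") -> Path:
    """Write <base_dir>/<network_file>.d/keep-configuration.conf.

    network_file is the name of the .network file to extend, e.g. the
    hypothetical "10-netplan-ens192.network" generated by netplan.
    """
    drop_in_dir = Path(base_dir) / f"{network_file}.d"
    drop_in_dir.mkdir(parents=True, exist_ok=True)
    target = drop_in_dir / "keep-configuration.conf"
    target.write_text(render_keep_configuration(value))
    return target
```

For example, `write_drop_in("10-netplan-ens192.network", "yes")` run as root, followed by `networkctl reload`. Since the option does not cover every configuration setting, this is a mitigation rather than a fix.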

** Changed in: systemd (Ubuntu)
       Status: New => Incomplete

** Changed in: systemd (Ubuntu)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2099676

Title:
  Network connectivity loss after systemctl daemon-reexec

Status in systemd package in Ubuntu:
  Incomplete

Bug description:
  # Our problem

  We are running multiple K8S clusters on Ubuntu 24.04.1 LTS nodes.

  On one of these clusters, we have noticed at least twice that most of the
  nodes (~5 out of 8) went offline without any action on our side.
  To restore connectivity, we tried ifdown/ifup, disconnecting and
  reconnecting the network interface from the hypervisor, and restarting the
  networking service, but nothing helped: we had to reboot the nodes from the
  console.
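Before rebooting, one quick check is whether the node still has an IPv4 default route at all, by reading /proc/net/route. A minimal Python sketch, assuming a little-endian host such as the x86-64 VMs in this report (the kernel prints addresses as host-order hex); the function name is illustrative:

```python
import socket
import struct

RTF_UP = 0x1       # route is usable
RTF_GATEWAY = 0x2  # route goes through a gateway


def default_gateways(route_table: str) -> list[tuple[str, str]]:
    """Return (interface, gateway) pairs for default routes in /proc/net/route text."""
    gateways = []
    for line in route_table.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        iface, destination, gateway, flags = fields[:4]
        flag_bits = int(flags, 16)
        # Destination 00000000 is the default route.
        if destination == "00000000" and flag_bits & RTF_UP and flag_bits & RTF_GATEWAY:
            # The hex field is the address as a host-order (little-endian here) u32.
            gw = socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
            gateways.append((iface, gw))
    return gateways
```

Usage: `with open("/proc/net/route") as f: print(default_gateways(f.read()))`. An empty list means there is no IPv4 default route. That would be consistent with networkd having dropped its routes, but is not proof on its own.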

  After some investigation, we were able to correlate this outage with a run
  of the `apt-daily-upgrade` service, triggered by the `apt-daily-upgrade`
  timer. Somehow, the `apt-daily-upgrade` service upgraded a package that
  triggered a `systemctl daemon-reexec`, cutting network connectivity in the
  process.

  # Symptoms

  - The node is flagged as `NotReady` by K8s.
  - SSH connections to the node fail.
  - From the node, we can't ping the gateway.
  - The `journalctl` output around `systemctl daemon-reexec` is far more
    verbose than usual:

  ```
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Reexecuting requested from client PID 2711048 ('systemctl') (unit apt-daily-upgrade.service)...
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Reexecuting.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: systemd 255.4-1ubuntu8.5 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Detected virtualization vmware.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Detected architecture x86-64.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Starting man-db.service - Daily man-db regeneration...
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping containerd.service - containerd container runtime...
  Feb 21 06:06:55 lylux0634kdp004 ntpd[1106]: ERR: ntpd exiting on signal 15 (Terminated)
  Feb 21 06:06:55 lylux0634kdp004 ntpd[1106]: PROTO: 172.16.10.254 unlink local addr 172.16.34.4 -> <null>
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping ntpsec.service - Network Time Service...
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping open-vm-tools.service - Service for virtual machines hosted on VMware...
  Feb 21 06:06:55 lylux0634kdp004 systemd-journald[504]: Journal stopped
  Feb 21 06:06:55 lylux0634kdp004 systemd-journald[504]: Received SIGTERM from PID 1 (systemd).
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping systemd-journald.service - Journal Service...
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: ntpsec.service: Deactivated successfully.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Stopped ntpsec.service - Network Time Service.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: ntpsec.service: Consumed 1min 12.819s CPU time, 12.4M memory peak, 0B memory swap peak.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Deactivated successfully.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 3374 (containerd-shim) remains running after unit stopped.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 3375 (containerd-shim) remains running after unit stopped.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 3475 (containerd-shim) remains running after unit stopped.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 3512 (containerd-shim) remains running after unit stopped.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 3545 (containerd-shim) remains running after unit stopped.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 3618 (containerd-shim) remains running after unit stopped.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 2574706 (containerd-shim) remains running after unit stopped.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Stopped containerd.service - containerd container runtime.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Consumed 9min 54.298s CPU time, 3.4G memory peak, 0B memory swap peak.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 3374 (containerd-shim) in control group while starting unit. Ignoring.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 3375 (containerd-shim) in control group while starting unit. Ignoring.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 3475 (containerd-shim) in control group while starting unit. Ignoring.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 3512 (containerd-shim) in control group while starting unit. Ignoring.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 3545 (containerd-shim) in control group while starting unit. Ignoring.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 3618 (containerd-shim) in control group while starting unit. Ignoring.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 2574706 (containerd-shim) in control group while starting unit. Ignoring.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Starting containerd.service - containerd container runtime...
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: netplan-ovs-cleanup.service - OpenVSwitch configuration for cleanup was skipped because of an unmet condition check (ConditionFileIsExecutable=/usr/bin/ovs-vsctl).
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Starting ntpsec.service - Network Time Service...
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: systemd-networkd-wait-online.service: Deactivated successfully.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Stopped systemd-networkd-wait-online.service - Wait for Network to be Configured.
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping systemd-networkd-wait-online.service - Wait for Network to be Configured...
  Feb 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping systemd-networkd.service - Network Configuration...
  ```
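The log excerpt above can also be summarised programmatically, which helps when comparing several nodes. A minimal Python sketch, assuming journal lines in the default short format shown here; `summarize_reexec` is a hypothetical helper. It lists the `.service` units systemd started stopping after the bare `Reexecuting.` line, plus the left-over processes it reported:

```python
import re

_STOPPING = re.compile(r"systemd\[1\]: Stopping (\S+\.service)")
_LEFTOVER = re.compile(r"Found left-over process (\d+) \((\S+)\)")


def summarize_reexec(lines):
    """Return (units stopped after re-exec, [(pid, comm), ...] left-over processes)."""
    after_reexec = False
    stopped, leftovers = [], []
    for line in lines:
        if not after_reexec:
            # The bare "Reexecuting." line (not "Reexecuting requested ...")
            # marks the point where PID 1 re-executed itself.
            after_reexec = "systemd[1]: Reexecuting." in line
            continue
        match = _STOPPING.search(line)
        if match and match.group(1) not in stopped:
            stopped.append(match.group(1))
        match = _LEFTOVER.search(line)
        if match:
            leftovers.append((int(match.group(1)), match.group(2)))
    return stopped, leftovers
```

Run against the five-line excerpt from the manual re-exec in the Testcase section, both lists come back empty. That contrast matches the reply's point that the apt-triggered run also restarted services, including systemd-networkd.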

  The `Found left-over process` lines made me think of bug
  https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2013543, but as
  far as I understand, Noble hosts should not be affected by it.

  # Testcase

  Here is the catch: we can't reproduce the issue on demand.

  When we run `systemctl daemon-reexec` manually, we do not experience the
  same outage, and journalctl logs only 5 lines:

  ```
  Feb 21 11:01:06 lylux0634kdp004 systemd[1]: Reexecuting requested from client PID 23296 ('systemctl') (unit session-2.scope)...
  Feb 21 11:01:06 lylux0634kdp004 systemd[1]: Reexecuting.
  Feb 21 11:01:06 lylux0634kdp004 systemd[1]: systemd 255.4-1ubuntu8.5 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT >
  Feb 21 11:01:06 lylux0634kdp004 systemd[1]: Detected virtualization vmware.
  Feb 21 11:01:06 lylux0634kdp004 systemd[1]: Detected architecture x86-64.
  ```
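If the explanation in the reply at the top of this report holds (the systemd postinst restarts systemd-networkd.service, and networkd does not keep its configuration across restarts), a closer reproduction than a manual `daemon-reexec` is to restart systemd-networkd itself and compare routes before and after. A minimal Python sketch under that assumption: it relies on iproute2's JSON output (`ip -j route show`), the helper names are hypothetical, and it should only be run as root on a test node with console access, since it may cut connectivity.

```python
import json
import subprocess


def snapshot_routes() -> list[dict]:
    """Return the main routing table as parsed JSON from `ip -j route show`."""
    result = subprocess.run(["ip", "-j", "route", "show"],
                            check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def restart_networkd() -> None:
    """Restart systemd-networkd, as the systemd postinst reportedly does."""
    subprocess.run(["systemctl", "restart", "systemd-networkd.service"], check=True)


def route_key(route: dict) -> tuple:
    # Destination, gateway and device are enough to tell routes apart here.
    return (route.get("dst"), route.get("gateway"), route.get("dev"))


def lost_routes(before: list[dict], after: list[dict]) -> list[dict]:
    """Routes present in `before` but missing from `after`."""
    remaining = {route_key(route) for route in after}
    return [route for route in before if route_key(route) not in remaining]
```

Usage on the test node: `before = snapshot_routes(); restart_networkd(); print(lost_routes(before, snapshot_routes()))`. A non-empty result would point at networkd dropping configuration on restart. networkd may re-add routes shortly afterwards, so the timing of the second snapshot matters.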

  # Some additional details

  ```
  root@lylux0634kdp004:~# lsb_release -d
  No LSB modules are available.
  Description:    Ubuntu 24.04.1 LTS
  root@lylux0634kdp004:~# apt-cache policy systemd
  systemd:
    Installed: 255.4-1ubuntu8.5
    Candidate: 255.4-1ubuntu8.5
    Version table:
   *** 255.4-1ubuntu8.5 500
          500 https://XXXXXX/ubuntu-fr noble-updates/main amd64 Packages
          100 /var/lib/dpkg/status
       255.4-1ubuntu8 500
          500 https://XXXXX/ubuntu-fr noble/main amd64 Packages
  root@lylux0634kdp004:~# uname -a
  Linux lylux0634kdp004 6.8.0-52-generic #53-Ubuntu SMP PREEMPT_DYNAMIC Sat Jan 11 00:06:25 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
  ```

  Feel free to request any additional details that would help in
  troubleshooting this issue.

  Antoine

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2099676/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
