[Touch-packages] [Bug 2053288] [NEW] systemd-networkd IPv6 default routes dropped under load, don't recover

Bruce Duncan Thu, 15 Feb 2024 15:30:37 -0800

Public bug reported:

Ubuntu 22.04.3 LTS
systemd 249.11-0ubuntu3.12


The systemd issue tracker says this version is too old to report upstream
and that I should report it to the downstream bug tracker.

IPv6 default routes are getting lost and not renewed.

We're using IPv6 RA to find default routes for our servers and desktops.
The RAs come from HP/Aruba routers and have a short lifetime of about
46s. Occasionally, we will see the default routes get dropped. Despite
receiving RAs, the default routes don't get recreated.

The most recent machine to be affected had a user running an excessively
large job (load average 157). This is the state of the network when the
machine is working:

```sh
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 2c:ea:7f:56:9a:66 brd ff:ff:ff:ff:ff:ff
    altname enp4s0f0
3: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 2c:ea:7f:56:9a:66 brd ff:ff:ff:ff:ff:ff permaddr 2c:ea:7f:56:9a:67
    altname enp4s0f1
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2c:ea:7f:56:9a:66 brd ff:ff:ff:ff:ff:ff
    inet xxx.xxx.202.112/24 brd 129.215.202.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 xxxx:xxx:xxx:202:2eea:7fff:fe56:9a66/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 2591994sec preferred_lft 604794sec
    inet6 fe80::2eea:7fff:fe56:9a66/64 scope link 
       valid_lft forever preferred_lft forever
# ip -6 r
::1 dev lo proto kernel metric 256 pref medium
xxxx:xxx:xxx:202::/64 dev bond0 proto ra metric 1024 expires 2591998sec pref medium
fe80::/64 dev bond0 proto kernel metric 256 pref medium
default proto ra metric 1024 expires 28sec pref medium
        nexthop via fe80::609:73ff:fe48:c000 dev bond0 weight 1 
        nexthop via fe80::609:73ff:fe48:6500 dev bond0 weight 1 
```

When the problem arises, the last three lines disappear. `tcpdump icmp6`
shows RAs being received but networkd doesn't create the routes in the
kernel. The machine keeps its IPv6 addresses, but without a default
route it can't make any IPv6 connections or answer incoming IPv6
connections.
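
A quick way to confirm this state from a script is to look for a `default` entry in the `ip -6 route` output. The following is only a minimal sketch and not part of the setup described here; the function names `has_v6_default` and `v6_default_expiry` are made up for illustration. Both read route text on stdin, so they can also be run against saved output.

```sh
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the report).

# Succeeds if the `ip -6 route` text on stdin contains a default route.
has_v6_default() {
    grep -Eq '^default( |$)'
}

# Prints the remaining lifetime in seconds of the first default route
# that carries an "expires Nsec" field; prints nothing if there is none.
v6_default_expiry() {
    sed -n 's/^default .*expires \([0-9][0-9]*\)sec.*/\1/p' | head -n 1
}
```

For example, `ip -6 route show | has_v6_default || echo "no IPv6 default route"` would flag an affected machine, and on a healthy one `ip -6 route show | v6_default_expiry` should keep returning to a value near the RA lifetime each time an RA is processed.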

Sorry, the reproduction method is unclear. Here's a best guess:

1. Configure networkd using netplan:

```yaml
---
network:
  bonds:
    bond0:
      addresses:
      - xxx.xxx.202.112/24
      dhcp4: false
      interfaces:
      - eth0
      - eth1
      macaddress: 2C:EA:7F:56:9A:66
      parameters:
        mii-monitor-interval: 1
        mode: active-backup
  ethernets:
    eth0:
      dhcp4: false
      match:
        macaddress: 2C:EA:7F:56:9A:66
    eth1:
      dhcp4: false
      match:
        macaddress: 2C:EA:7F:56:9A:67
  renderer: networkd
  version: 2
```

2. Load the machine, or just wait. Possibly this is related to packets being dropped, but I would expect the system to recover once the load is removed.
3. Note the lack of IPv6 connectivity, inability to log in with ssh, etc.
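
Since the trigger is uncertain, it may help to record exactly when the default route disappears so the time can be compared with load spikes and with `journalctl -u systemd-networkd`. The watcher below is a rough sketch under assumptions: the function names and the 5 second default polling interval are invented for illustration and were not used for this report.

```sh
#!/bin/sh
# Hypothetical watcher; names and the 5 s default interval are illustrative.

# Prints "present" or "missing" for the `ip -6 route` text on stdin.
v6_default_state() {
    if grep -Eq '^default( |$)'; then
        echo present
    else
        echo missing
    fi
}

# Prints a line only when the state differs from the previous one.
# $1 = previous state, $2 = current state, $3 = timestamp string
report_change() {
    if [ "$1" != "$2" ]; then
        printf '%s IPv6 default route: %s -> %s\n' "$3" "$1" "$2"
    fi
}

# Polls the routing table until interrupted. $1 = interval in seconds.
watch_v6_default() {
    prev=unknown
    while :; do
        cur=$(ip -6 route show default | v6_default_state)
        report_change "$prev" "$cur" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        prev=$cur
        sleep "${1:-5}"
    done
}
```

Running `watch_v6_default 5` on a console that does not depend on IPv6 prints one line per transition, which keeps the log short even over long waits.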

** Affects: systemd (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2053288
