Public bug reported: We are losing port channel aggregation on reboot.
After the reboot, /var/log/syslog contains the entries:

[ 250.790758] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)
              Check the configuration to verify that all adapters are connected to 802.3ad compliant switch ports
[ 282.029426] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)
              Check the configuration to verify that all adapters are connected to 802.3ad compliant switch ports

Aggregator IDs of the slave interfaces are different:

ubuntu@node-6:~$ cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable

Slave Interface: enp24s0f1np1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b0:26:28:48:9f:51
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0

Slave Interface: enp24s0f0np0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b0:26:28:48:9f:50
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1

The mismatch in "Aggregator ID" between the two ports is a symptom of the issue.

If we do 'ip link set dev bond2 down' and 'ip link set dev bond2 up', the port with the mismatched ID appears to renegotiate with the port-channel and becomes aggregated.

The other way to work around this issue is to put the bond ports down and bring up port enp24s0f0np0 first and port enp24s0f1np1 second. When I change the order of bringing the ports up (first enp24s0f1np1, and second enp24s0f0np0), the issue is still there.

When the issue occurs, the port on the switch corresponding to interface enp24s0f0np0 is in Suspended state.
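
For anyone trying to confirm the same symptom on another node, here is a minimal Python sketch that compares the per-slave "Aggregator ID" values. It is not part of the original report: the function names `aggregator_ids` and `is_split` are hypothetical, and `SAMPLE` is only an excerpt of the bond2 output quoted in this report. It assumes the /proc/net/bonding layout shown here, where each "Slave Interface:" line is later followed by that slave's "Aggregator ID:" line.

```python
import re

# Excerpt of the /proc/net/bonding/bond2 output from this report.
SAMPLE = """\
Slave Interface: enp24s0f1np1
MII Status: up
Aggregator ID: 1

Slave Interface: enp24s0f0np0
MII Status: up
Aggregator ID: 2
"""


def aggregator_ids(bonding_text):
    """Map each slave interface name to its Aggregator ID."""
    ids = {}
    current = None  # slave section currently being read, if any
    for raw in bonding_text.splitlines():
        line = raw.strip()
        m = re.match(r"Slave Interface:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"Aggregator ID:\s*(\d+)", line)
        # Ignore any "Aggregator ID" that appears before the first slave
        # section (for example in a bond-wide 802.3ad summary).
        if m and current is not None:
            ids[current] = int(m.group(1))
    return ids


def is_split(ids):
    """True when the slaves do not all share one aggregator."""
    return len(set(ids.values())) > 1


if __name__ == "__main__":
    ids = aggregator_ids(SAMPLE)
    for name, agg in ids.items():
        print(f"{name}: aggregator {agg}")
    print("split across aggregators" if is_split(ids) else "single aggregator")
```

To check a live system, the text could be read from /proc/net/bonding/bond2 instead of `SAMPLE`. This only detects the mismatch; it does not explain why the second port fails to join the aggregator.
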
After applying the workaround the port is no longer in Suspended state and the Aggregator IDs in /proc/net/bonding/bond2 are equal.

I also installed the 5.0.0 kernel; the issue is still there.

Operating System:
Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-52-generic x86_64)

ubuntu@node-6:~$ uname -a
Linux node-6 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

ubuntu@node-6:~$ sudo lspci -vnvn
https://pastebin.ubuntu.com/p/Dy2CKDbySC/

Hardware: Dell PowerEdge R740xd
BIOS version: 2.1.7

sosreport: https://drive.google.com/open?id=1-eN7cZJIeu-AQBEU7Gw8a_AJTuq0AOZO

ubuntu@node-6:~$ lspci | grep Ethernet | grep 10G
https://pastebin.ubuntu.com/p/sqCx79vZWM/

ubuntu@node-6:~$ lspci -n | grep 18:00
18:00.0 0200: 14e4:16d8 (rev 01)
18:00.1 0200: 14e4:16d8 (rev 01)

ubuntu@node-6:~$ modinfo bnx2x
https://pastebin.ubuntu.com/p/pkmzsFjK8M/

ubuntu@node-6:~$ ip -o l
https://pastebin.ubuntu.com/p/QpW7TjnT2v/

ubuntu@node-6:~$ ip -o a
https://pastebin.ubuntu.com/p/MczKtrnmDR/

ubuntu@node-6:~$ cat /etc/netplan/98-juju.yaml
https://pastebin.ubuntu.com/p/9cZpPc7C6P/

ubuntu@node-6:~$ sudo lshw -c network
https://pastebin.ubuntu.com/p/gmfgZptzDT/

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1834322

Title:
  Losing port aggregate with 802.3ad port-channel/bonding aggregation on
  reboot

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1834322/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs