Hello,

I'd also like to share my experience with this bug, which affects us at
work (GRNET) as well. We have the following setup:

# cat /etc/debian_version
9.3
# dpkg -l | grep -e ifupdown -e vlan -e bridge-utils | awk '{print $2, $3}'
ii  bridge-utils                     1.5-13+deb9u1
ii  ifupdown                         0.8.19
ii  vlan                             1.9-3.2+b1
# uname -a
Linux foo 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux

I have reproduced it by disabling networking.service, not loading the
bonding module on boot, with the following configuration:

# cat /etc/network/interfaces
auto bond0
iface bond0 inet static
  mtu                   9000
  bond-mode             802.3ad
  bond_xmit_hash_policy layer3+4
  bond-miimon           100
  slaves                ens5f0 ens5f1

auto vlan109
iface vlan109 inet manual
  bridge_ports   bond0.109
  bridge_stp     off
  bridge_maxwait 0
  bridge_fd      0
  mtu            9000

auto vlan110
iface vlan110 inet manual
  bridge_ports   bond0.110
  bridge_stp     off
  bridge_maxwait 0
  bridge_fd      0
  mtu            9000

# cat /etc/modules
8021q
bonding

In our case, we noticed the following timeline, which is quite similar
to Apollon's:

* The bonding module gets loaded into the kernel way before
  networking.service gets started (it is listed in /etc/modules, which
  should be unnecessary tbh)
* Interface bond0 gets created, which triggers a udev 'add' action
* The action calls bridge-network-interface with INTERFACE=bond0
* bridge-network-interface creates interface bond0.109. bond0.109 has
  MTU 1500 because ifup has not run yet
* The creation of bond0.109 triggers another udev 'add' action (which, I
  think, should not happen)
* bridge-network-interface tries to run ifup --allow auto vlan109
* The above command fails because it cannot set the MTU of vlan109 to
  9000, because bond0.109's MTU is 1500. vlan109 interface is left in an
  unconfigured state.
* /lib/udev/bridge-network-interface fails because of set -e
* The second call of bridge-network-interface with INTERFACE=bond0.109
  fails in a similar way. All other interfaces are untouched.
* systemd starts up networking.service and runs ifup --allow=auto -a
* bond0 gets MTU 9000
* ifup tries to get vlan109 interface up
* This fails because bond0.109's MTU is 1500. It seems that ifupdown
  and/or bridge-utils do not touch it
* ifup for vlan110 runs successfully because it creates a new bond0.110
  interface, which inherits the MTU of bond0, which is now 9000 and gets
  up correctly
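
To check whether a host ended up in this state, the MTUs can be read
back from sysfs. Below is a minimal sketch, not something we run in
production: check_mtu and the SYSFS_NET override are hypothetical names
made up for illustration, and the function only reads
/sys/class/net/<iface>/mtu and compares it against the expected value.

```shell
#!/bin/sh
# check_mtu WANT IFACE...
# Print every interface whose MTU differs from WANT (or that is missing)
# and return non-zero if any such interface was found.
# SYSFS_NET is a hypothetical override so the function can be pointed at
# a fake tree; by default the real /sys/class/net is used.
check_mtu() {
    want=$1
    shift
    root=${SYSFS_NET:-/sys/class/net}
    rc=0
    for ifc in "$@"; do
        if [ ! -r "$root/$ifc/mtu" ]; then
            echo "$ifc: not present"
            rc=1
            continue
        fi
        mtu=$(cat "$root/$ifc/mtu")
        if [ "$mtu" -ne "$want" ]; then
            echo "$ifc: MTU $mtu, expected $want"
            rc=1
        fi
    done
    return "$rc"
}
```

With the configuration above it would be called as
"check_mtu 9000 bond0 bond0.109 bond0.110 vlan109 vlan110". On an
affected host I would expect it to flag bond0.109 and vlan109, but I
have not captured that output here.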

The above behavior does not always happen: if, for some reason,
networking.service gets started before bridge-network-interface does its
work, all interfaces come up correctly. Also, this affects only the
first interface in /etc/network/interfaces that has a bridge_ports
option defined, because bridge-network-interface fails at that point for
the reasons I described above and never reaches the others.

I agree with Apollon: I really do not understand what the code is trying
to do or why BRIDGE_HOTPLUG defaults to yes. We ran into serious
problems with silent packet loss in QEMU VMs whose tap interfaces were
bridged to the above vlanXXX interfaces with MTU 9000. The only way to
mitigate this problem for now is to set BRIDGE_HOTPLUG=no.
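
For anyone who wants to apply the same mitigation, a minimal sketch
follows. It assumes the udev helper reads the variable from
/etc/default/bridge-utils, which is where I would expect it on Debian;
please verify the path on your own system before relying on it.

```shell
# /etc/default/bridge-utils (assumed location, see above)
# Do not let the udev helper bring up bridges when a port is hotplugged;
# leave all bridge setup to ifup via networking.service.
BRIDGE_HOTPLUG=no
```

The change only affects later udev events, so interfaces that were
already created with the wrong MTU still have to be fixed by hand or by
a reboot.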

Unfortunately, it's not easy for us to suggest a solution, but we can
provide more information if needed.

Regards,
Nikos
