** Description changed:

- It was brought to my attention (by others) that ifupdown runs into race
- conditions on some specific cases.
- 
- [Impact]
- 
- When trying to deploy many servers at once (higher chances of happening)
- or from time-to-time, like any other intermittent race-condition.
- Interfaces are not brought up like they should and this has a big impact
- for servers that cannot rely on network start scripts.
- 
- The problem is caused by a race condition when init(upstart) starts up
- network interfaces in parallel.
- 
- [Test Case]
- 
- Use attached script to reproduce the error (it might take some hours, in
- a single virtual machine, for the error to occur).
- 
  * please note that my bonding examples use eth1 and eth2 as slave
-  interfaces.
- 
- ifupdown some race conditions explained bellow:
+  interfaces.
+ 
+ Some ifupdown race conditions are explained below. ifenslave does not
+ behave well when the sysv networking script and the upstart
+ network-interface script run together.
  
  !!!!
  case 1)
  (a) ifup eth0 (b) ifup -a for eth0
  -----------------------------------------------------------------
  1-1. Lock ifstate.lock file.
-                                   1-1. Wait for locking ifstate.lock
-                                       file.
+                                   1-1. Wait for locking ifstate.lock
+                                       file.
  1-2. Read ifstate file to check
-      the target NIC.
+      the target NIC.
  1-3. close(=release) ifstate.lock
-      file.
+      file.
  1-4. Judge that the target NIC
-      isn't processed.
-                                   1-2. Read ifstate file to check
-                                        the target NIC.
-                                   1-3. close(=release) ifstate.lock
-                                        file.
-                                   1-4. Judge that the target NIC
-                                        isn't processed.
+      isn't processed.
+                                   1-2. Read ifstate file to check
+                                        the target NIC.
+                                   1-3. close(=release) ifstate.lock
+                                        file.
+                                   1-4. Judge that the target NIC
+                                        isn't processed.
  2. Lock and update ifstate file.
-    Release the lock.
-                                   2. Lock and update ifstate file.
-                                      Release the lock.
+    Release the lock.
+                                   2. Lock and update ifstate file.
+                                      Release the lock.
  !!!
+ 
+ to be explained
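The check-then-act window in case 1 can be sketched with a minimal, hypothetical reproduction. This is not the actual ifupdown code: the /tmp paths and function names are stand-ins, and flock(1) from util-linux is assumed. The point is that releasing the lock between steps 1-3 and 2 lets both processes judge the NIC unprocessed:

```shell
#!/bin/sh
# Stand-ins for ifupdown's state and lock files (hypothetical /tmp paths).
STATE=/tmp/ifstate.demo
LOCK=/tmp/ifstate.demo.lock
: > "$STATE"

broken_ifup() {
    # 1-1..1-3: check under the lock, then release it.
    flock "$LOCK" grep -q "^eth0=" "$STATE" && return 0
    # 1-4..2: the gap here is where the racing "ifup -a" slips in --
    # both processes can reach this point believing eth0 is unconfigured.
    flock "$LOCK" sh -c "echo eth0=eth0 >> $STATE"
}

safe_ifup() {
    # Holding the lock across check *and* update closes the window.
    flock "$LOCK" sh -c "grep -q '^eth0=' $STATE || echo eth0=eth0 >> $STATE"
}

safe_ifup; safe_ifup
grep -c "^eth0=" "$STATE"   # prints "1": only one entry however often it runs
```

With the broken pattern, two concurrent callers can both append, leaving a corrupt ifstate; with the lock held across the whole read-modify-write, the second caller always sees the first one's entry.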
  
  !!!
  case 2)
- (a) ifenslave of eth0 (b) ifenslave of eth0
+ (a) ifenslave of eth0                  (b) ifenslave of eth0
  ------------------------------------------------------------------
- 3. Execute ifenslave of eth0.     3. Execute ifenslave of eth0.
+ 3. Execute ifenslave of eth0.      3. Execute ifenslave of eth0.
  4. Link down the target NIC.
  5. Write NIC id to
-    /sys/class/net/bond0/bonding
-    /slaves then NIC gets up
-                                   4. Link down the target NIC.
-                                   5. Fails to write NIC id to
-                                      /sys/class/net/bond0/bonding/
-                                      slaves it is already written.
+    /sys/class/net/bond0/bonding
+    /slaves then NIC gets up
+                                   4. Link down the target NIC.
+                                   5. Fails to write NIC id to
+                                      /sys/class/net/bond0/bonding/
+                                      slaves: it is already written.
  !!!
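The failure in step 5 of case 2 can be mimicked without a real bonding device. The sketch below is a hypothetical stand-in for ifenslave's sysfs_add (the /tmp path and function name are invented for the demo): the kernel refuses to enslave a NIC that is already a slave, which is what the losing side of the race hits:

```shell
#!/bin/sh
# Stand-in for /sys/class/net/bond0/bonding/slaves (hypothetical /tmp path).
SLAVES=/tmp/bond0.slaves.demo
: > "$SLAVES"

# Mimics the sysfs write: a duplicate enslave request is rejected, just
# as the kernel rejects writing an already-enslaved NIC id in step 5.
sysfs_add_demo() {
    if grep -qw "$1" "$SLAVES"; then
        echo "write error: File exists" >&2
        return 1
    fi
    echo "$1" >> "$SLAVES"
}

sysfs_add_demo eth1 && echo "first enslave ok"       # prints "first enslave ok"
sysfs_add_demo eth1 || echo "second enslave fails"   # prints "second enslave fails"
```

When two ifenslave runs race, both pass the "is it enslaved yet?" check, one write wins, and the other fails exactly like the second call above.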
+ 
+ #####################################################################
+ 
+ #### My setup:
+ 
+ root@provisioned:~# cat /etc/modprobe.d/bonding.conf
+ alias bond0 bonding
+ options bonding mode=1 arp_interval=2000
+ 
+ Both /etc/init.d/networking and the upstart network-interface job are
+ enabled.
+ 
+ #### Beginning:
+ 
+ root@provisioned:~# cat /etc/network/interfaces
+ # /etc/network/interfaces
+ 
+ auto lo
+ iface lo inet loopback
+ 
+ auto eth0
+ iface eth0 inet dhcp
+ 
+ I'm able to boot with both scripts (networking and network-interface)
+ enabled with no problem. I can also boot with only the "networking"
+ script enabled:
+ 
+ ---
+ root@provisioned:~# initctl list | grep network
+ network-interface stop/waiting
+ network-interface-security (networking) start/running
+ networking start/running
+ network-interface-container stop/waiting
+ ---
+ 
+ OR only the script "network-interface" enabled:
+ 
+ ---
+ root@provisioned:~# initctl list | grep network
+ network-interface (eth2) start/running
+ network-interface (lo) start/running
+ network-interface (eth0) start/running
+ network-interface (eth1) start/running
+ networking start/running
+ network-interface-container stop/waiting
+ ---
+ 
+ #### Enabling bonding:
+ 
+ Following ifenslave configuration example (/usr/share/doc/ifenslave/
+ examples/two_hotplug_ethernet), my /etc/network/interfaces has to 
+ look like this:
+ 
+ ---
+ auto eth1
+ iface eth1 inet manual
+     bond-master bond0
+ 
+ auto eth2
+ iface eth2 inet manual
+     bond-master bond0
+ 
+ auto bond0
+ iface bond0 inet static
+     bond-mode 1
+     bond-miimon 100
+     bond-primary eth1 eth2
+     address 192.168.169.1
+     netmask 255.255.255.0
+     broadcast 192.168.169.255
+ ---
+ 
+ Having both scripts running does not make any difference here, since
+ the "bond-slaves" keyword that ifenslave needs is missing from the
+ master interface, and the slaves are set to "manual".
+ 
+ Ifenslave code:
+ 
+ """
+ for slave in $BOND_SLAVES ; do
+ ...
+ # Ensure $slave is down.
+ ip link set "$slave" down 2>/dev/null
+ if ! sysfs_add slaves "$slave" 2>/dev/null ; then
+       echo "Failed to enslave $slave to $BOND_MASTER. Is $BOND_MASTER 
+                       ready and a bonding interface ?" >&2
+ else
+       # Bring up slave if it is the target of an allow-bondX stanza.
+       # This is usefull to bring up slaves that need extra setup.
+       if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" 
+               --list | grep -q $slave; then
+               ifup $v --allow "$BOND_MASTER" "$slave"
+       fi
+ """
+ 
+ Without the "bond-slaves" keyword on the master interface declaration,
+ ifenslave will NOT bring any slave interface up when ifup is invoked
+ for the master interface.
+ 
+ *********** Part 1
+ 
+ So, with the networking sysv init script AND the upstart
+ network-interface script running together, the following example works:
+ 
+ ---
+ root@provisioned:~# cat /etc/network/interfaces
+ # /etc/network/interfaces
+ 
+ auto lo
+ iface lo inet loopback
+ 
+ auto eth0
+ iface eth0 inet dhcp
+ 
+ auto eth1
+ iface eth1 inet manual
+     bond-master bond0
+ 
+ auto eth2
+ iface eth2 inet manual
+     bond-master bond0
+ 
+ auto bond0
+ iface bond0 inet static
+     bond-mode 1
+     bond-miimon 100
+     bond-primary eth1
+     bond-slaves eth1 eth2
+     address 192.168.169.1
+     netmask 255.255.255.0
+     broadcast 192.168.169.255
+ ---
+ 
+ The ifenslave script sets the link down on all slave interfaces
+ declared by the "bond-slaves" keyword and assigns them to the correct
+ bond. The ifenslave script ONLY makes a reentrant call to ifupdown if
+ the slave interfaces have an "allow-bondX" stanza (not our case).
+ 
+ So this should not work: when the master bonding interface (bond0) is
+ brought up, ifenslave does not configure slaves that lack an
+ "allow-bondX" stanza. What is happening? Why is it working?
+ 
+ If we disable the upstart "network-interface" script, our bonding
+ stops working at boot. This is because upstart was the one bringing
+ the slave interfaces up (with the configuration above), not the sysv
+ networking scripts.
+ 
+ It is clear that ifenslave, invoked from the sysv script, can set the
+ slave interface down at any time (even during upstart script
+ execution), so it might work and it might not:
+ 
+ """
+ ip link set "$slave" down 2>/dev/null
+ """
+ 
+ root@provisioned:~# initctl list | grep network-interface
+ network-interface (eth2) start/running
+ network-interface (lo) start/running
+ network-interface (bond0) start/running
+ network-interface (eth0) start/running
+ network-interface (eth1) start/running
+ 
+ Since having the interface down is a requirement for enslaving it,
+ running both scripts together (upstart and sysv) can create a
+ situation where upstart brings a slave interface online but ifenslave,
+ from the sysv script, puts it down and never brings it up again
+ (because it has no "allow-bondX" stanza).
+ 
+ *********** Part 2
+ 
+ What if I disable the upstart "network-interface" script, keep only
+ the sysv script, but introduce the "allow-bondX" stanza on the slave
+ interfaces?
+ 
+ The funny part begins... without upstart, the ifupdown tool calls
+ ifenslave for the bond0 interface, and ifenslave runs these lines:
+ 
+ """
+ for slave in $BOND_SLAVES ; do
+ ...
+       if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" 
+               --list | grep -q $slave; then
+               ifup $v --allow "$BOND_MASTER" "$slave"
+       fi
+ """
+ 
+ But ifenslave waits forever for the bond0 interface to come online.
+ We now have a chicken-and-egg situation:
+ 
+ * ifupdown tries to put the bond0 interface online.
+ * we are not running the upstart network-interface script.
+ * ifupdown for bond0 calls ifenslave.
+ * ifenslave looks for interfaces with an "allow-bondX" stanza.
+ * ifenslave tries to ifup the slave interfaces with that stanza.
+ * the slave interfaces keep waiting forever for the master.
+ * the master is waiting for the slave interfaces.
+ * the slave interfaces are waiting for the master interface.
+ ... :D
+ 
+ And we have an infinite loop for ifenslave:
+ 
+ """ 
+ # Wait for the master to be ready
+ [ ! -f /run/network/ifenslave.$BOND_MASTER ] && 
+       echo "Waiting for bond master $BOND_MASTER to be ready"
+ while :; do
+     if [ -f /run/network/ifenslave.$BOND_MASTER ]; then
+         break
+     fi
+     sleep 0.1
+ done
+ """
+ 
+ *********** Conclusion
+ 
+ The deadlock can be triggered if the right conditions are set (like
+ the ones I just showed). NOT having parallel ifupdown executions (sysv
+ and upstart, for example) can make an infinite loop happen during
+ boot.
+ 
+ Having parallel ifupdown executions can trigger race conditions
+ between:
+ 
+ 1) ifupdown itself (case 1 in the bug description).
+ 2) ifupdown and the ifenslave script (case 2 in the bug description).
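A minimal sketch of the serialization idea, assuming flock(1) from util-linux; the lock path and the 30-second bound are arbitrary choices for this example, and echo stands in for the real ifup:

```shell
#!/bin/sh
# Sketch: funnel every ifup invocation through one lock so the sysv and
# upstart paths cannot interleave; -w bounds the wait so a stuck lock
# holder cannot reproduce the ifenslave-style infinite wait.
LOCK=/tmp/ifupdown.demo.lock

serialized_ifup() {
    # stand-in for: flock -w 30 "$LOCK" ifup "$@"
    flock -w 30 "$LOCK" echo "ifup $*"
}

serialized_ifup -a    # prints "ifup -a"
```

Serializing the callers addresses race 1) directly and removes the interleavings that expose race 2), at the cost of losing parallel interface bring-up.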

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1337873

Title:
  Precise, Trusty, Utopic - ifupdown initialization problems caused by
  race condition

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1337873/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
