** Description changed:

It was brought to my attention (by others) that ifupdown runs into race
conditions in some specific cases.

[Impact]

When deploying many servers at once (which raises the odds of hitting
it), or from time to time like any other intermittent race condition,
interfaces are not brought up as they should be. This has a big impact
on servers that cannot rely on the network start scripts.

The problem is caused by a race condition when init (upstart) starts up
network interfaces in parallel.

[Test Case]

Use the attached script to reproduce the error (it might take some
hours, in a single virtual machine, for the error to occur).
* Please note that my bonding examples use eth1 and eth2 as slave
  interfaces.

Some ifupdown race conditions are explained below. ifenslave does not
behave well with the sysv networking script and the upstart
network-interface jobs running together.

!!!! case 1)
(a) ifup eth0                       (b) ifup -a for eth0
-----------------------------------------------------------------
1-1. Lock ifstate.lock file.
                                    1-1. Wait to lock the ifstate.lock
                                         file.
1-2. Read ifstate file to check
     the target NIC.
1-3. Close (=release) ifstate.lock
     file.
1-4. Judge that the target NIC
     isn't processed.
                                    1-2. Read ifstate file to check
                                         the target NIC.
                                    1-3. Close (=release) ifstate.lock
                                         file.
                                    1-4. Judge that the target NIC
                                         isn't processed.
2. Lock and update ifstate file.
   Release the lock.
                                    2. Lock and update ifstate file.
                                       Release the lock.
!!!

(to be explained in more detail below)

!!! case 2)
(a) ifenslave of eth0               (b) ifenslave of eth0
------------------------------------------------------------------
3. Execute ifenslave of eth0.       3. Execute ifenslave of eth0.
4. Link down the target NIC.
5. Write NIC id to
   /sys/class/net/bond0/bonding
   /slaves, then the NIC gets up.
                                    4. Link down the target NIC.
                                    5. Fails to write NIC id to
                                       /sys/class/net/bond0/bonding/
                                       slaves: it is already written.
!!!

#####################################################################

#### My setup:

root@provisioned:~# cat /etc/modprobe.d/bonding.conf
alias bond0 bonding
options bonding mode=1 arp_interval=2000

Both /etc/init.d/networking and the upstart network-interface job are
enabled.

#### Beginning:

root@provisioned:~# cat /etc/network/interfaces
# /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

I am able to boot with both scripts (networking and network-interface)
enabled with no problem.
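A quick way to confirm that both bring-up paths are wired in on a given
system (just a sketch: the paths are the stock Ubuntu ones and the exact
output varies by release):

"""
# Both callers reference ifup, so two ifup processes can touch the
# same interface at boot (this is the window behind case 1 above).
grep -l ifup /etc/init.d/networking /etc/init/network-interface.conf

# Show which events start the per-device upstart job.
initctl show-config network-interface
"""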
I can also boot with only the "networking" script enabled:

---
root@provisioned:~# initctl list | grep network
network-interface stop/waiting
network-interface-security (networking) start/running
networking start/running
network-interface-container stop/waiting
---

OR with only the "network-interface" job enabled:

---
root@provisioned:~# initctl list | grep network
network-interface (eth2) start/running
network-interface (lo) start/running
network-interface (eth0) start/running
network-interface (eth1) start/running
networking start/running
network-interface-container stop/waiting
---

#### Enabling bonding:

Following the ifenslave configuration example
(/usr/share/doc/ifenslave/examples/two_hotplug_ethernet), my
/etc/network/interfaces has to look like this:

---
auto eth1
iface eth1 inet manual
    bond-master bond0

auto eth2
iface eth2 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    bond-mode 1
    bond-miimon 100
    bond-primary eth1 eth2
    address 192.168.169.1
    netmask 255.255.255.0
    broadcast 192.168.169.255
---

Having both scripts running does not make any difference here, since the
"bond-slaves" keyword that ifenslave needs is missing from the bond0
stanza and the slave interfaces are set to "manual".

The ifenslave code:

"""
for slave in $BOND_SLAVES ; do
    ...
    # Ensure $slave is down.
    ip link set "$slave" down 2>/dev/null
    if ! sysfs_add slaves "$slave" 2>/dev/null ; then
        echo "Failed to enslave $slave to $BOND_MASTER. Is $BOND_MASTER ready and a bonding interface ?" >&2
    else
        # Bring up slave if it is the target of an allow-bondX stanza.
        # This is useful to bring up slaves that need extra setup.
        if [ -z "$(which ifquery)" ] || ifquery --allow "$BOND_MASTER" --list | grep -q $slave; then
            ifup $v --allow "$BOND_MASTER" "$slave"
        fi
    fi
done
"""

Without the "bond-slaves" keyword on the master interface declaration,
ifenslave will NOT bring any slave interface up when ifup is invoked for
the master interface.

*********** Part 1

So, with the networking sysv init script AND the upstart
network-interface job running together, the following example works:

---
root@provisioned:~# cat /etc/network/interfaces
# /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet manual
    bond-master bond0

auto eth2
iface eth2 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    bond-mode 1
    bond-miimon 100
    bond-primary eth1
    bond-slaves eth1 eth2
    address 192.168.169.1
    netmask 255.255.255.0
    broadcast 192.168.169.255
---

The ifenslave script sets the link down on every slave interface
declared by the "bond-slaves" keyword and assigns it to the correct
bonding device. The ifenslave script ONLY makes a reentrant call to
ifupdown if the slave interface has an "allow-bondX" stanza (not our
case).

So this should not work: when the master bonding interface (bond0) is
brought up, ifenslave does not configure slaves without an "allow-bondX"
stanza. What is happening, why is it working?

If we disable the upstart "network-interface" job, our bonding stops
working at boot. This is because upstart was the one setting the slave
interfaces up (with the configuration above), and not the sysv
networking script.
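A quick way to check, after a boot, which path actually brought the
slaves up (a sketch: it assumes the interface names from the
configuration above and that ifupdown keeps its state file in
/run/network/ifstate):

"""
# Did the bond get its slaves at all?
cat /sys/class/net/bond0/bonding/slaves        # expected: eth1 eth2
grep -A 2 "Slave Interface" /proc/net/bonding/bond0

# Did the per-device upstart job run for a slave?
initctl status network-interface INTERFACE=eth1

# Did ifupdown record the slave as configured?
grep eth1 /run/network/ifstate
"""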
It is clear that ifenslave, when run from the sysv script, can set the
slave interface down at any time (even during the upstart job's
execution), so it might work and it might not:

"""
ip link set "$slave" down 2>/dev/null
"""

root@provisioned:~# initctl list | grep network-interface
network-interface (eth2) start/running
network-interface (lo) start/running
network-interface (bond0) start/running
network-interface (eth0) start/running
network-interface (eth1) start/running

Since having the interface down is a requirement for enslaving it,
running both scripts together (upstart and sysv) can create a situation
where upstart puts a slave interface online but ifenslave, run from the
sysv script, puts it down and never brings it up again (because it does
not have an "allow-bondX" stanza).

*********** Part 2

What if I disable the upstart "network-interface" job, stay only with
the sysv script, but add the "allow-bondX" stanza to the slave
interfaces?

The funny part begins... Without upstart, the ifupdown tool calls
ifenslave for the bond0 interface, and ifenslave runs these lines:

"""
for slave in $BOND_SLAVES ; do
    ...
    if [ -z "$(which ifquery)" ] || ifquery --allow "$BOND_MASTER" --list | grep -q $slave; then
        ifup $v --allow "$BOND_MASTER" "$slave"
    fi
done
"""

But ifenslave then waits forever for the bond0 interface to be online.
We now have a chicken-and-egg situation:

* ifupdown tries to put the bond0 interface online.
* We are not running the upstart network-interface job.
* ifupdown for bond0 calls ifenslave.
* ifenslave looks for slave interfaces with an "allow-bondX" stanza.
* ifenslave tries to ifup the slave interfaces with that stanza.
* The slave interfaces keep waiting forever for the master.
* The master is waiting for the slave interface.
* The slave interface is waiting for the master interface.
... :D

And we get an infinite loop in ifenslave:

"""
# Wait for the master to be ready
[ ! -f /run/network/ifenslave.$BOND_MASTER ] &&
    echo "Waiting for bond master $BOND_MASTER to be ready"
while :; do
    if [ -f /run/network/ifenslave.$BOND_MASTER ]; then
        break
    fi
    sleep 0.1
done
"""

*********** Conclusion

That deadlock can be reached if the right triggers are in place (like
the ones I just showed): NOT having parallel ifupdown executions (sysv
and upstart, for example) can make an infinite loop happen during boot.

Having parallel ifupdown executions, on the other hand, can trigger race
conditions between:

1) ifupdown and itself (case 1 in the bug description).
2) ifupdown and the ifenslave script (case 2 in the bug description).
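One way to avoid the parallel executions entirely, while keeping both
boot paths enabled, would be to serialize every ifup invocation behind a
single system-wide lock. This is only a sketch (the wrapper name, lock
file path and timeout are made up, and it is not the actual fix):

"""
#!/bin/sh
# Hypothetical /usr/local/sbin/ifup-serialized wrapper: take a global
# lock before calling the real ifup, so the sysv path and the upstart
# path can never run ifupdown concurrently against the same state file.
LOCK=/run/network/.ifupdown-global.lock
exec flock -w 60 "$LOCK" /sbin/ifup "$@"
"""

With both callers funneled through the same lock, the ifstate
read/check/update sequence of case 1 and the ifenslave sysfs writes of
case 2 should no longer interleave.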
--
https://bugs.launchpad.net/bugs/1337873

Title:
  Precise, Trusty, Utopic - ifupdown initialization problems caused by
  race condition