** Description changed:

  It was brought to my attention (by others) that ifupdown runs into race
  conditions on some specific cases.

  [Impact]

  When trying to deploy many servers at once (higher chances of happening)
  or from time-to-time, like any other intermittent race-condition.
  Interfaces are not brought up like they should and this has a big impact
  for servers that cannot rely on network start scripts.

  The problem is caused by a race condition when init(upstart) starts up
  network interfaces in parallel.

  [Test Case]

  Use attached script to reproduce the error (it might take some hours, in
  a single virtual machine, for the error to occur).

- (example 1)
+ * please consider my bonding examples are using eth1 and eth2 as slave
+   interfaces.

- *** sequence to trigger race-condition ***
+ ifupdown some race conditions explained bellow:

- (a) ifup eth0                     (b) ifup -a for eth0
+ !!!!
+ case 1)
+ (a) ifup eth0                     (b) ifup -a for eth0
  -----------------------------------------------------------------
  1-1. Lock ifstate.lock file.
-                                   1-1. Wait for locking ifstate.lock
-                                        file.
+                                   1-1. Wait for locking ifstate.lock
+                                        file.
  1-2. Read ifstate file to check
-      the target NIC.
+      the target NIC.
  1-3. close(=release) ifstate.lock
-      file.
+      file.
  1-4. Judge that the target NIC
-      isn't processed.
-                                   1-2. Read ifstate file to check
-                                        the target NIC.
-                                   1-3. close(=release) ifstate.lock
-                                        file.
-                                   1-4. Judge that the target NIC
-                                        isn't processed.
+      isn't processed.
+                                   1-2. Read ifstate file to check
+                                        the target NIC.
+                                   1-3. close(=release) ifstate.lock
+                                        file.
+                                   1-4. Judge that the target NIC
+                                        isn't processed.
  2. Lock and update ifstate file.
-    Release the lock.
-                                   2. Lock and update ifstate file.
-                                      Release the lock.
+    Release the lock.
+                                   2. Lock and update ifstate file.
+                                      Release the lock.
+ !!!

- (example 2)
-
- Bonding device using eth0.
- ifenslave for eth0 is also executed in parallel, eth0 remains down.
-
- *** sequence to trigger race-condition ***
-
- (a) ifenslave of eth0             (b) ifenslave of eth0
+ !!!
+ case 2)
+ (a) ifenslave of eth0             (b) ifenslave of eth0
  ------------------------------------------------------------------
- 3. Execute ifenslave of eth0.     3. Execute ifenslave of eth0.
+ 3. Execute ifenslave of eth0.     3. Execute ifenslave of eth0.
  4. Link down the target NIC.
  5. Write NIC id to
-    /sys/class/net/bond0/bonding
+    /sys/class/net/bond0/bonding
     /slaves then NIC gets up
-                                   4. Link down the target NIC.
-                                   5. Fails to write NIC id to
-                                      /sys/class/net/bond0/bonding/
+                                   4. Link down the target NIC.
+                                   5. Fails to write NIC id to
+                                      /sys/class/net/bond0/bonding/
                                       slaves it is already written.
-
- (example 3)
-
- bonding is not set to active-backup as defined in config file: When the
- init(upstart) executes "if-pre-up.d/ifenslave" script and "if-pre-
- up.d/vlan" script for bond0 device in parallel, the "if-pre-
- up.d/ifenslave" script fails to change the bonding mode with a error
- message, "bonding: unable to update mode of bond0 because interface is
- up.".
-
- *** sequence to trigger race-condition ***
-
- (a)ifup bond0                     (b)ifup -a
- -----------------------------------------------------------------------
- 1. Update statefile about bond0.
-                                   1. Does nothing about bond0
-                                      because statefile is already
-                                      updated about it.
- 2. ifenslave::setup_master()
-    sysfs_change_down mode 1
-    and link down bond0.
-                                   2. Link up bond0 by the vlan
-                                      script on the processing
-                                      for linking up bond0.201(*1).
- 3. "echo 1 > .../mode" fails.
-
- [ /etc/network/if-pre-up.d/vlan ]
-
- 46 if [ -n "$IF_VLAN_RAW_DEVICE" ] && [ ! -d /sys/class/net/$IFACE ]; then
- 47     if [ ! -x /sbin/vconfig ]; then
- 48         exit 0
- 49     fi
- 50     if ! ip link show dev "$IF_VLAN_RAW_DEVICE" > /dev/null; then
- 51         echo "$IF_VLAN_RAW_DEVICE does not exist, unable to create $IFACE"
- 52         exit 1
- 53     fi
- 54     ip link set up dev $IF_VLAN_RAW_DEVICE     <-- (*1).
- 55     vconfig add $IF_VLAN_RAW_DEVICE $VLANID
- 56 fi
-
-
- [Regression Potential]
-
- * Attaching proposed patch (for upstream as well) and describing
- potential later on today.
-
- [Other Info]
-
- Example: [ /etc/network/interfaces ]
-
- auto lo
- iface lo inet loopback
-
- auto eth0
- iface eth0 inet manual
-     bond-master bond0
-
- auto eth1
- iface eth1 inet manual
-     bond-master bond0
-
- auto bond0
- iface bond0 inet dhcp
-     bond-slaves eth0 eth1
-     hwaddress 11:22:33:44:55:66
-     bond-primary eth0
-     bond-mode 1
-     bond-miimon 100
-     bond-updelay 200
-     bond-downdelay 200
-
- auto bond0.201
- iface bond0.201 inet dhcp
-     hwaddress 11:22:33:44:55:66
-     vlan-raw-device bond0
- ...
-
- auto bond0.205
- iface bond0.205 inet dhcp
-     hwaddress 11:22:33:44:55:66
-     vlan-raw-device bond0
+ !!!
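
The case 1 sequence above is a check-then-act race: each process reads
ifstate under the lock, releases it, and only afterwards decides and
updates. A minimal Python sketch of that pattern, and of one possible way
to close the window by holding the lock across both the check and the
update, might look like the following. This is not ifupdown's code or the
proposed patch: the function names and the simplified one-name-per-line
state file are assumptions made for illustration.

```python
import fcntl
import os
import tempfile


def check_unprocessed(state_path, lock_path, iface):
    """Steps 1-1 to 1-4: lock, read the state file, release, then judge.

    The verdict is computed from data read while locked, but it is used
    after the lock is gone, so a second caller can reach the same verdict.
    """
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)      # 1-1. lock
        with open(state_path, "a+") as state:
            state.seek(0)
            known = state.read().split()      # 1-2. read
    # 1-3. lock released when the lock file was closed above
    return iface not in known                 # 1-4. judge


def claim_atomic(state_path, lock_path, iface):
    """Check and update under a single lock hold (hypothetical fix).

    Returns True only for the first caller that claims the interface.
    """
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        with open(state_path, "a+") as state:
            state.seek(0)
            if iface in state.read().split():
                return False                  # somebody else got there first
            state.write(iface + "\n")         # append mode: goes to the end
            return True


def demo():
    """Replay the interleaving from case 1 deterministically."""
    workdir = tempfile.mkdtemp()
    state = os.path.join(workdir, "ifstate")
    lock = os.path.join(workdir, "ifstate.lock")
    # (a) and (b) both run the check before either one updates the file.
    racy = (check_unprocessed(state, lock, "eth0"),
            check_unprocessed(state, lock, "eth0"))
    # With check and update in one critical section only one caller wins.
    atomic = (claim_atomic(state, lock, "eth0"),
              claim_atomic(state, lock, "eth0"))
    return racy, atomic


if __name__ == "__main__":
    print(demo())
```

In the racy variant both callers conclude that eth0 is not processed yet,
which matches step 1-4 happening on both sides before step 2. In the
atomic variant the second caller sees the entry written by the first.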
** Description changed:

  It was brought to my attention (by others) that ifupdown runs into race
  conditions on some specific cases.

  [Impact]

  When trying to deploy many servers at once (higher chances of happening)
  or from time-to-time, like any other intermittent race-condition.
  Interfaces are not brought up like they should and this has a big impact
  for servers that cannot rely on network start scripts.

  The problem is caused by a race condition when init(upstart) starts up
  network interfaces in parallel.

  [Test Case]

  Use attached script to reproduce the error (it might take some hours, in
  a single virtual machine, for the error to occur).

  * please consider my bonding examples are using eth1 and eth2 as slave
-   interfaces.
+   interfaces.

  ifupdown some race conditions explained bellow:

  !!!!
  case 1)
  (a) ifup eth0                     (b) ifup -a for eth0
  -----------------------------------------------------------------
  1-1. Lock ifstate.lock file.
-                                   1-1. Wait for locking ifstate.lock
-                                        file.
+                                   1-1. Wait for locking ifstate.lock
+                                        file.
  1-2. Read ifstate file to check
-      the target NIC.
+      the target NIC.
  1-3. close(=release) ifstate.lock
-      file.
+      file.
  1-4. Judge that the target NIC
-      isn't processed.
-                                   1-2. Read ifstate file to check
-                                        the target NIC.
-                                   1-3. close(=release) ifstate.lock
-                                        file.
-                                   1-4. Judge that the target NIC
-                                        isn't processed.
+      isn't processed.
+                                   1-2. Read ifstate file to check
+                                        the target NIC.
+                                   1-3. close(=release) ifstate.lock
+                                        file.
+                                   1-4. Judge that the target NIC
+                                        isn't processed.
  2. Lock and update ifstate file.
-    Release the lock.
-                                   2. Lock and update ifstate file.
-                                      Release the lock.
+    Release the lock.
+                                   2. Lock and update ifstate file.
+                                      Release the lock.
  !!!

- !!!
  case 2)
  (a) ifenslave of eth0             (b) ifenslave of eth0
  ------------------------------------------------------------------
  3. Execute ifenslave of eth0.     3. Execute ifenslave of eth0.
  4. Link down the target NIC.
  5. Write NIC id to
-    /sys/class/net/bond0/bonding
-    /slaves then NIC gets up
-                                   4. Link down the target NIC.
-                                   5. Fails to write NIC id to
-                                      /sys/class/net/bond0/bonding/
-                                      slaves it is already written.
+    /sys/class/net/bond0/bonding
+    /slaves then NIC gets up
+                                   4. Link down the target NIC.
+                                   5. Fails to write NIC id to
+                                      /sys/class/net/bond0/bonding/
+                                      slaves it is already written.
  !!!

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1337873

Title:
  Precise, Trusty, Utopic - ifupdown initialization problems caused by
  race condition

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1337873/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs