On Tue, Sep 13, 2016 at 06:20:49PM -0000, bugproxy wrote:
> During investigation, a problem/mistake was found with systemd but is
> almost certainly not the cause of the hang.
Agreed.

> This fixed systemd was supposedly being made available in xenial-proposed
> repositories, but so far does not seem to have appeared there.

The systemd package is present in the xenial-proposed repository, but no updated installer image has yet been produced that includes it. We have had sufficient verification of the systemd change that it will be released to xenial users for the general problem; we will also update the debian-installer images as a matter of course. Based on the feedback from gpicc...@br.ibm.com, it does not appear that the buggy udev rule is blocking progress on this bug.

> This bug was placed in "verify" state and it started causing email to be
> sent several times a day reminding me to verify the fix.

I don't know why this would be. Our process generates a single message to the bug when a package is accepted into the -proposed repository; it does not send daily reminder messages.

> ------- Comment From dougm...@us.ibm.com 2016-09-13 14:18 EDT-------
> I should also amend my previous comment by saying, if Canonical has some
> suggestions of how to gather more information in order to help debug this,
> they should let us know and we can make test runs for them.

My previous suggestion to gpiccoli on IRC was to modify the initrd to dump the state of the udev database at a point after the hang. I haven't seen such output attached here; does that mean it's not possible to produce such results because the kernel hard locks?

Currently the only debugging information I've seen is that the /lib/debian-installer/start-udev script never returns, but that does not mean the kernel has locked up; it only shows that udev believes it has not finished processing. I would still like to see a dump of the udev database at the point of the hang, not just a udev debug log showing processing up to that point.

Is this problem only reproducible with the X710 ethernet adapter?
Is this a removable ethernet adapter, and have you tested what happens if it's removed? If it's not removable, have you tested what happens if you blacklist the i40e driver? The ethernet driver may be a complete red herring, and the problem may be with something that normally happens after ethernet driver initialization rather than with the ethernet driver itself.

I would also have asked whether this could be an issue with the console output being redirected to some different device, but since Guilherme indicated that the problem appeared to be racy, with boot to the installer sometimes succeeding, that seems unlikely to be the problem.

If you can reproduce this problem with the cloud image from <http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-ppc64el-disk1.img>, that would present additional debugging opportunities since that uses a standard Ubuntu initramfs instead of the installer initramfs and will support various 'break=' options to interrupt the boot and introspect the system state.

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1615021

Title: Unable to network boot Ubuntu 16.04 installer normally on Briggs

Status in busybox package in Ubuntu: Fix Released
Status in debian-installer package in Ubuntu: Triaged
Status in systemd package in Ubuntu: Fix Released
Status in busybox source package in Xenial: Won't Fix
Status in debian-installer source package in Xenial: Triaged
Status in systemd source package in Xenial: Fix Committed
Status in busybox source package in Yakkety: Fix Released
Status in debian-installer source package in Yakkety: Triaged
Status in systemd source package in Yakkety: Fix Released

Bug description:

== Comment: #7 - Guilherme Guaglianoni Piccoli <gpicc...@br.ibm.com> - 2016-08-19 10:08:07 ==

The normal procedure to perform a netboot installation of Ubuntu 16.04 is to download the latest available vmlinux and initrd.gz files and kexec them with no parameters (at least on ppc64el). We're experiencing a strange issue in which the installer freezes before the menus are shown. The system hangs at the point shown below, right after the i40e driver initialization:

[ 11.052832] i40e 0002:01:00.0 enP2p1s0f0: renamed from eth0
[ 11.073976] i40e 0002:01:00.1 enP2p1s0f1: renamed from eth1
[ 11.117799] i40e 0002:01:00.2 enP2p1s0f2: renamed from eth2
[ 11.225745] i40e 0002:01:00.3 enP2p1s0f3: renamed from eth3
***HANG***

The most difficult part of this issue is that it seems to be a timing issue/race condition, and many debugging attempts end up preventing the issue from reproducing (a heisenbug). We were, however, able to get logs by booting the kernel with the command line "BOOT_DEBUG=2" and by changing the initrd to enable systemd debug output; only the files "init" and "start-udev" were changed in the initrd, and both are attached here.
We've also attached a saved screen session that shows the entire boot process until it gets flooded with many messages like:

starting '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'
'/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'(err) 'failed to execute '/bin/readlink' '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules': No such file or directory'
seq 3244 queued, 'add' 'pci_bus'
starting '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'
passed 408 byte device to netlink monitor 0x1003cfe8020
seq 3236 running
'/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'(err) 'failed to execute '/bin/readlink' '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules': No such file or directory'
'/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'(err) 'failed to execute '/bin/readlink' '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules': No such file or directory'
Process '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules' failed with exit code 2.
PROGRAM '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules' /lib/udev/rules.d/73-usb-net-by-mac.rules:6
passed device to netlink monitor 0x1003d01f730

It then remains hung at this stage.

We re-tested by changing the file 73-usb-net-by-mac.rules in the initrd, replacing "/etc/udev/rules.d/80-net-setup-link.rules" with "/lib/udev/rules.d/80-net-setup-link.rules", since the former does not exist whereas the latter does. The same issue was observed!

Notice that if we boot the installer with the command line "net.ifnames=0" or "net.ifnames=1", the problem no longer reproduces.

We would like to ask for Canonical's help in investigating this issue.
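The log excerpt shows udev spawning the readlink PROGRAM from 73-usb-net-by-mac.rules:6 even for an 'add' event on a 'pci_bus' device, and every spawn fails because /bin/readlink itself cannot be executed in the installer initrd. The toy Python model below illustrates why an unguarded PROGRAM key runs once per uevent, and why guarding it with cheap ACTION/SUBSYSTEM/USB matches would leave only USB network devices running it, which is what the SRU test case below expects for the fixed rule. It is a sketch under stated assumptions: `Uevent` and `spawn_count` are hypothetical names, nothing here parses real udev rules, and the guard condition is an assumed simplification rather than the actual rule change. It models only the wasted spawns; as discussed above, these are probably not the cause of the hang.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uevent:
    """Hypothetical, simplified uevent carrying only what the model needs."""
    action: str            # e.g. "add", "change"
    subsystem: str         # e.g. "net", "pci_bus", "mem"
    on_usb: bool = False   # True if some parent device sits on the USB bus


def spawn_count(events, gated):
    """Count how many times a rule's PROGRAM key would be spawned.

    gated=False: PROGRAM sits in a rule with no other match keys, so it is
    run for every uevent (the behaviour visible in the log excerpt).
    gated=True: assumed guard; udev gives up on a rule as soon as one of its
    match keys fails, so cheap checks on ACTION, SUBSYSTEM and a USB parent
    stop PROGRAM from running for anything but an added USB network device.
    """
    count = 0
    for ev in events:
        if gated and not (ev.action == "add" and ev.subsystem == "net" and ev.on_usb):
            continue  # a cheap match failed first, so PROGRAM is never spawned
        count += 1    # PROGRAM would be spawned for this event
    return count


events = [
    Uevent("add", "pci_bus"),
    Uevent("add", "mem"),
    Uevent("add", "net"),               # e.g. an i40e port: net, but not USB
    Uevent("add", "net", on_usb=True),  # e.g. a USB ethernet adapter
]

print(spawn_count(events, gated=False))  # 4
print(spawn_count(events, gated=True))   # 1
```

With hundreds of uevents during early boot, the ungated form spawns a failing process for each one, which matches the log flooding described above.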
Thanks,
Guilherme

SRU INFORMATION for systemd
===========================

Test case:

 * Check what happens for uevents on devices which are not USB network interfaces:

   udevadm test /sys/devices/virtual/mem/null
   udevadm test /sys/class/net/lo

   With the current version these will run

   PROGRAM '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules' /lib/udev/rules.d/73-usb-net-by-mac.rules:6

   which is pointless. With the proposed version these should be gone.

 * Ensure that the rule still works as intended by connecting a USB network device that has a permanent MAC address (Android tethering, for example, uses a temporary MAC). You should get a MAC-based name like "enx12345678" for it. Now disconnect it again, disable ifnames with

   sudo ln -s /dev/null /etc/udev/rules.d/80-net-setup-link.rules

   and reconnect the device. You should now get a kernel name like "usb0" for it.

 * Regression potential: Errors in the rule could break persistent naming (or its disabling) of USB network interfaces. Running the above test carefully is important to ensure this keeps working. This has little to no actual effect on anything else on the system (aside from a performance impact and spamming logs), so overall the regression potential is low.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/busybox/+bug/1615021/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp