Package: src:linux
Version: 3.16.36-1+deb8u1
Severity: normal

Hi,

On about 5% of boots, the system hangs for several minutes while waiting
for systemd-udev-settle to complete (systemd-udev-settle is triggered by
lvm2). The log shows:

Oct 07 11:40:59 grisou-6.nancy.grid5000.fr systemd-udevd[461]: worker [517] /devices/system/cpu/cpu13 timeout; kill it
Oct 07 11:40:59 grisou-6.nancy.grid5000.fr systemd-udevd[461]: seq 3533 '/devices/system/cpu/cpu13' killed
Oct 07 11:40:59 grisou-6.nancy.grid5000.fr systemd-udevd[461]: worker [517] terminated by signal 9 (Killed)

systemd-udev-settle is then reported as failed because it reached its
timeout:

# systemctl status systemd-udev-settle.service
● systemd-udev-settle.service - udev Wait for Complete Device Initialization
   Loaded: loaded (/lib/systemd/system/systemd-udev-settle.service; static)
   Active: failed (Result: timeout) since Thu 2016-10-06 12:46:39 CEST; 1min 57s ago
     Docs: man:udev(7)
           man:systemd-udevd.service(8)
  Process: 456 ExecStart=/bin/udevadm settle (code=killed, signal=TERM)
 Main PID: 456 (code=killed, signal=TERM)

It happens on various machines, of various models (all Dell, but I'm not
sure this is relevant, as all our recent machines are Dell machines). A
hardware issue is unlikely.

It is fixed in stretch and unstable. I bisected it and found that commit
6f942a1f264e875c5f3ad6f505d7b500a3e7fa82 fixed it. That commit is:

commit 6f942a1f264e875c5f3ad6f505d7b500a3e7fa82
Author: Peter Zijlstra <pet...@infradead.org>
Date:   Wed Sep 24 10:18:46 2014 +0200

    locking/mutex: Don't assume TASK_RUNNING

    We're going to make might_sleep() test for TASK_RUNNING, because
    blocking without TASK_RUNNING will destroy the task state by setting
    it to TASK_RUNNING.

    There are a few occasions where its 'valid' to call blocking
    primitives (and mutex_lock in particular) and not have TASK_RUNNING,
    typically such cases are right before we set TASK_RUNNING anyhow.

    Robustify the code by not assuming this; this has the beneficial side
    effect of allowing optional code emission for fixing the above
    might_sleep() false positives.

    Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
    Cc: t...@linutronix.de
    Cc: ilya.dryo...@inktank.com
    Cc: umgwanakikb...@gmail.com
    Cc: Oleg Nesterov <o...@redhat.com>
    Cc: Linus Torvalds <torva...@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20140924082241.988560...@infradead.org
    Signed-off-by: Ingo Molnar <mi...@kernel.org>

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index dadbf88..4541951 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -378,8 +378,14 @@ done:
 	 * reschedule now, before we try-lock the mutex. This avoids getting
 	 * scheduled out right after we obtained the mutex.
 	 */
-	if (need_resched())
+	if (need_resched()) {
+		/*
+		 * We _should_ have TASK_RUNNING here, but just in case
+		 * we do not, make it so, otherwise we might get stuck.
+		 */
+		__set_current_state(TASK_RUNNING);
 		schedule_preempt_disabled();
+	}
 
 	return false;
 }

Unfortunately, the code around this was changed after 3.16, making a
backport non-trivial.

A workaround (for jessie systems) is to not install lvm2, if that is an
option.

Lucas

-- Package-specific info:
** Version:
Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03)

** Command line:
root=/dev/sda3 console=tty0 console=ttyS0,115200

** Not tainted

** Model information
sys_vendor: Dell Inc.
product_name: PowerEdge R630
product_version:
chassis_vendor: Dell Inc.
chassis_version:
bios_vendor: Dell Inc.
bios_version: 1.3.6
board_vendor: Dell Inc.
board_name: 0CNCJW
board_version: A08

** Loaded modules:
x86_pkg_temp_thermal intel_powerclamp ttm drm_kms_helper intel_rapl
coretemp kvm_intel kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul
glue_helper ablk_helper cryptd evdev pcspkr dcdbas iTCO_wdt ipmi_devintf
iTCO_vendor_support drm ipmi_si ipmi_msghandler mei_me mei lpc_ich shpchp
processor mfd_core thermal_sys wmi acpi_power_meter button autofs4 ext4
crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_generic ahci igb
i2c_algo_bit ehci_pci libahci ixgbe i2c_core ehci_hcd libata megaraid_sas
dca crct10dif_pclmul crct10dif_common ptp crc32c_intel usbcore pps_core
usb_common mlx4_core mdio scsi_mod

** PCI devices:
not available

** USB devices:
not available

-- System Information:
Debian Release: 8.6
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/32 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages linux-image-3.16.0-4-amd64 depends on:
ii  debconf [debconf-2.0]                   1.5.56
ii  initramfs-tools [linux-initramfs-tool]  0.120+deb8u2
ii  kmod                                    18-3
ii  linux-base                              3.5

Versions of packages linux-image-3.16.0-4-amd64 recommends:
pn  firmware-linux-free  <none>
pn  irqbalance           <none>

Versions of packages linux-image-3.16.0-4-amd64 suggests:
pn  debian-kernel-handbook  <none>
ii  extlinux                3:6.03+dfsg-5+deb8u1
pn  linux-doc-3.16          <none>

Versions of packages linux-image-3.16.0-4-amd64 is related to:
pn  firmware-atheros        <none>
ii  firmware-bnx2           0.43
ii  firmware-bnx2x          0.43
pn  firmware-brcm80211      <none>
pn  firmware-intelwimax     <none>
pn  firmware-ipw2x00        <none>
pn  firmware-ivtv           <none>
pn  firmware-iwlwifi        <none>
pn  firmware-libertas       <none>
pn  firmware-linux          <none>
pn  firmware-linux-nonfree  <none>
pn  firmware-myricom        <none>
pn  firmware-netxen         <none>
pn  firmware-qlogic         <none>
pn  firmware-ralink         <none>
pn  firmware-realtek        <none>
pn  xen-hypervisor          <none>

-- debconf information:
  linux-image-3.16.0-4-amd64/postinst/mips-initrd-3.16.0-4-amd64:
  linux-image-3.16.0-4-amd64/prerm/removing-running-kernel-3.16.0-4-amd64: true
  linux-image-3.16.0-4-amd64/postinst/depmod-error-initrd-3.16.0-4-amd64: false