I'm attaching the crash tool output from the 3.13 kernel dump. Most likely related to the situation already found in the following case: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1413540
Handled by Chris Arges and me in LKML discussions with Ingo and Linus:

-> http://www.kernelhub.org/?p=2&msg=683682

FOR NOW, it is LIKELY that I'll rely on already known recommendations for ProLiant (including the ones related to X2APIC mode):

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1417580

So we can TRY TO GUARANTEE that there are no LOST IRQs (IPIs) using the firmware you're using. Hopefully with the proper APIC mode set, like HP recommends, we will not have those IPI problems.

OBS: Whenever IPIs are lost (we've seen this on some nested KVMs and some buggy HW) we can be locked up in the SMP callback state machine. This means that the state machine loses IPI ACKs and loops forever trying to shut down the CPU for the SMP task queue to continue.

I'll SOON provide a comment with SUGGESTIONS and asking for FEEDBACK.

################

For now, from the 3.13 kernel dump, the most interesting part:

We had 7 CPUs executing the migration kernel thread (for the SMP callback state machine execution):

#### migration tasks (state machine loop)

>     93      2   4  ffff8808147b47d0  RU   0.0       0      0  [migration/4]
>    118      2   9  ffff881814a2c7d0  RU   0.0       0      0  [migration/9]
>    123      2  10  ffff88081404c7d0  RU   0.0       0      0  [migration/10]
>    128      2  11  ffff881814a4c7d0  RU   0.0       0      0  [migration/11]
>    138      2  13  ffff881814a647d0  RU   0.0       0      0  [migration/13]
>    165      2  18  ffff8810149ec7d0  RU   0.0       0      0  [migration/18]
>    195      2  24  ffff881014a647d0  RU   0.0       0      0  [migration/24]

This logic will try to migrate tasks from one CPU to another. In order for that to happen it has to rely on the state machine logic of shutting CPUs down before migrating the tasks (turning off IRQs, etc). The state machine, which shuts down the CPUs in phases, relies on the SMP callbacks below.

We had 3 CPUs in a part of the kernel that we have already identified to be problematic under certain conditions and/or HW.
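
To make the lost-IPI lockup described in the OBS note more concrete, here is a minimal user-space sketch in Python. It is not kernel code, only a model of the wait on CSD_FLAG_LOCK. The names CallSingleData, make_target and csd_lock_wait, and the max_spins bound, are assumptions made up for this illustration. The real loop has no bound, so if the ACK never arrives the CPU spins forever.

```python
# Toy model of a CPU waiting for CSD_FLAG_LOCK to clear.
# ASSUMPTION: every name here except CSD_FLAG_LOCK is hypothetical and
# exists only for illustration; this is not the kernel implementation.

CSD_FLAG_LOCK = 0x01  # bit tested by "testb $0x1,0x20(%rbx)" in the dump


class CallSingleData:
    """Stand-in for the per-call data whose flags field is polled."""

    def __init__(self):
        self.flags = 0


def make_target(ack_after):
    """Model the remote CPU.

    The returned callback clears the lock bit on the ack_after-th poll,
    as if the IPI had been handled. ack_after=None models a lost IPI:
    the bit is never cleared.
    """
    polls = 0

    def poll(csd):
        nonlocal polls
        polls += 1
        if ack_after is not None and polls >= ack_after:
            csd.flags &= ~CSD_FLAG_LOCK

    return poll


def csd_lock_wait(csd, target, max_spins):
    """Spin while the lock bit is set.

    Returns the number of spins needed, or None if max_spins was reached.
    ASSUMPTION: the bound exists only so this sketch terminates.
    """
    spins = 0
    while csd.flags & CSD_FLAG_LOCK:
        if spins >= max_spins:
            return None  # would be an endless loop without the bound
        target(csd)      # stands in for cpu_relax() plus remote progress
        spins += 1
    return spins


if __name__ == "__main__":
    for ack_after in (3, None):
        csd = CallSingleData()
        csd.flags |= CSD_FLAG_LOCK
        result = csd_lock_wait(csd, make_target(ack_after), max_spins=1000)
        print("ack_after =", ack_after, "->", result)
```

With an ACK the wait ends after a few spins. With ack_after=None the sketch only returns because of the artificial bound, which is the situation the qemu-system-x86 backtraces below appear to be in.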
**

> 17247      1  23  ffff881007055fc0  RU   1.6 7358428 2192548  qemu-system-x86

PID: 17247  TASK: ffff881007055fc0  CPU: 23  COMMAND: "qemu-system-x86"
 #0 [ffff88203eac6e58] crash_nmi_callback at ffffffff8103fb72
 #1 [ffff88203eac6e68] nmi_handle at ffffffff8171f188
 #2 [ffff88203eac6ec8] do_nmi at ffffffff8171f350
 #3 [ffff88203eac6ef0] end_repeat_nmi at ffffffff8171e5f1
    [exception RIP: generic_exec_single+130]
    RIP: ffffffff810db712  RSP: ffff8810ea7c96e0  RFLAGS: 00000202
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000202
    RDX: ffff8810ea7c96e0  RSI: 0000000000000018  RDI: 0000000000000001
    RBP: ffffffff810db712   R8: ffffffff810db712   R9: 0000000000000018
    R10: ffff8810ea7c96e0  R11: 0000000000000202  R12: ffffffffffffffff
    R13: 0000000000000206  R14: 000000007bc87bc6  R15: ffff8814959f76c0
    ORIG_RAX: ffff8814959f76c0  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff8810ea7c96e0] generic_exec_single at ffffffff810db712

!!!! CSD_FLAG logic discussed with Linus

108             while (csd->flags & CSD_FLAG_LOCK)
   0xffffffff810db712 <+130>:   testb  $0x1,0x20(%rbx)
   0xffffffff810db716 <+134>:   jne    0xffffffff810db710 <generic_exec_single+128>
109                     cpu_relax();
110     }

**

> 21036      1  27  ffff8810b69947d0  RU   1.0 7484828 1401940  qemu-system-x86

PID: 21036  TASK: ffff8810b69947d0  CPU: 27  COMMAND: "qemu-system-x86"
 #0 [ffff88203eb46e58] crash_nmi_callback at ffffffff8103fb72
 #1 [ffff88203eb46e68] nmi_handle at ffffffff8171f188
 #2 [ffff88203eb46ec8] do_nmi at ffffffff8171f350
 #3 [ffff88203eb46ef0] end_repeat_nmi at ffffffff8171e5f1
    [exception RIP: generic_exec_single+130]
    RIP: ffffffff810db712  RSP: ffff8814959f7670  RFLAGS: 00000202
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000202
    RDX: ffff8814959f7670  RSI: 0000000000000018  RDI: 0000000000000001
    RBP: ffffffff810db712   R8: ffffffff810db712   R9: 0000000000000018
    R10: ffff8814959f7670  R11: 0000000000000202  R12: ffffffffffffffff
    R13: 0000000000000282  R14: 0000000000000000  R15: 0000000000000100
    ORIG_RAX: 0000000000000100  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff8814959f7670] generic_exec_single at ffffffff810db712

!!!! CSD_FLAG logic discussed with Linus

108             while (csd->flags & CSD_FLAG_LOCK)
   0xffffffff810db712 <+130>:   testb  $0x1,0x20(%rbx)
   0xffffffff810db716 <+134>:   jne    0xffffffff810db710 <generic_exec_single+128>
109                     cpu_relax();
110     }

**

> 18516      1  31  ffff881dd54a2fe0  RU   1.6 7358428 2192548  qemu-system-x86

PID: 18516  TASK: ffff881dd54a2fe0  CPU: 31  COMMAND: "qemu-system-x86"
 #0 [ffff88203ebc6e58] crash_nmi_callback at ffffffff8103fb72
 #1 [ffff88203ebc6e68] nmi_handle at ffffffff8171f188
 #2 [ffff88203ebc6ec8] do_nmi at ffffffff8171f350
 #3 [ffff88203ebc6ef0] end_repeat_nmi at ffffffff8171e5f1
    [exception RIP: generic_exec_single+130]
    RIP: ffffffff810db712  RSP: ffff881dd55597a0  RFLAGS: 00000202
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000202
    RDX: ffff881dd55597a0  RSI: 0000000000000018  RDI: 0000000000000001
    RBP: ffffffff810db712   R8: ffffffff810db712   R9: 0000000000000018
    R10: ffff881dd55597a0  R11: 0000000000000202  R12: ffffffffffffffff
    R13: 0000000000000206  R14: 000000007bca7bc8  R15: ffff8814959f76c0
    ORIG_RAX: ffff8814959f76c0  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff881dd55597a0] generic_exec_single at ffffffff810db712

!!!! CSD_FLAG logic discussed with Linus

108             while (csd->flags & CSD_FLAG_LOCK)
   0xffffffff810db712 <+130>:   testb  $0x1,0x20(%rbx)
   0xffffffff810db716 <+134>:   jne    0xffffffff810db710 <generic_exec_single+128>
109                     cpu_relax();
110     }

** Attachment added: "lp1505564-3.13-kdump-crash-output.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1505564/+attachment/4509470/+files/lp1505564-3.13-kdump-crash-output.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1505564

Title:
  Soft lockup with "block nbdX: Attempted send on closed socket" spam

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Hi,

  Some of our nova compute hosts regularly freeze, sometimes for a few
  hours, with kern.log getting spammed with:

  block nbdX: Attempted send on closed socket

  and a few "CPU soft lockup" messages (see attached log). This clears
  up when the queue gets cleared, e.g.:

  block nbdX: queue cleared

  These are trusty hosts with kernel version 3.19.0-30-generic.

  Note that timestamps from kern.log appear to be wrong; it looks like
  the messages are being held, and then all delivered at once when the
  kernel unfreezes.

  Attaching apport files from 2 hosts below.
  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116, 1 Oct 9 13:14 seq
   crw-rw---- 1 root audio 116, 33 Oct 9 13:14 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.15
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 14.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: HP ProLiant DL385 G7
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 radeondrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.19.0-30-generic root=UUID=1bd039df-c419-4cb7-b1ad-fe004d55ccd4 ro console=tty0 console=ttyS1,38400 nosplash
  ProcVersionSignature: Ubuntu 3.19.0-30.34~14.04.1-generic 3.19.8-ckt6
  RelatedPackageVersions:
   linux-restricted-modules-3.19.0-30-generic N/A
   linux-backports-modules-3.19.0-30-generic N/A
   linux-firmware 1.127.15
  RfKill: Error: [Errno 2] No such file or directory
  Tags: trusty uec-images
  Uname: Linux 3.19.0-30-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  _MarkForUpload: True
  dmi.bios.date: 12/08/2012
  dmi.bios.vendor: HP
  dmi.bios.version: A18
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrA18:bd12/08/2012:svnHP:pnProLiantDL385G7:pvr:cvnHP:ct23:cvr:
  dmi.product.name: ProLiant DL385 G7
  dmi.sys.vendor: HP
  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116, 1 Oct 9 12:37 seq
   crw-rw---- 1 root audio 116, 33 Oct 9 12:37 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.15
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 14.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: HP ProLiant DL360p Gen8
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.19.0-30-generic root=UUID=f3c6cae8-09dc-4607-8675-c9123ea9c9fd ro console=tty0 console=ttyS1,38400 nosplash
  ProcVersionSignature: Ubuntu 3.19.0-30.34~14.04.1-generic 3.19.8-ckt6
  RelatedPackageVersions:
   linux-restricted-modules-3.19.0-30-generic N/A
   linux-backports-modules-3.19.0-30-generic N/A
   linux-firmware 1.127.15
  RfKill: Error: [Errno 2] No such file or directory
  StagingDrivers: visorchannel visorutil
  Tags: trusty uec-images staging
  Uname: Linux 3.19.0-30-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  _MarkForUpload: True
  dmi.bios.date: 11/14/2013
  dmi.bios.vendor: HP
  dmi.bios.version: P71
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrP71:bd11/14/2013:svnHP:pnProLiantDL360pGen8:pvr:cvnHP:ct23:cvr:
  dmi.product.name: ProLiant DL360p Gen8
  dmi.sys.vendor: HP
  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116, 1 Oct 28 13:07 seq
   crw-rw---- 1 root audio 116, 33 Oct 28 13:07 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.16
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 14.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: HP ProLiant DL360p Gen8
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-29-generic root=UUID=46ac2c1e-5f16-45bd-b383-e952f78fd142 ro console=tty0 console=ttyS1,38400 nosplash crashkernel=384M-:512M
  ProcVersionSignature: Ubuntu 3.13.0-29.53-generic 3.13.11.2
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-29-generic N/A
   linux-backports-modules-3.13.0-29-generic N/A
   linux-firmware 1.127.15
  RfKill: Error: [Errno 2] No such file or directory
  Tags: trusty uec-images
  Uname: Linux 3.13.0-29-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  _MarkForUpload: True
  dmi.bios.date: 11/14/2013
  dmi.bios.vendor: HP
  dmi.bios.version: P71
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrP71:bd11/14/2013:svnHP:pnProLiantDL360pGen8:pvr:cvnHP:ct23:cvr:
  dmi.product.name: ProLiant DL360p Gen8
  dmi.sys.vendor: HP
  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116, 1 Oct 28 16:31 seq
   crw-rw---- 1 root audio 116, 33 Oct 28 16:31 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.18
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 14.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: HP ProLiant DL385 G7
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 radeondrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=29988d72-90fe-4329-a0c1-0e3bfb88beab ro console=tty0 console=ttyS1,38400 nosplash crashkernel=384M-:512M
  ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-24-generic N/A
   linux-backports-modules-3.13.0-24-generic N/A
   linux-firmware 1.127.15
  RfKill: Error: [Errno 2] No such file or directory
  Tags: trusty uec-images
  Uname: Linux 3.13.0-24-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  WifiSyslog:
   Oct 29 07:58:52 druk kernel: [55713.346292] nr_pdflush_threads exported in /proc is scheduled for removal
   Oct 29 07:58:52 druk kernel: [55713.346523] sysctl: The scan_unevictable_pages sysctl/node-interface has been disabled for lack of a legitimate use case. If you have one, please send an email to linux...@kvack.org.
  _MarkForUpload: True
  dmi.bios.date: 02/02/2014
  dmi.bios.vendor: HP
  dmi.bios.version: A18
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrA18:bd02/02/2014:svnHP:pnProLiantDL385G7:pvr:cvnHP:ct23:cvr:
  dmi.product.name: ProLiant DL385 G7
  dmi.sys.vendor: HP
  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116, 1 Oct 28 19:56 seq
   crw-rw---- 1 root audio 116, 33 Oct 28 19:56 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.18
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 14.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: HP ProLiant DL360p Gen8
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-29-generic root=UUID=f3c6cae8-09dc-4607-8675-c9123ea9c9fd ro console=tty0 console=ttyS1,38400 nosplash crashkernel=384M-:512M
  ProcVersionSignature: Ubuntu 3.13.0-29.53-generic 3.13.11.2
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-29-generic N/A
   linux-backports-modules-3.13.0-29-generic N/A
   linux-firmware 1.127.15
  RfKill: Error: [Errno 2] No such file or directory
  Tags: trusty uec-images
  Uname: Linux 3.13.0-29-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  WifiSyslog:
   Oct 29 07:41:34 orlo kernel: [42337.380135] qbrce3298b1-cb: port 2(tapce3298b1-cb) entered disabled state
   Oct 29 07:41:34 orlo kernel: [42337.380543] device tapce3298b1-cb left promiscuous mode
   Oct 29 07:41:34 orlo kernel: [42337.380580] qbrce3298b1-cb: port 2(tapce3298b1-cb) entered disabled state
   Oct 29 07:41:35 orlo kernel: [42338.223036] type=1400 audit(1446104495.424:137): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvirt-80a3c754-9f62-4011-baea-b3a8f37d3746" pid=11554 comm="apparmor_parser"
   Oct 29 07:41:35 orlo kernel: [42338.308192] qbrce3298b1-cb: port 1(qvbce3298b1-cb) entered disabled state
  _MarkForUpload: True
  dmi.bios.date: 11/14/2013
  dmi.bios.vendor: HP
  dmi.bios.version: P71
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrP71:bd11/14/2013:svnHP:pnProLiantDL360pGen8:pvr:cvnHP:ct23:cvr:
  dmi.product.name: ProLiant DL360p Gen8
  dmi.sys.vendor: HP
  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116, 1 Oct 29 21:10 seq
   crw-rw---- 1 root audio 116, 33 Oct 29 21:10 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.18
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 14.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: HP ProLiant DL360p Gen8
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.19.0-31-generic root=UUID=f3c6cae8-09dc-4607-8675-c9123ea9c9fd ro console=tty0 console=ttyS1,38400 nosplash crashkernel=384M-:512M
  ProcVersionSignature: Ubuntu 3.19.0-31.36~14.04.1-generic 3.19.8-ckt7
  RelatedPackageVersions:
   linux-restricted-modules-3.19.0-31-generic N/A
   linux-backports-modules-3.19.0-31-generic N/A
   linux-firmware 1.127.16
  RfKill: Error: [Errno 2] No such file or directory
  StagingDrivers: visorchannel visorutil
  Tags: trusty uec-images staging
  Uname: Linux 3.19.0-31-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  _MarkForUpload: True
  dmi.bios.date: 11/14/2013
  dmi.bios.vendor: HP
  dmi.bios.version: P71
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrP71:bd11/14/2013:svnHP:pnProLiantDL360pGen8:pvr:cvnHP:ct23:cvr:
  dmi.product.name: ProLiant DL360p Gen8
  dmi.sys.vendor: HP
  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116, 1 Oct 30 12:31 seq
   crw-rw---- 1 root audio 116, 33 Oct 30 12:31 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.18
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 14.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: HP ProLiant DL360p Gen8
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.19.0-31-generic root=UUID=46ac2c1e-5f16-45bd-b383-e952f78fd142 ro console=tty0 console=ttyS1,38400 nosplash crashkernel=384M-:512M
  ProcVersionSignature: Ubuntu 3.19.0-31.36~14.04.1-generic 3.19.8-ckt7
  RelatedPackageVersions:
   linux-restricted-modules-3.19.0-31-generic N/A
   linux-backports-modules-3.19.0-31-generic N/A
   linux-firmware 1.127.16
  RfKill: Error: [Errno 2] No such file or directory
  StagingDrivers: visorchannel visorutil
  Tags: trusty uec-images staging
  Uname: Linux 3.19.0-31-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  _MarkForUpload: True
  dmi.bios.date: 11/14/2013
  dmi.bios.vendor: HP
  dmi.bios.version: P71
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrP71:bd11/14/2013:svnHP:pnProLiantDL360pGen8:pvr:cvnHP:ct23:cvr:
  dmi.product.name: ProLiant DL360p Gen8
  dmi.sys.vendor: HP