[Bug 1813371] [NEW] OVS 2.9+ systemd integration issues
Public bug reported:

For a few months now, we have been using OVS 2.9 (or newer) on Ubuntu Xenial in OPNFV, both with and without DPDK.

A while ago, we observed a couple of rare race conditions when multiple Linux interfaces/bridges are mixed with OVS ports/bridges. We also observed races between DPDK binding and openvswitch-switch (actually openvswitch-switch-dpdk configured using alternatives).

We worked around those issues by using a solution derived from the official OVS Debian readme, which recommends avoiding `auto` for OVS bridges. Instead, we used `auto` for OVS bridges, but omitted the `auto` for the OVS ports in them. That worked almost perfectly for a while.

However, we recently bumped a few unrelated software components (since we migrated from Queens to Rocky in OPNFV) and we started experiencing race conditions again. So I dug a bit and found a couple of things:

1. Broken dependency between ovsdb-server/ovs-vswitchd systemd services and networking.service

This is probably a copy-paste error from [1]: `Before=network.service` should probably be `Before=networking.service` on Debian systems. The consequence is quite serious - on Debian systems, the OVS services start *after* networking.service. Changing this leads to a service order change, which turns out to be quite the rabbit hole ...

2. Outdated ifupdown scripts

For example, /etc/network/if-pre-up.d/openvswitch still references the old `openvswitch-nonetwork.service`. Luckily, this is not critical, as the fallback uses `service openvswitch-switch [...]`, so I'm not sure this should be changed, but I thought it was worth mentioning.

3. Debian OVS does *not* handle OVS bridges without `auto`

The upstream OVS readme recommends omitting `auto` for OVS bridges, as mentioned earlier, to avoid exactly the race conditions we saw. Following that recommendation does lead to a working system: `networking.service` no longer fails to start due to missing OVS bridges, and the OVS services no longer complain about Linux interfaces being in down state when trying to add them to OVS bridges. However, the OVS bridges end up in DOWN state, since nobody bothers to ifup them.

Imo, networking.service (or some *other* mechanism) should call `/sbin/ifup --allow=ovs -a --read-environment` *after* the initial `/sbin/ifup -a --read-environment` (provided the ordering issue #1 was changed to start OVS first, of course).

4. ovsdb-server should never start before the DPDK service if DPDK is installed

This should actually be easy to fix and I have to admit I haven't run into it lately, although I remember it being an issue a while ago. Anyway, a simple `After=dpdk.service` wouldn't hurt.

5. If OVS starts before networking.service, cloud-init causes cyclic dependencies

If we configure the OVS services to start first, systemd might decide to randomly remove some units to break the following circular dependency:

ovs-vswitchd --> ovsdb-server -(default dep)-> sysinit.target --> cloud-init.service --> networking.service --> ovs-vswitchd

In my tests, I just set `DefaultDependencies=no` for the OVS services, although this might require explicitly adding back some of the indirect dependencies of `sysinit.target`, so I am not sure it is a sensible general recommendation.

On my test systems, I didn't bother handling #2; for the others I use some systemd drop-ins (see below), which so far seem to produce reproducible working environments.
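The circular dependency in item 5 can be illustrated offline with `tsort` from coreutils, which refuses to linearize a set of ordering pairs that contains a loop. This is only a sketch: the edges are copied from the chain quoted above, not read from a live system, and the helper names (`edges_default`, `edges_no_default_deps`, `check_order`) are made up for the illustration.

```shell
#!/bin/sh
# Ordering edges taken from the chain in the report, one "first later" pair
# per line: the left unit has to be started before the right one.
edges_default() {
    cat <<'EOF'
ovsdb-server.service ovs-vswitchd.service
sysinit.target ovsdb-server.service
cloud-init.service sysinit.target
networking.service cloud-init.service
ovs-vswitchd.service networking.service
EOF
}

# Same edges with DefaultDependencies=no on ovsdb-server, which drops its
# implicit ordering after sysinit.target (the "default dep" arrow).
edges_no_default_deps() {
    edges_default | grep -v '^sysinit.target ovsdb-server.service$'
}

# Reads edge pairs on stdin; prints "cycle" if no start order satisfies all
# of them, "ok" otherwise. GNU tsort exits non-zero when it finds a loop.
check_order() {
    if tsort >/dev/null 2>&1; then
        echo ok
    else
        echo cycle
    fi
}

echo "default dependencies:   $(edges_default | check_order)"
echo "DefaultDependencies=no: $(edges_no_default_deps | check_order)"
```

Dropping the single implicit edge is enough to make the graph orderable again, which matches the effect of the `DefaultDependencies=no` drop-ins. On a real system the actual edges would have to be collected from the installed units rather than hard-coded.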
# cat /etc/systemd/system/ovsdb-server.service.d/override.conf
[Unit]
After=dpdk.service
Before=networking.service
DefaultDependencies=no

# cat /etc/systemd/system/networking.service.d/ovs_workaround.conf
[Service]
ExecStart=/sbin/ifup --allow=ovs -a --read-environment

# cat /etc/systemd/system/ovs-vswitchd.service.d/override.conf
[Unit]
Before=networking.service
DefaultDependencies=no

# lsb_release -rd
Description: Ubuntu 16.04.5 LTS
Release: 16.04

# apt-cache policy openvswitch-switch
openvswitch-switch:
  Installed: 2.9.0-0ubuntu1~cloud0
  Candidate: 2.9.0-0ubuntu1~cloud0
  Version table:
 *** 2.9.0-0ubuntu1~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu xenial-updates/queens/main amd64 Packages
        100 /var/lib/dpkg/status

[1] https://github.com/openvswitch/ovs/blob/master/rhel/usr_lib_systemd_system_ovsdb-server.service#L4

** Affects: openvswitch (Ubuntu)
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1813371

Title:
  OVS 2.9+ systemd integration issues

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1813371/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1857074] Re: Cavium ThunderX CN88XX Panic : Unknown reason
Hi,

Not sure this is useful (since it might be obvious), but adding `nopti` to kernel parameters works around the issue, indicating this is indeed related to kpti.
[Bug 1582181] Re: AArch64: slow cpuinfo due to redundant loop
Upstream PR merged.
[Bug 1797332] Re: qemu nested virtualization is not working with Ubuntu16.04 + Intel CPU
FWIW, bumping the kernel on the host (and most likely on the L1 VMs too) should work. The HWE kernel in Xenial is the same version (4.15) as the kernel used by Bionic (18.04), so this should fix the problem:

$ apt install linux-generic-hwe-16.04
$ reboot

BR,
Alex
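After the reboot, a quick way to confirm the host is actually running the newer kernel is to compare `uname -r` against 4.15. The sketch below assumes GNU `sort -V` is available; `kernel_at_least` is a hypothetical helper name, and 4.15 is simply the version mentioned above, not a confirmed minimum for the fix.

```shell
#!/bin/sh
# kernel_at_least VERSION MINIMUM
# Prints "yes" if VERSION sorts at or after MINIMUM in version order,
# "no" otherwise. Relies on "sort -V" (GNU coreutils version sort).
kernel_at_least() {
    lowest=$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)
    if [ "$lowest" = "$2" ]; then
        echo yes
    else
        echo no
    fi
}

# Check the running kernel against the HWE version mentioned above.
kernel_at_least "$(uname -r)" 4.15
```

The same check would need to be repeated inside the L1 VMs if they are upgraded as well.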
[Bug 1582181] Re: AArch64: slow cpuinfo due to redundant loop
I see nobody acted on this, so I sent a PR [1] upstream. Will update this ticket if it gets pulled.

[1] https://github.com/lyonel/lshw/pull/36
[Bug 1672521] [NEW] ThunderX: soft lockup on 4.8+ kernels
Public bug reported:

I have been trying to easily reproduce this for days. We initially observed it in OPNFV Armband, when we tried to upgrade our Ubuntu Xenial installation kernel to linux-image-generic-hwe-16.04 (4.8).

In our environment, this was easily triggered on compute nodes, when launching multiple VMs (we suspected OVS, QEMU etc.). However, in order to rule out our specifics, we looked for a simple way to reproduce it on all ThunderX nodes we have access to, and we finally found it:

$ apt-get install stress-ng
$ stress-ng --hdd 1024

We tested different FW versions, provided by both chip/board manufacturers, and with all of them the result is 100% reproducible, leading to a kernel Oops [1]:

[ 726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
[ 726.077908] Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu
[ 726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 726.094383] kworker/0:1 D 080861bc 0 312 2 0x
[ 726.094401] Workqueue: events vmstat_shepherd
[ 726.094404] Call trace:
[ 726.094411] [] __switch_to+0x94/0xa8
[ 726.094418] [] __schedule+0x224/0x718
[ 726.094421] [] schedule+0x38/0x98
[ 726.094425] [] schedule_preempt_disabled+0x14/0x20
[ 726.094428] [] __mutex_lock_slowpath+0xd4/0x168
[ 726.094431] [] mutex_lock+0x58/0x70
[ 726.094437] [] get_online_cpus+0x44/0x70
[ 726.094440] [] vmstat_shepherd+0x3c/0xe8
[ 726.094446] [] process_one_work+0x150/0x478
[ 726.094449] [] worker_thread+0x50/0x4b8
[ 726.094453] [] kthread+0xec/0x100
[ 726.094456] [] ret_from_fork+0x10/0x40

Over the last few days, I tested all 4.8-* and 4.10 (zesty backport) kernels; the soft lockup happens with each and every one of them. On the other hand, 4.4.0-45-generic seems to work perfectly fine under normal conditions (probably newer 4.4.0-* too, but due to a regression in the ethernet drivers after 4.4.0-45, we can't test those with ease), yet running stress-ng leads to the same oops.

[1] http://paste.ubuntu.com/24172516/

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New
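When comparing kernels with the stress-ng reproducer from the report above, it helps to count the hung-task reports in each kernel log instead of eyeballing them. A minimal sketch, assuming the log text is piped in on stdin; `count_hung_tasks` is a hypothetical helper name and the sample lines are the ones quoted in the report.

```shell
#!/bin/sh
# Counts hung-task reports ("blocked for more than N seconds") in kernel log
# text read from stdin. grep -c prints 0 but exits 1 when nothing matches,
# hence the "|| true" so the helper is safe under "set -e".
count_hung_tasks() {
    grep -c 'blocked for more than [0-9][0-9]* seconds' || true
}

# Example against two lines of the trace quoted in the report.
printf '%s\n' \
    '[ 726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.' \
    '[ 726.094401] Workqueue: events vmstat_shepherd' |
    count_hung_tasks
```

On a live system the same helper can be fed from the kernel log, e.g. `dmesg | count_hung_tasks`, once per kernel version under test.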
[Bug 1672521] CurrentDmesg.txt
apport information

** Attachment added: "CurrentDmesg.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837213/+files/CurrentDmesg.txt
[Bug 1672521] JournalErrors.txt
apport information

** Attachment added: "JournalErrors.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837214/+files/JournalErrors.txt
[Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels
apport information

** Tags added: apport-collected xenial

** Description changed:

  [1] http://paste.ubuntu.com/24172516/
+ ---
+ AlsaDevices:
+  total 0
+  crw-rw 1 root audio 116, 1 Mar 13 19:27 seq
+  crw-rw 1 root audio 116, 33 Mar 13 19:27 timer
+ AplayDevices: Error: [Errno 2] No such file or directory
+ ApportVersion: 2.20.1-0ubuntu2.5
+ Architecture: arm64
+ ArecordDevices: Error: [Errno 2] No such file or directory
+ AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
+ DistroRelease: Ubuntu 16.04
+ IwConfig: Error: [Errno 2] No such file or directory
+ MachineType: GIGABYTE R120-T30
+ Package: linux (not installed)
+ PciMultimedia:
+
+ ProcEnviron:
+  TERM=vt220
+  PATH=(custom, no user)
+  XDG_RUNTIME_DIR=
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
+ ProcFB: 0 astdrmfb
+ ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet splash vt.handoff=7
+ ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17
+ RelatedPackageVersions:
+  linux-restricted-modules-4.8.0-41-generic N/A
+  linux-backports-modules-4.8.0-41-generic N/A
+  linux-firmware 1.157.8
+ RfKill: Error: [Errno 2] No such file or directory
+ Tags: xenial
+ Uname: Linux 4.8.0-41-generic aarch64
+ UpgradeStatus: No upgrade log present (probably fresh install)
+ UserGroups:
+
+ _MarkForUpload: True
+ dmi.bios.date: 11/22/2016
+ dmi.bios.vendor: GIGABYTE
+ dmi.bios.version: T22
+ dmi.board.asset.tag: 01234567890123456789AB
+ dmi.board.name: MT30-GS0
+ dmi.board.vendor: GIGABYTE
+ dmi.board.version: 01234567
+ dmi.chassis.asset.tag: 01234567890123456789AB
+ dmi.chassis.type: 17
+ dmi.chassis.vendor: GIGABYTE
+ dmi.chassis.version: 01234567
+ dmi.modalias: dmi:bvnGIGABYTE:bvrT22:bd11/22/2016:svnGIGABYTE:pnR120-T30:pvr0100:rvnGIGABYTE:rnMT30-GS0:rvr01234567:cvnGIGABYTE:ct17:cvr01234567:
+ dmi.product.name: R120-T30
+ dmi.product.version: 0100
+ dmi.sys.vendor: GIGABYTE

** Attachment added: "CRDA.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837212/+files/CRDA.txt
[Bug 1672521] WifiSyslog.txt
apport information

** Attachment added: "WifiSyslog.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837221/+files/WifiSyslog.txt
[Bug 1672521] ProcCpuinfo.txt
apport information

** Attachment added: "ProcCpuinfo.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837217/+files/ProcCpuinfo.txt
[Bug 1672521] Lspci.txt
apport information

** Attachment added: "Lspci.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837215/+files/Lspci.txt
[Bug 1672521] UdevDb.txt
apport information

** Attachment added: "UdevDb.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837220/+files/UdevDb.txt
[Bug 1672521] Lsusb.txt
apport information

** Attachment added: "Lsusb.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837216/+files/Lsusb.txt
[Bug 1672521] ProcModules.txt
apport information

** Attachment added: "ProcModules.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837219/+files/ProcModules.txt
[Bug 1672521] ProcInterrupts.txt
apport information

** Attachment added: "ProcInterrupts.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837218/+files/ProcInterrupts.txt
[Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels
Hi,

I tried out 4.11-rc1 a few days ago. Unfortunately, I did not get the board to boot properly from the start, since the ThunderX networking drivers failed to allocate MSI-X/MSI interrupts, and polling on some registers also failed ...

So, with 4.11-rc1, at least one network interface never came online due to unmapped interrupts/failed polling, but unloading `nicpf` and reloading it seemed to work (networking worked after this). After this, the soft lockup happened, but I can't be sure I did not mess something else up.

Let me try this again and get back to you with some proper logs, but off the top of my head, things got worse with 4.11-rc1 ...

Thanks,
Alex
[Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels
4.11-rc1 console log attached. Board firmware is the latest available on Gigabyte's site (T31).

1. Install 4.11-rc1 (`make modules_install install`) and reboot
2. Observe networking driver issues in boot log
   Dmesg: 4.11-rc1_dmesg_on_clean_boot.log [3]
3. Try `ping google.com`, obviously not working
4. `modprobe -r nicpf` (leads to multiple oopses in dmesg)
   Console log: 4.11-rc1_modprobe_r_nicpf_output.log [1]
   Dmesg: 4.11-rc1_dmesg_after_modprobe_r_nicpf.log [2]
5. `modprobe nicpf` (this usually works, and afterwards network is up and running - not sure whether ALL interfaces are ok, as not all of them are connected) - however, this time it led to a soft lockup (see full logs attached here)

[1] http://paste.ubuntu.com/24178311/
[2] http://paste.ubuntu.com/24178312/
[3] http://paste.ubuntu.com/24178313/

** Attachment added: "ThunderX 4.11-rc1 console log"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/+attachment/4837770/+files/thunderx_4.11_rc1_console_log.txt
[Bug 1630038] Re: thunder nic: avoid link delays due to RX_PACKET_DIS
Hi,

This fix introduced a regression with ThunderX nodes (CRB-1S, CRB-2S) and our 10G switch (Extreme Networks x670 10GE L3). We have opened a downstream bug report [1], where we temporarily bypassed this by pinning the kernel to 4.4.0-45.

I also tested 4.8 (multiple builds), 4.10 and 4.11-rc1 (vanilla); all are still affected by link training issues with our switch, with 4.11-rc1 not working at all and reporting more issues (logs attached in a different LP comment [2]).

BR,
Alex

[1] https://jira.opnfv.org/browse/ARMBAND-168
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/comments/17
[Bug 1630038] Re: thunder nic: avoid link delays due to RX_PACKET_DIS
Hi,

1) We tested different models (CRB-1S, CRB-2S) - all behave the same.
2) Please check the logs "ThunderX 4.11-rc1 console log" in [2] linked above. I don't think the firmware version makes a difference for this issue (we saw the same bug with firmwares T22, T27 and T31).

All in all, this issue seems pretty tied to the switch we use, and all firmware/board model combinations behaved the same ...

Thanks,
Alex
[Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels
Hi, Dann,

First of all, I think the bug title is misleading, as this issue happens on all kernels we tested (4.4.0-45..66, 4.8.0-x, 4.10.0-x etc).

To be fair, we haven't seen this exact bug (or at least I don't think we did) in practice, i.e. without running stress-ng, 4.4.0-x never ever crashed. The VM use case turned out to be a different bug [1], triggered 100% by AAVMF + vhost.

Let me know if I can provide anything else. I consider this particular bug minor (if we don't poke it with stress-ng, everything works well), compared to AAVMF + vhost [1].

Thanks,
Alex

[1] https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1673564
[Bug 1674837] Re: thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch
Let me know if I should attach any logs, although there are *no* traces anywhere, at least with default log levels (without recompiling).
[Bug 1674837] [NEW] thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch
Public bug reported:

Upstream backport [3] introduced a regression with ThunderX nodes (CRB-1S, CRB-2S) and our 10G switch (Extreme Networks x670 10GE L3). We have opened a downstream bug report [1], where we temporarily bypassed this by pinning the kernel to 4.4.0-45.

I also tested 4.8 (multiple builds), 4.10 and 4.11-rc1 (vanilla); all are still affected by link training issues with our switch, with 4.11-rc1 not working at all and reporting more issues (logs attached in a different LP comment [2]).

I also confirmed that reverting the commit in question fixes the issues in our setup (tested on top of the 4.10.0-13 linux-image-generic-hwe-edge package from Xenial).

BR,
Alex

[1] https://jira.opnfv.org/browse/ARMBAND-168
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/comments/17
[3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1630038

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New
[Bug 1630038] Re: thunder nic: avoid link delays due to RX_PACKET_DIS
Hi, Dann,

I created a new bug and pasted the same info as above at [1]. Afaict, there is no useful information in the logs when link training fails.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674837
[Bug 1674837] Re: thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch
** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed
[Bug 1582181] Re: AArch64: slow cpuinfo due to redundant loop
Hi,

If it helps, we have an old DEB package at [1]. I think it's based on the lshw version that was used by Trusty or Xenial at that time.

[1] http://linux.enea.com/mos-repos/ubuntu/10.0/pool/main/l/lshw/
[Bug 1673564] Re: ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with vhost=on
Hi, Dann,

Thanks for looking into this!

One more thing: we blacklisted the module "vhost_net", and that bypasses the issue. I know it's not the right direction for finding a fix, but maybe it helps with debugging.
[Bug 1582181] [NEW] AArch64: slow cpuinfo due to redundant loop
Public bug reported:

lshw on AArch64 hardware is painfully slow. This affects both lshw in current Ubuntu releases and vanilla upstream.

For a 48 core node, cpuinfo parsing added up to 30 seconds (8 lines per core in /proc/cpuinfo add up to 384 lines to parse). For a 96 core node, parsing took up to 5 minutes (!).

I think the problem was introduced by [1], and can be summarized as:
- CPU capabilities should be added only to the current CPU core, and NOT to all previous CPU cores parsed.

My suggestion is dropping the loop in [1], thus calling the and only for currentcpu.

I put together a small patch (basically removing the for loop in question) at [2] (or see attachment), which should be applied on top of version "02.16-2ubuntu1.3" from Ubuntu Trusty 14.04.

After applying the patch at [2], parsing for the above system (48 cores) takes less than 1 second (instead of 30s), with the exact same results ...

[1] https://github.com/lyonel/lshw/commit/beb89de5a3c10449fe73f1c77b2486d868e5bc9a#diff-f4010714738fa4cdd5999499579da2b3R217
[2] http://paste.ubuntu.com/16456620/

# lsb_release -rd
Description: Ubuntu 14.04.4 LTS
Release: 14.04

BR,
Alex

** Affects: lshw (Ubuntu)
     Importance: Undecided
         Status: New

** Patch added: "AArch64-cpuinfo-Remove-redundant-cpu-caps-loop.patch"
   https://bugs.launchpad.net/bugs/1582181/+attachment/4663771/+files/AArch64-cpuinfo-Remove-redundant-cpu-caps-loop.patch
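To give a feel for why the loop described above hurts as core counts grow, the sketch below counts per-core capability updates for both variants. It is a simplified model and not the lshw code: it assumes one capability entry per core, the `count_updates` helper and its mode names are made up for the illustration, and it ignores the cost of each individual update, so it says nothing about absolute timings.

```shell
#!/bin/sh
# count_updates CORES MODE
# Counts per-core capability updates while parsing CORES cpuinfo entries.
#   MODE=all-previous  each entry updates every core parsed so far (the loop)
#   MODE=current-only  each entry updates only the core it describes
count_updates() {
    cores=$1
    mode=$2
    updates=0
    cur=1
    while [ "$cur" -le "$cores" ]; do
        if [ "$mode" = "all-previous" ]; then
            # Entry number "cur" touches cores 1..cur.
            updates=$((updates + cur))
        else
            updates=$((updates + 1))
        fi
        cur=$((cur + 1))
    done
    echo "$updates"
}

for n in 48 96; do
    echo "$n cores: $(count_updates "$n" all-previous) updates with the loop," \
         "$(count_updates "$n" current-only) without"
done
```

With the loop, the number of updates grows quadratically with the core count, while updating only the current core keeps it linear, which is consistent with the 96 core node being disproportionately slower than the 48 core one.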