I think both problems have the same root cause, whatever that is.

Feb 19 22:21:12 stefan-testhead kernel: wls1f0: Limiting TX power to 30 (30 - 0) dBm as advertised by d6:24:dd:03:ec:30
Feb 19 23:16:23 stefan-testhead kernel: ------------[ cut here ]------------
Feb 19 23:16:23 stefan-testhead kernel: WARNING: CPU: 2 PID: 4971 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:929 iwl_mvm_tx_tso_segment+0x372/0x390 [iwlmvm]
Feb 19 23:16:23 stefan-testhead kernel: Modules linked in: cmac ipvtap ipvlan ccm vhost_net vhost vhost_iotlb tap xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp iwlmvm mac80211 kvm_intel libarc4 kvm btusb btrtl irqbypass nls_iso8859_1 btintel btbcm iwlwifi btmtk rapl ipmi_ssif intel_cstate cmdlinepart bluetooth spi_nor ecdh_generic mei_me mtd intel_pch_thermal input_leds joydev ecc cfg80211 mei ioatdma acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid acpi_pad sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic dm_crypt raid10 raid456
Feb 19 23:16:23 stefan-testhead kernel: async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic usbhid hid cdc_ncm cdc_ether usbnet r8152 mii crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 spi_intel_platform spi_intel gpio_ich mxm_wmi sha1_ssse3 nvme ahci i2c_i801 ixgbe igb libahci lpc_ich ast i2c_smbus xhci_pci nvme_core xfrm_algo i2c_algo_bit xhci_pci_renesas dca mdio nvme_auth wmi aesni_intel crypto_simd cryptd
Feb 19 23:16:23 stefan-testhead kernel: CPU: 2 PID: 4971 Comm: vhost-4734 Not tainted 6.8.0-51-generic #52~22.04.1-Ubuntu
Feb 19 23:16:23 stefan-testhead kernel: Hardware name: Supermicro SYS-E300-8D/X10SDV-TP8F, BIOS 2.3 05/07/2021
Feb 19 23:16:23 stefan-testhead kernel: RIP: 0010:iwl_mvm_tx_tso_segment+0x372/0x390 [iwlmvm]
Feb 19 23:16:23 stefan-testhead kernel: Code: ec 49 8b 97 c8 00 00 00 44 8b 45 98 41 b9 01 00 00 00 48 89 c3 41 8b 87 c0 00 00 00 8b 4d 94 66 44 89 44 02 04 e9 66 fd ff ff <0f> 0b b8 ea ff ff ff e9 86 fe ff ff e8 cd d9 78 ec 66 66 2e 0f 1f
Feb 19 23:16:23 stefan-testhead kernel: RSP: 0018:ffffa2040068f630 EFLAGS: 00010202
Feb 19 23:16:23 stefan-testhead kernel: RAX: 00000000000002c0 RBX: fffffffffffffff4 RCX: 0000000000000000
Feb 19 23:16:23 stefan-testhead kernel: RDX: ffff95419f79e800 RSI: 0000000000000000 RDI: 0000000000000000
Feb 19 23:16:23 stefan-testhead kernel: RBP: ffffa2040068f6a8 R08: 00000000000005a8 R09: 0000000000000001
Feb 19 23:16:23 stefan-testhead kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffa2040068f740
Feb 19 23:16:23 stefan-testhead kernel: R13: 0000000000002722 R14: 0000000000000008 R15: ffff954191ffee00
Feb 19 23:16:23 stefan-testhead kernel: FS: 000071e80e6abe80(0000) GS:ffff9548dfb00000(0000) knlGS:0000000000000000
Feb 19 23:16:23 stefan-testhead kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 19 23:16:23 stefan-testhead kernel: CR2: 000055c5f24f8ff0 CR3: 000000010ea2e001 CR4: 00000000003726f0
Feb 19 23:16:23 stefan-testhead kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 19 23:16:23 stefan-testhead kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-firmware in Ubuntu.
https://bugs.launchpad.net/bugs/2100280

Title:
invalid opcode and "Microcode SW error detected" in iwlwifi during extended WiFi stress test may trigger oom

Status in backport-iwlwifi-dkms package in Ubuntu: New
Status in linux-firmware package in Ubuntu: New
Status in linux-hwe-6.8 package in Ubuntu: New

Bug description:

Hello there,

I am not an experienced bug reporter, but I would like to share a problem that we keep running into when stress testing the Intel BE200 WiFi card.

We run 24-hour stress tests with WiFi 7 enabled on the AP (throughput ranges from 200 to 1700 Mbit/s depending on the environment) to assess the card's reliability. In these tests, we frequently see the kernel rapidly allocate all free memory in the skbuff_small_head slab cache, which triggers the OOM killer and kills an unrelated userspace process. See this line from the OOM killer's output:

2025-02-06 14:53:13.500 skbuff_small_head 22388173KB 22388173KB

That is roughly 21 GiB held by this one slab cache. Our system has 32 GB of RAM, and according to our monitoring most of it is free until 20 seconds before the OOM event.

In the cases we analyzed, iwlwifi logs what appear to be fatal errors a few minutes before the OOM event. In some instances, it logs:

2025-02-06 14:44:49.639 iwlwifi 0000:05:00.0: Microcode SW error detected. Restarting 0x0.

In other instances, we see asm_exc_invalid_op in a logged call trace:

2025-02-06 14:53:20.538 ? report_bug+0x17e/0x1b0
2025-02-06 14:53:20.539 ? handle_bug+0x46/0x90
2025-02-06 14:53:20.540 ? exc_invalid_op+0x18/0x80
2025-02-06 14:53:20.540 ? asm_exc_invalid_op+0x1b/0x20
2025-02-06 14:53:20.540 ? iwl_mvm_tx_tso_segment+0x372/0x390 [iwlmvm]
2025-02-06 14:53:20.540 iwl_mvm_tx_tso.constprop.0+0x2ce/0x330 [iwlmvm]
2025-02-06 14:53:20.540 iwl_mvm_tx_skb_sta+0x11e/0x2d0 [iwlmvm]
2025-02-06 14:53:20.541 iwl_mvm_tx_skb+0x1c/0x60 [iwlmvm]
...
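To check whether the iwlwifi errors consistently precede the OOM events by a few minutes, a small script can pair each OOM event in the attached journal logs with the most recent earlier iwlwifi error. This is a minimal sketch, assuming the logs are exported so that every line starts with a "YYYY-MM-DD HH:MM:SS.mmm" timestamp, as in the excerpts above, and that lines are in chronological order. The marker strings, the 10-minute window, and the function names parse and oom_lead_times are hypothetical choices; the only kernel behavior relied on is that it prints "invoked oom-killer" once each time the OOM killer is triggered.

```python
import re
from datetime import datetime, timedelta

# Assumption: every exported journal line starts with a timestamp such as
# "2025-02-06 14:53:13.500", as in the excerpts quoted in this report.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+(.*)$")

# Hypothetical marker strings taken from the iwlwifi messages quoted above.
IWL_MARKERS = ("Microcode SW error detected", "iwl_mvm_tx_tso_segment")
# The kernel prints "<comm> invoked oom-killer: ..." once per OOM invocation.
OOM_MARKER = "invoked oom-killer"


def parse(lines):
    """Yield (timestamp, message) for every line that has a leading timestamp."""
    for line in lines:
        m = TS_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            yield ts, m.group(2)


def oom_lead_times(lines, window=timedelta(minutes=10)):
    """Pair each OOM event with the time elapsed since the latest iwlwifi error.

    Returns a list of (oom_timestamp, gap) tuples, where gap is a timedelta,
    or None if no iwlwifi error was seen within `window` before the OOM.
    Lines are expected in chronological order.
    """
    last_iwl = None
    results = []
    for ts, msg in parse(lines):
        if any(marker in msg for marker in IWL_MARKERS):
            last_iwl = ts  # remember the most recent iwlwifi error
        elif OOM_MARKER in msg:
            gap = None
            if last_iwl is not None and ts - last_iwl <= window:
                gap = ts - last_iwl
            results.append((ts, gap))
    return results
```

A gap of None for an OOM event means no matching iwlwifi error was found within the window, which would count against the correlation; consistently small gaps would support it without proving causation.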
We cannot prove causation, but the correlation is very strong: the OOM does not always follow these iwlwifi messages, but whenever an OOM does happen, it comes shortly (a few minutes) after them. I will attach journal logs of the events to this bug report.

We also tested with the Intel AX210 card and could not reproduce the problem.

We are currently testing the BE200 card with a backported iwlwifi driver and newer firmware from:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git @ 1a1470d90de2a25e5befadb2f1fa30758af682ca
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git @ e35111fbbe0b932054f73c7e95b8a4db2697d265

With these, the problems seem to disappear. That said, we are still actively testing. The backported driver loads the gl-c0-fm-c0-96.ucode firmware on our hardware.

Would you consider including a newer driver/firmware in the HWE stack?

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-modules-6.8.0-52-generic 6.8.0-52.53~22.04.1
ProcVersionSignature: Ubuntu 6.8.0-52.53~22.04.1-generic 6.8.12
Uname: Linux 6.8.0-52-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.6
Architecture: amd64
CasperMD5CheckResult: pass
Date: Wed Feb 26 10:38:40 2025
Dependencies:
InstallationDate: Installed on 2022-04-21 (1041 days ago)
InstallationMedia: Ubuntu-Server 21.10 "Impish Indri" - Release amd64 (20211013)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-hwe-6.8
UpgradeStatus: Upgraded to jammy on 2022-12-01 (818 days ago)