I think both problems have the same root cause, whatever that is.

Feb 19 22:21:12 stefan-testhead kernel: wls1f0: Limiting TX power to 30 (30 - 
0) dBm as advertised by d6:24:dd:03:ec:30
Feb 19 23:16:23 stefan-testhead kernel: ------------[ cut here ]------------
Feb 19 23:16:23 stefan-testhead kernel: WARNING: CPU: 2 PID: 4971 at 
drivers/net/wireless/intel/iwlwifi/mvm/tx.c:929 
iwl_mvm_tx_tso_segment+0x372/0x390 [iwlmvm]
Feb 19 23:16:23 stefan-testhead kernel: Modules linked in: cmac ipvtap ipvlan 
ccm vhost_net vhost vhost_iotlb tap xt_CHECKSUM xt_MASQUERADE xt_conntrack 
ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink
bridge stp llc wireguard curve25519_x86_64 libchacha20poly1305 
chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel 
udp_tunnel binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency 
intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal
intel_powerclamp coretemp iwlmvm mac80211 kvm_intel libarc4 kvm 
btusb btrtl irqbypass nls_iso8859_1 btintel btbcm iwlwifi btmtk rapl ipmi_ssif 
intel_cstate cmdlinepart bluetooth spi_nor ecdh_generic mei_me mtd 
intel_pch_thermal input_leds joydev ecc cfg80211 mei 
ioatdma acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid acpi_pad 
sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efi_pstore 
ip_tables x_tables autofs4 btrfs blake2b_generic dm_crypt raid10 raid456
Feb 19 23:16:23 stefan-testhead kernel:  async_raid6_recov async_memcpy 
async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic 
usbhid hid cdc_ncm cdc_ether usbnet r8152 mii crct10dif_pclmul crc32_pclmul 
polyval_clmulni polyval_generic ghash_clmulni_intel 
sha256_ssse3 spi_intel_platform spi_intel gpio_ich mxm_wmi sha1_ssse3 nvme ahci 
i2c_i801 ixgbe igb libahci lpc_ich ast i2c_smbus xhci_pci nvme_core xfrm_algo 
i2c_algo_bit xhci_pci_renesas dca mdio nvme_auth wmi aesni_intel crypto_simd 
cryptd
Feb 19 23:16:23 stefan-testhead kernel: CPU: 2 PID: 4971 Comm: vhost-4734 Not 
tainted 6.8.0-51-generic #52~22.04.1-Ubuntu
Feb 19 23:16:23 stefan-testhead kernel: Hardware name: Supermicro 
SYS-E300-8D/X10SDV-TP8F, BIOS 2.3 05/07/2021
Feb 19 23:16:23 stefan-testhead kernel: RIP: 
0010:iwl_mvm_tx_tso_segment+0x372/0x390 [iwlmvm]
Feb 19 23:16:23 stefan-testhead kernel: Code: ec 49 8b 97 c8 00 00 00 44 8b 45 
98 41 b9 01 00 00 00 48 89 c3 41 8b 87 c0 00 00 00 8b 4d 94 66 44 89 44 02 04 
e9 66 fd ff ff <0f> 0b b8 ea ff ff ff e9 86 fe ff ff e8 cd d9 78 ec 66 66 2e 0f 
1f
Feb 19 23:16:23 stefan-testhead kernel: RSP: 0018:ffffa2040068f630 EFLAGS: 
00010202
Feb 19 23:16:23 stefan-testhead kernel: RAX: 00000000000002c0 RBX: 
fffffffffffffff4 RCX: 0000000000000000
Feb 19 23:16:23 stefan-testhead kernel: RDX: ffff95419f79e800 RSI: 
0000000000000000 RDI: 0000000000000000
Feb 19 23:16:23 stefan-testhead kernel: RBP: ffffa2040068f6a8 R08: 
00000000000005a8 R09: 0000000000000001
Feb 19 23:16:23 stefan-testhead kernel: R10: 0000000000000000 R11: 
0000000000000000 R12: ffffa2040068f740
Feb 19 23:16:23 stefan-testhead kernel: R13: 0000000000002722 R14: 
0000000000000008 R15: ffff954191ffee00
Feb 19 23:16:23 stefan-testhead kernel: FS:  000071e80e6abe80(0000) 
GS:ffff9548dfb00000(0000) knlGS:0000000000000000
Feb 19 23:16:23 stefan-testhead kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
Feb 19 23:16:23 stefan-testhead kernel: CR2: 000055c5f24f8ff0 CR3: 
000000010ea2e001 CR4: 00000000003726f0
Feb 19 23:16:23 stefan-testhead kernel: DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
Feb 19 23:16:23 stefan-testhead kernel: DR3: 0000000000000000 DR6: 
00000000fffe0ff0 DR7: 0000000000000400
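
For reference, the bytes marked `<0f> 0b` in the Code: line are the x86 `ud2` instruction, which the kernel uses to implement WARN on x86. That would explain why this WARNING at tx.c:929 shows up as an "invalid opcode" trace (report_bug / handle_bug / exc_invalid_op) in the bug description. Below is a minimal Python sketch to check this from a pasted Code: line. The helper names are made up for illustration, and the byte string is the Code: line above with the mail line wrapping removed.

```python
import re

# Code: line from the log above, re-joined from the wrapped mail lines.
CODE_LINE = (
    "ec 49 8b 97 c8 00 00 00 44 8b 45 98 41 b9 01 00 00 00 48 89 c3 "
    "41 8b 87 c0 00 00 00 8b 4d 94 66 44 89 44 02 04 e9 66 fd ff ff "
    "<0f> 0b b8 ea ff ff ff e9 86 fe ff ff e8 cd d9 78 ec 66 66 2e 0f 1f"
)


def bytes_from_marker(code_line, count):
    """Return `count` bytes starting at the byte marked <xx> (the byte at RIP)."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"<[0-9a-f]{2}>", tok):
            picked = [tok.strip("<>")] + tokens[i + 1:i + count]
            return bytes(int(b, 16) for b in picked)
    raise ValueError("no <xx> marker found in Code: line")


def is_ud2(code_line):
    """True if the instruction at RIP is ud2 (opcode 0f 0b)."""
    return bytes_from_marker(code_line, 2) == b"\x0f\x0b"


def return_value_after_trap(code_line):
    """If ud2 is followed by 'mov $imm32, %eax' (opcode b8), return imm32 as a signed int."""
    window = bytes_from_marker(code_line, 7)
    if window[:2] != b"\x0f\x0b" or window[2] != 0xB8:
        return None
    return int.from_bytes(window[3:7], "little", signed=True)


print("ud2 at RIP:", is_ud2(CODE_LINE))
print("value loaded into eax after the trap:", return_value_after_trap(CODE_LINE))
```

The second helper decodes the `b8 ea ff ff ff` that follows the trap. That value is -22, which matches -EINVAL on Linux, so it looks like the function warns and then returns an error rather than crashing. This is a reading of the byte dump only, not something checked against the driver source.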

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-firmware in Ubuntu.
https://bugs.launchpad.net/bugs/2100280

Title:
  invalid opcode and "Microcode SW error detected" in iwlwifi during
  extended WiFi stress test may trigger oom

Status in backport-iwlwifi-dkms package in Ubuntu:
  New
Status in linux-firmware package in Ubuntu:
  New
Status in linux-hwe-6.8 package in Ubuntu:
  New

Bug description:
  Hello there,

  I am not an experienced bug reporter, but I would like to share a problem that
  we keep running into when stress testing the Intel BE200 WiFi card. We conduct
  24-hour stress tests with WiFi 7 enabled on the AP (speeds range from 200 to
  1700 Mbit/s depending on the environment) to assess the card's reliability.

  In these tests, we frequently see the kernel rapidly allocate all free memory
  (skbuff_small_head), which triggers the oom killer and kills a random innocent
  userspace process. See this log line from the oom killer:

  2025-02-06 14:53:13.500 skbuff_small_head   22388173KB   22388173KB

  Our system has 32 GB of RAM and most of it is free (according to our 
monitoring) until
  20 seconds before the oom event. The bug occurred 

  In the cases we analyzed, iwlwifi logs what appear to be fatal errors a few
  minutes before the oom event. In some instances, it logs:

  2025-02-06 14:44:49.639 iwlwifi 0000:05:00.0: Microcode SW error
  detected. Restarting 0x0.

  In other instances, we see asm_exc_invalid_op:

  2025-02-06 14:53:20.538        ? report_bug+0x17e/0x1b0
  2025-02-06 14:53:20.539        ? handle_bug+0x46/0x90
  2025-02-06 14:53:20.540        ? exc_invalid_op+0x18/0x80
  2025-02-06 14:53:20.540        ? asm_exc_invalid_op+0x1b/0x20
  2025-02-06 14:53:20.540        ? iwl_mvm_tx_tso_segment+0x372/0x390 [iwlmvm]
  2025-02-06 14:53:20.540        iwl_mvm_tx_tso.constprop.0+0x2ce/0x330 [iwlmvm]
  2025-02-06 14:53:20.540        iwl_mvm_tx_skb_sta+0x11e/0x2d0 [iwlmvm]
  2025-02-06 14:53:20.541        iwl_mvm_tx_skb+0x1c/0x60 [iwlmvm]
  ...

  in a logged call trace. We cannot prove causation, but the correlation is very
  strong: the oom does not always follow these iwlwifi logs, but when it happens,
  it comes shortly (a few minutes) after them.
  I will attach journal logs of the events to this bug report.
  We also tested with the Intel AX210 card and could not reproduce the problem.
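
The "oom shortly after iwlwifi errors" pattern described above could be checked against the attached journal logs with something like the following Python sketch. It assumes log lines in the `YYYY-MM-DD HH:MM:SS.mmm message` form quoted in this report. The marker strings, the 15-minute window, and the function names are illustrative choices, and pairing the two sample lines as one incident is an assumption made only for the demo.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 23  # length of "2025-02-06 14:44:49.639"

# Illustrative markers taken from the log excerpts in this report.
IWL_MARKERS = ("Microcode SW error detected", "iwl_mvm_tx_tso_segment")
OOM_MARKER = "skbuff_small_head"


def parse(line):
    """Split a log line into (timestamp, message)."""
    stamp, message = line[:TS_LENGTH], line[TS_LENGTH:].strip()
    return datetime.strptime(stamp, TS_FORMAT), message


def oom_delays(lines, window=timedelta(minutes=15)):
    """For each oom line, return (oom time, delay since the latest earlier
    iwlwifi error), with delay None if no error occurred within `window`."""
    last_error = None
    results = []
    for ts, msg in sorted(parse(line) for line in lines):
        if any(marker in msg for marker in IWL_MARKERS):
            last_error = ts
        elif OOM_MARKER in msg:
            delay = ts - last_error if last_error is not None else None
            if delay is not None and delay > window:
                delay = None
            results.append((ts, delay))
    return results


SAMPLE = [
    "2025-02-06 14:44:49.639 iwlwifi 0000:05:00.0: Microcode SW error detected. Restarting 0x0.",
    "2025-02-06 14:53:13.500 skbuff_small_head   22388173KB   22388173KB",
]

for oom_time, delay in oom_delays(SAMPLE):
    print(oom_time, "delay since last iwlwifi error:", delay)
```

An oom line with a `None` delay would be a counterexample to the pattern, so running this over several incidents is a cheap way to see how consistent the correlation is.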

  We are currently testing the BE200 card with a backported iwlwifi and newer firmware from:
  https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git @ 1a1470d90de2a25e5befadb2f1fa30758af682ca
  https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git @ e35111fbbe0b932054f73c7e95b8a4db2697d265
  and the problems seem to disappear. That being said, we are still actively testing.
  The backported driver loads the gl-c0-fm-c0-96.ucode firmware on our hardware.

  Would you consider including a newer driver/firmware in the HWE stack?

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-modules-6.8.0-52-generic 6.8.0-52.53~22.04.1
  ProcVersionSignature: Ubuntu 6.8.0-52.53~22.04.1-generic 6.8.12
  Uname: Linux 6.8.0-52-generic x86_64
  ApportVersion: 2.20.11-0ubuntu82.6
  Architecture: amd64
  CasperMD5CheckResult: pass
  Date: Wed Feb 26 10:38:40 2025
  Dependencies:
   
  InstallationDate: Installed on 2022-04-21 (1041 days ago)
  InstallationMedia: Ubuntu-Server 21.10 "Impish Indri" - Release amd64 (20211013)
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=de_DE.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-hwe-6.8
  UpgradeStatus: Upgraded to jammy on 2022-12-01 (818 days ago)

