Noble is on HWE 6.14 now. Are you still seeing this issue with that
release? If so, please attach the kernel log.
** Changed in: linux-firmware (Ubuntu)
Status: New => Incomplete
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-firmware in Ubuntu.
https://bugs.launchpad.net/bugs/2100280
Title:
invalid opcode and "Microcode SW error detected" in iwlwifi during
extended WiFi stress test may trigger oom
Status in backport-iwlwifi-dkms package in Ubuntu:
Invalid
Status in linux-firmware package in Ubuntu:
Incomplete
Status in linux-hwe-6.8 package in Ubuntu:
New
Bug description:
Hello there,
I am not an experienced bug reporter, but I would like to share a problem
that we keep experiencing when stress testing the Intel BE200 WiFi card.
We conduct 24-hour stress tests with WiFi 7 enabled on the AP (speeds
range from 200 to 1700 Mbit/s depending on the environment) to assess the
card's reliability.
In these tests, we frequently experience a problem where rapid allocation
of all free memory by the kernel (skbuff_small_head) triggers the oom
killer, which kills a random innocent userspace process. See this log
line from the oom killer:
2025-02-06 14:53:13.500 skbuff_small_head 22388173KB 22388173KB
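For scale, the figure in that log line can be converted to GiB. The
following is only a rough sketch: it assumes the two KB columns are the
"Used" and "Total" sizes of the slab cache, as in the unreclaimable slab
table that the oom killer prints, and the names parse_slab_line and
kib_to_gib are hypothetical helpers, not part of any tool.

```python
import re

# Log line copied from the report. Treating the two columns as
# "Used" and "Total" is an assumption about the oom killer's slab table.
LINE = "2025-02-06 14:53:13.500 skbuff_small_head 22388173KB 22388173KB"

# Slab name followed by two sizes in KB at the end of the line.
SLAB_RE = re.compile(r"(?P<name>\S+)\s+(?P<used>\d+)KB\s+(?P<total>\d+)KB\s*$")


def parse_slab_line(line):
    """Return (name, used_kib, total_kib), or None if the line does not match."""
    match = SLAB_RE.search(line)
    if match is None:
        return None
    return match.group("name"), int(match.group("used")), int(match.group("total"))


def kib_to_gib(kib):
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


name, used_kib, total_kib = parse_slab_line(LINE)
print(f"{name}: {kib_to_gib(total_kib):.2f} GiB")
```

For the line above this works out to roughly 21.35 GiB held by
skbuff_small_head alone.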
Our system has 32 GB of RAM and most of it is free (according to our
monitoring) until
20 seconds before the oom event. The bug occurred
In the cases we analyzed, iwlwifi logs what seem to be fatal errors a few
minutes before the oom event. In some instances, it logs:
2025-02-06 14:44:49.639 iwlwifi 0000:05:00.0: Microcode SW error detected. Restarting 0x0.
in other instances, we see asm_exc_invalid_op:
2025-02-06 14:53:20.538 ? report_bug+0x17e/0x1b0
2025-02-06 14:53:20.539 ? handle_bug+0x46/0x90
2025-02-06 14:53:20.540 ? exc_invalid_op+0x18/0x80
2025-02-06 14:53:20.540 ? asm_exc_invalid_op+0x1b/0x20
2025-02-06 14:53:20.540 ? iwl_mvm_tx_tso_segment+0x372/0x390 [iwlmvm]
2025-02-06 14:53:20.540 iwl_mvm_tx_tso.constprop.0+0x2ce/0x330 [iwlmvm]
2025-02-06 14:53:20.540 iwl_mvm_tx_skb_sta+0x11e/0x2d0 [iwlmvm]
2025-02-06 14:53:20.541 iwl_mvm_tx_skb+0x1c/0x60 [iwlmvm]
...
in a logged call trace. We cannot prove causation, but the correlation is
very strong. The oom does not always follow these iwlwifi logs, but when
it does happen, it comes shortly (a few minutes) after them.
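One way to check this correlation across the journal logs is to measure
the time between each iwlwifi error line and the following oom line. The
sketch below is illustrative only and rests on several assumptions: every
line starts with a timestamp in the format shown in this report, a line
mentioning skbuff_small_head marks the oom event, and "a few minutes" is
taken as a 600 second window. The function names are hypothetical, and
the sample lines are copied from this report and may come from different
test runs.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 23  # length of a timestamp such as "2025-02-06 14:53:13.500"

# Substrings taken from the log excerpts in this report.
ERROR_MARKERS = ("Microcode SW error detected", "asm_exc_invalid_op")
OOM_MARKER = "skbuff_small_head"  # assumed to identify the oom killer output


def parse_ts(line):
    """Parse the leading timestamp of a log line."""
    return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)


def errors_before_oom(lines, window_seconds=600.0):
    """Return (gap_seconds, line) for error lines logged before an oom line."""
    errors = [(parse_ts(l), l) for l in lines if any(m in l for m in ERROR_MARKERS)]
    ooms = [parse_ts(l) for l in lines if OOM_MARKER in l]
    hits = []
    for oom_ts in ooms:
        for err_ts, err_line in errors:
            gap = (oom_ts - err_ts).total_seconds()
            # Keep only errors that precede the oom within the window.
            if 0.0 <= gap <= window_seconds:
                hits.append((gap, err_line))
    return hits


SAMPLE = [
    "2025-02-06 14:44:49.639 iwlwifi 0000:05:00.0: Microcode SW error detected. Restarting 0x0.",
    "2025-02-06 14:53:13.500 skbuff_small_head 22388173KB 22388173KB",
    "2025-02-06 14:53:20.540 ? asm_exc_invalid_op+0x1b/0x20",
]

for gap, line in errors_before_oom(SAMPLE):
    print(f"{gap:8.3f} s before oom: {line}")
```

A match only shows that an error preceded an oom event within the chosen
window; it does not establish causation.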
I will attach journal logs of the events to this bug report.
We also tested with the Intel AX210 card and could not reproduce the problem.
We are currently testing the BE200 card with a backported iwlwifi and
newer firmware from:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git @ 1a1470d90de2a25e5befadb2f1fa30758af682ca
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git @ e35111fbbe0b932054f73c7e95b8a4db2697d265
and the problems seem to disappear. That being said, we are still
actively testing.
The backported driver loads the gl-c0-fm-c0-96.ucode firmware on our hardware.
Would you consider including a newer driver/firmware in the HWE stack?
ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-modules-6.8.0-52-generic 6.8.0-52.53~22.04.1
ProcVersionSignature: Ubuntu 6.8.0-52.53~22.04.1-generic 6.8.12
Uname: Linux 6.8.0-52-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.6
Architecture: amd64
CasperMD5CheckResult: pass
Date: Wed Feb 26 10:38:40 2025
Dependencies:
InstallationDate: Installed on 2022-04-21 (1041 days ago)
InstallationMedia: Ubuntu-Server 21.10 "Impish Indri" - Release amd64 (20211013)
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=de_DE.UTF-8
SHELL=/bin/bash
SourcePackage: linux-hwe-6.8
UpgradeStatus: Upgraded to jammy on 2022-12-01 (818 days ago)