Bug#1107521: ath12k_pci errors and loss of connectivity in 6.12.y branch

Robin Murphy Fri, 27 Jun 2025 03:37:15 -0700

+Vasant

On 2025-06-27 6:39 am, Baochen Qiang wrote:

[+ IOMMU list]


On 6/27/2025 12:21 AM, Matt Mower wrote:

Dear maintainer,

I have been experiencing lost network connection with the ath12k_pci driver
in the linux-6.12.y kernel branch. Often, when the issue occurs, the
network does not recover until I reboot the computer. A full report of the
errors I encounter, the symptoms that arise, and several dmesg attachments
are in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1107521 . I have
attached a dmesg from 6.12.34 for convenience. The short summary is:

1. I started noticing log lines like the following soon after boot when I
updated from 6.12.22 to 6.12.27. After these events occur, the network goes
down and often does not come back up.
    ath12k_pci 0000:c2:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xfea00000 flags=0x0020]
2. I was able to reproduce this issue very rarely in 6.12.12 and 6.12.22.
The issue always occurs soon after boot in 6.12.27, 6.12.30, 6.12.33, and
6.12.34.
3. I have not reproduced the issue in 6.15.2 or 6.15.3.
4. In some cases, when shutting down the computer, a kernel bug caused my
computer to hang. I haven't determined whether this is related to the issue
above or an independent issue. Search the bug report
for PXL_20250611_140820085.jpg to see a picture of the kernel bug on my
laptop screen.
5. I have tested two firmware versions:
    a. fw_version 0x1108811c fw_build_timestamp 2025-05-17 00:21 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.HMT.1.1.c5-00284.1-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
    b. fw_version 0x100301e1 fw_build_timestamp 2023-12-06 04:05 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3

Thanks,
Matt


I had a quick test with the 6.12.27 kernel on both my Intel desktop and
AMD RD but didn't hit the issue. And I am using
WLAN.HMT.1.1.c5-00284.1-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3.

As mentioned in the Debian bug report, since reverting ath12k patches
does not fix this issue, maybe it comes from the IOMMU subsystem?

Faults are usually still indicative of the client driver/subsystem doing
something not quite right - racily performing dma_unmap before the
device has actually finished making accesses; mapping the wrong size
such that the device accesses off the end of the mapping (this can often
run into another valid mapping so not necessarily fault); mapping the
wrong DMA direction such that the device then tries to write to a
read-only page. However I suppose it's not impossible that some fix to
amd-iommu in that period might have changed its behaviour in a way that
exacerbates things - Vasant, does this strike a chord with anything
you're aware of?

A couple more things I'd try on the ath12k side: firstly, boot with
"iommu.strict=1" and see if that makes the faults any more
frequent/reproducible; if a fault is fairly easily reproducible, then
use the DMA API and/or IOMMU API tracepoints to compare the fault
address to prior DMA mapping activity - that can usually reveal the
nature of the bug enough to then know what to go looking for.
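
As a rough illustration of that comparison, here is a minimal Python
sketch. It assumes the map/unmap activity has already been extracted
from the trace by hand into (op, iova, size) tuples in time order; that
tuple layout, the function names, the result labels and the 4 KiB
overrun window are all hypothetical choices for the example, not any
kernel or tracing interface. Only the fault line format is taken from
the log quoted above.

```python
import re

# Matches the AMD-Vi fault line quoted in this thread; \s+ tolerates the
# line break that mail wrapping put between the fields.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT\s+domain=(0x[0-9a-fA-F]+)"
    r"\s+address=(0x[0-9a-fA-F]+)\s+flags=(0x[0-9a-fA-F]+)"
)


def parse_fault(line):
    """Return domain/address/flags as ints, or None if the line does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "domain": int(m.group(1), 16),
        "address": int(m.group(2), 16),
        "flags": int(m.group(3), 16),
    }


def classify_fault(events, fault_addr, overrun_window=0x1000):
    """Compare a fault address with earlier mapping activity.

    events: hypothetical, hand-extracted list of (op, iova, size) tuples in
    time order, where op is "map" or "unmap".
    """
    live = {}            # iova -> size of mappings still in place
    unmapped_hit = False  # an unmapped range once covered the fault address
    for op, iova, size in events:
        if op == "map":
            live[iova] = size
        elif op == "unmap":
            live.pop(iova, None)
            if iova <= fault_addr < iova + size:
                unmapped_hit = True

    # Address is inside a live mapping: suspect direction/permissions.
    for iova, size in live.items():
        if iova <= fault_addr < iova + size:
            return "live"
    # Address was mapped earlier but already unmapped: suspect a racy unmap.
    if unmapped_hit:
        return "use-after-unmap"
    # Address sits just past the end of a live mapping: suspect a wrong size.
    for iova, size in live.items():
        end = iova + size
        if end <= fault_addr < end + overrun_window:
            return "overrun"
    return "unknown"
```

For example, feeding parse_fault() the fault line from the report yields
address 0xfea00000, which can then be passed to classify_fault() together
with whatever mapping events were collected before the fault. The three
non-"unknown" labels correspond loosely to the three classes of driver
bug described earlier.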

I wouldn't put much significance in whatever happens *after* the fault -
presumably the driver is assuming the blocked DMA write has completed,
so then goes on to read some incomplete descriptor as if it were valid,
and thus may fall over in all manner of entertaining ways on bogus data.


Thanks,
Robin.
