On Tue, Aug 26, 2025 at 10:28:56AM +0200, Roger Pau Monné wrote:
> On Tue, Aug 26, 2025 at 08:16:56AM +0200, Jan Beulich wrote:
> > On 26.08.2025 03:49, Marek Marczykowski-Górecki wrote:
> > > Hi,
> > >
> > > I'm hitting an MSI-X issue after rebooting the domU. The symptoms are
> > > rather boring: on initial domU start the device (realtek eth card) works
> > > fine, but after domU restart, the link doesn't come up (there is no
> > > "Link is Up" message anymore). No errors from domU driver or Xen. I
> > > tracked it down to MSI-X - if I force INTx (via pci=nomsi on domU
> > > cmdline) it works fine. Convincing the driver to poll instead of waiting
> > > for an interrupt also workarounds the issue.
> > >
> > > I noticed also some interrupts are not cleaned up on restart. The list
> > > of MSIs in 'Q' debug key output grows:
> > >
> > > (XEN) 0000:03:00.0 - d22 - node -1 - MSIs < 41 42 43 44 45 46 47 >
> > > restart sys-net domU
> > > (XEN) 0000:03:00.0 - d24 - node -1 - MSIs < 41 42 43 44 45 46 47 48 >
> > > restart sys-net domU
> > > (XEN) 0000:03:00.0 - d26 - node -1 - MSIs < 41 42 43 44 45 46 47 48 49 >
> > >
> > > and 'M' output is:
> > >
> > > (XEN) MSI-X 41 vec=b1 lowest edge assert log lowest dest=00000001 mask=1/H /1
> > > (XEN) MSI-X 42 vec=b9 lowest edge assert log lowest dest=00000004 mask=1/HG/1
> > > (XEN) MSI-X 43 vec=c1 lowest edge assert log lowest dest=00000010 mask=1/HG/1
> > > (XEN) MSI-X 44 vec=d9 lowest edge assert log lowest dest=00000001 mask=1/HG/1
> > > (XEN) MSI-X 45 vec=e1 lowest edge assert log lowest dest=00000001 mask=1/HG/1
> > > (XEN) MSI-X 46 vec=e9 lowest edge assert log lowest dest=00000040 mask=1/HG/1
> > > (XEN) MSI-X 47 vec=32 lowest edge assert log lowest dest=00000004 mask=1/HG/1
> > > (XEN) MSI-X 48 vec=3a lowest edge assert log lowest dest=00000040 mask=1/HG/1
> > > (XEN) MSI-X 49 vec=42 lowest edge assert log lowest dest=00000010 mask=1/ G/1
> > >
> > > And also, after starting and stopping the domU, `xl pci-assignable-remove 03:00.0`
> > > makes pciback to complain:
> > >
> > > [ 1180.919874] pciback 0000:03:00.0: xen_pciback: MSI-X release failed (-16)
> > >
> > > This is all running on Xen 4.19.3, but I don't see much changes in this
> > > area since then.
> > >
> > > Some more info collected at
> > > https://github.com/QubesOS/qubes-issues/issues/9335
> > >
> > > My question is: what should be responsible for this cleanup on domain
> > > destroy? Xen, or maybe device model (which is QEMU in stubdomain here)?
> >
> > The expectation is that qemu invokes the necessary cleanup, but of course
> > ...
> >
> > > I see some cleanup (apparently not enough) happening via QEMU when the
> > > domU driver is unloaded, but logically correct cleanup shouldn't depend
> > > on correct domU operation...
> >
> > ... Xen may not make itself dependent upon either DomU or QEMU.
>
> AFAICT free_domain_pirqs() called by arch_domain_destroy() should take
> care of unbinding and freeing pirqs (but obviously not in this case).
> Can you repeat the test with a debug=y hypervisor and post the
> resulting serial or dmesg here? Some of the errors on those paths are
> printed with dprintk() and won't be visible unless using a Xen debug
> build.

Sure, will do.

> > What I find puzzling (assuming I can take the quoted output plus your
> > annotations verbatim) is that the device apparently uses multiple vectors,

No, the domU had already been restarted before I started collecting this
output, so that was not the first restart. At fresh boot there is just one
vector.

> > and we're leaking exactly one of them. Also, since reboot is generally
> > nothing else than shutdown and immediate relaunch, is there a leak also
> > after shutdown? I ask because it might help to know which of the multiple
> > vectors is leaked (first, last, random).
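
For the question of which vector leaks, the 'Q' lines quoted at the top can be diffed per restart. A rough sketch in Python, assuming the unwrapped line format shown in that output; the function names are made up for illustration and are not part of any Xen tooling:

```python
import re

# Matches the per-device lines of the 'Q' debug key output, e.g.
# (XEN) 0000:03:00.0 - d22 - node -1 - MSIs < 41 42 43 >
# Assumption: each entry sits on one line (no mail-client wrapping).
Q_LINE = re.compile(
    r"(?P<sbdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"\s+-\s+d(?P<domid>\d+)\s+-.*?MSIs\s*<(?P<msis>[\d\s]*)>"
)

def parse_q_output(text):
    """Return a list of (sbdf, domid, set_of_msis), one per matching line."""
    snapshots = []
    for m in Q_LINE.finditer(text):
        msis = {int(n) for n in m.group("msis").split()}
        snapshots.append((m.group("sbdf"), int(m.group("domid")), msis))
    return snapshots

def new_msis_per_restart(snapshots):
    """For consecutive snapshots, list MSIs present now but not before."""
    return [
        (cur[1], sorted(cur[2] - prev[2]))
        for prev, cur in zip(snapshots, snapshots[1:])
    ]

log = """\
(XEN) 0000:03:00.0 - d22 - node -1 - MSIs < 41 42 43 44 45 46 47 >
(XEN) 0000:03:00.0 - d24 - node -1 - MSIs < 41 42 43 44 45 46 47 48 >
(XEN) 0000:03:00.0 - d26 - node -1 - MSIs < 41 42 43 44 45 46 47 48 49 >
"""
result = new_msis_per_restart(parse_q_output(log))
print(result)  # [(24, [48]), (26, [49])]
```

On the three lines quoted above, each restart adds exactly one new entry at the end of the list (48, then 49), and all earlier entries remain.
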
>
> Can we maybe get the output of `lspci -vv` when the device is
> attached?

Both are below, collected on the first domU start while the device still
works; the output is identical once it breaks.

Collected in dom0:
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 06)
Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: I/O ports at e000 [size=256]
Region 2: Memory at f7c00000 (64-bit, non-prefetchable) [size=4K]
Region 4: Memory at f0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [70] Express (v2) Endpoint, IntMsgNum 1
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns,
L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
SlotPowerLimit 10W TEE-IO-
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+
TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
Latency L0s unlimited, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
FltModeDis-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP-
LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt-
EETLPPrefix-
EmergencyPowerReduction Not Supported,
EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB
preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-
EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3-
LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000800
Capabilities: [d0] Vital Product Data
Not readable
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP-
AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP-
PCRC_CHECK- TLPXlatBlocked-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP-
AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP-
PCRC_CHECK- TLPXlatBlocked-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP-
AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP-
PCRC_CHECK- TLPXlatBlocked-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr- CorrIntErr- HeaderOF-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr+ CorrIntErr- HeaderOF-
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
Kernel driver in use: pciback
Kernel modules: r8169
and the domU view:
00:06.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 06)
Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
Physical Slot: 6
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 40
Region 0: I/O ports at c200 [size=256]
Region 2: Memory at f2018000 (64-bit, non-prefetchable) [size=4K]
Region 4: Memory at f2010000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [70] Express (v2) Endpoint, IntMsgNum 1
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns,
L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
SlotPowerLimit 10W TEE-IO-
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+
TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
Latency L0s unlimited, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
FltModeDis-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP-
LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt-
EETLPPrefix-
EmergencyPowerReduction Not Supported,
EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-
EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3-
LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000800
Capabilities: [d0] Vital Product Data
Not readable
Kernel driver in use: r8169
Kernel modules: r8169
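
The "identical" comparison between the working and broken state can be done mechanically rather than by eye. A minimal sketch, assuming the two `lspci -vv` captures are available as strings; the sample text is shortened from the output above and the helper name is only illustrative:

```python
import difflib

def lspci_diff(before, after):
    """Return unified-diff lines between two lspci -vv captures."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="working", tofile="broken", lineterm="",
    ))

# Shortened sample; in practice these would be the full captures.
working = (
    "Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-\n"
    "Vector table: BAR=4 offset=00000000\n"
)
broken = working
changes = lspci_diff(working, broken)
print("identical" if not changes else "\n".join(changes))
```

An empty result means the config space as decoded by lspci does not differ between the two captures.
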
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
