Hi,

On 2021-08-05 at 22:46, Hans van Kranenburg wrote:
severity 988477 normal
tags 988477 + moreinfo + upstream - bullseye-ignore
thanks

Hi!

On 6/13/21 3:58 PM, Imre Szőllősi wrote:
I tested on a 4th hardware setup:

4. asus m4n78 pro, phenom ii x4 905e, md raid1, 2x samsung 1TB 860evo, 
lvm: problem does not appear

As far as I can see, not every mb/chipset/SATA PCIe device is affected.
Thanks for your report, and for trying out different combinations of
hardware.

While doing a short internet search about the problems you're seeing
while using AMD ryzen, sata, nvme and iommu, I suspect this problem does
not have a lot to do with Xen specifically, but more with the hardware
and its firmware.

This also means that it's not a Debian packaging problem, and it cannot
be fixed by me (or the Debian Xen team). If you want to research this
problem more, I can maybe be of some help by providing suggestions.
Still, you will have to do all of the actual work, since I do not have
your hardware here.


okay let's do it



The first thing I would suggest is to try reproduce the problem when
booting with just Linux without Xen, and then trying the dbench test.


So far I have not written up the scenarios in which the problem does not appear. Here they are:

- Without Xen. There is no Xen dmesg in that case, but the plain dmesg does not show anything either; dbench runs fine and the filesystem does not go read-only.

- Debian 10. This probably means that something changed in Xen or the kernel (or both) between buster and bullseye that causes the problem.

- Using another PCIe SATA controller. That other PCIe device has only 1 function, while the onboard device has 3 functions:

01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ee
01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43eb
01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43e9



If you don't actually need to directly pass-through hardware to a Xen
guest, you can also try disabling iommu,


When I disable the IOMMU in the BIOS, the Xen dmesg messages change to these:



(XEN) CPU4: No irq handler for vector 7c (IRQ -2147483648, LAPIC)
(XEN) CPU10: No irq handler for vector ee (IRQ -2147483648, LAPIC)
(XEN) CPU4: No irq handler for vector bd (IRQ -2147483648, LAPIC)
(XEN) CPU9: No irq handler for vector 2c (IRQ -2147483648, LAPIC)
(XEN) CPU4: No irq handler for vector cc (IRQ -90, LAPIC)
(XEN) IRQ89 a=0010[0010,0000] v=ec[ffffffff] t=PCI-MSI s=00000010
(XEN) CPU0: No irq handler for vector af (IRQ -2147483648, LAPIC)
(XEN) CPU2: No irq handler for vector 62 (IRQ -90, LAPIC)
(XEN) IRQ89 a=0001[0001,0000] v=72[ffffffff] t=PCI-MSI s=00000030
(XEN) CPU4: No irq handler for vector b2 (IRQ -2147483648, LAPIC)
(XEN) CPU0: No irq handler for vector 84 (IRQ -2147483648, LAPIC)
(XEN) CPU4: No irq handler for vector 4e (IRQ -2147483648, LAPIC)
(XEN) CPU8: No irq handler for vector de (IRQ -2147483648, LAPIC)
(XEN) CPU4: No irq handler for vector 68 (IRQ -2147483648, LAPIC)
(XEN) CPU8: No irq handler for vector b6 (IRQ -90, LAPIC)
(XEN) IRQ89 a=0040[0040,0000] v=c6[ffffffff] t=PCI-MSI s=00000030
(XEN) CPU10: No irq handler for vector 72 (IRQ -2147483648, LAPIC)
(XEN) CPU2: No irq handler for vector ec (IRQ -69, LAPIC)
(XEN) IRQ68 a=0004[0004,0400] v=3d[ec] t=PCI-MSI s=00000010
(XEN) CPU10: No irq handler for vector e5 (IRQ -90, LAPIC)
(XEN) IRQ89 a=0100[0100,0000] v=ed[ffffffff] t=PCI-MSI s=00000030
(XEN) CPU10: No irq handler for vector d1 (IRQ -2147483648, LAPIC)
(XEN) CPU10: No irq handler for vector 5a (IRQ -2147483648, LAPIC)
(XEN) CPU0: No irq handler for vector 7b (IRQ -69, LAPIC)
(XEN) IRQ68 a=0001[0001,0400] v=a3[7b] t=PCI-MSI s=00000010
(XEN) CPU8: No irq handler for vector bb (IRQ -2147483648, LAPIC)
(XEN) CPU8: No irq handler for vector 6c (IRQ -90, LAPIC)
(XEN) IRQ89 a=0040[0040,0000] v=74[ffffffff] t=PCI-MSI s=00000030
(XEN) CPU4: No irq handler for vector 86 (IRQ -2147483648, LAPIC)
(XEN) CPU8: No irq handler for vector 29 (IRQ -70, LAPIC)
(XEN) IRQ69 a=0040[0040,0001] v=31[a8] t=PCI-MSI/-X s=00000030
(XEN) CPU8: No irq handler for vector 8b (IRQ -2147483648, LAPIC)
(XEN) CPU4: No irq handler for vector 2c (IRQ -69, LAPIC)
(XEN) IRQ68 a=0010[0010,0400] v=44[2c] t=PCI-MSI s=00000010
(XEN) CPU2: No irq handler for vector ef (IRQ -2147483648, LAPIC)
(XEN) CPU0: No irq handler for vector c8 (IRQ -90, LAPIC)
(XEN) IRQ89 a=0001[0001,0000] v=e8[ffffffff] t=PCI-MSI s=00000010
(XEN) CPU2: No irq handler for vector 41 (IRQ -2147483648, LAPIC)
(XEN) CPU8: No irq handler for vector d6 (IRQ -70, LAPIC)
(XEN) IRQ69 a=0010[0010,0000] v=de[ffffffff] t=PCI-MSI/-X s=00000030
(XEN) CPU1: No irq handler for vector b1 (IRQ -2147483648, LAPIC)
(XEN) CPU4: No irq handler for vector d9 (IRQ -90, LAPIC)
(XEN) IRQ89 a=0010[0010,0000] v=2a[ffffffff] t=PCI-MSI s=00000010

The corresponding interrupts, in case that matters:

# cat /proc/interrupts | egrep " (68|69|89):"
  68:          0          0       4269          0          0          0          0          0          0          0          0          0  xen-percpu     -virq      timer2
  69:          0          0     809225          0          0          0          0          0          0          0          0          0  xen-percpu     -ipi       resched2
  89:          0          0          0          0          0         55          0          0          0          0          0          0  xen-percpu     -ipi       callfuncsingle5
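
A rough sketch, in case it helps with reading the Xen messages above: the script below tallies the "No irq handler" lines per reported IRQ value. The function names are made up for illustration. The mapping in `guessed_irq` is only an assumption read off this log, where every "IRQ -90" line is followed by an "IRQ89" line, "-69" by "IRQ68" and "-70" by "IRQ69", which fits a bitwise NOT of the IRQ number. It is not taken from the Xen source.

```python
import re
from collections import Counter

# Matches lines such as:
#   (XEN) CPU4: No irq handler for vector cc (IRQ -90, LAPIC)
UNHANDLED = re.compile(
    r"\(XEN\) CPU(?P<cpu>\d+): No irq handler for vector "
    r"(?P<vector>[0-9a-f]+) \(IRQ (?P<irq>-?\d+), LAPIC\)"
)

INT_MIN = -2**31  # printed as -2147483648 in the log


def tally_unhandled(log_text):
    """Count 'No irq handler' messages per reported IRQ value."""
    counts = Counter()
    for m in UNHANDLED.finditer(log_text):
        counts[int(m.group("irq"))] += 1
    return counts


def guessed_irq(reported):
    """Assumption inferred from this log only: a negative value is the
    bitwise NOT of the IRQ number (-90 -> 89). INT_MIN maps to no IRQ."""
    if reported == INT_MIN:
        return None
    return ~reported if reported < 0 else reported


sample = """\
(XEN) CPU4: No irq handler for vector 7c (IRQ -2147483648, LAPIC)
(XEN) CPU4: No irq handler for vector cc (IRQ -90, LAPIC)
(XEN) IRQ89 a=0010[0010,0000] v=ec[ffffffff] t=PCI-MSI s=00000010
(XEN) CPU2: No irq handler for vector 62 (IRQ -90, LAPIC)
(XEN) CPU2: No irq handler for vector ec (IRQ -69, LAPIC)
"""

counts = tally_unhandled(sample)
for reported, n in counts.most_common():
    print(reported, guessed_irq(reported), n)
```

To run it over a full log, replace `sample` with the saved output of `xl dmesg`.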



 or researching other iommu=
options that can serve as a workaround.


With the IOMMU option in the BIOS set to auto, I tried the following iommu= kernel command line options, without result:

    off
    force
    noforce
    biomerge
    merge
    nomerge
    soft
    pt
    nopt

Setting iommu.passthrough= to 0 or 1 does not make a difference either.




In any case, further reports will need to have more detailed
information. For example, instead of "there are a lot of messages",
provide a text attachment with a piece of logging that shows these messages.


Every message line contains exactly the same information:

(XEN) AMD-Vi: IO_PAGE_FAULT: 0000:01:00.1 d0 addr fffffffdf8000000 flags 0x8 I
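
For attaching a larger piece of logging, a small sketch like the following could confirm that the lines really are all identical. It splits the message above into its fields and counts distinct combinations. The field names and the helper names are my own labels, not Xen terminology, and reading "d0" as domain 0 is an assumption. The device 0000:01:00.1 is the onboard SATA controller from the lspci listing earlier in this mail.

```python
import re
from collections import Counter

# Matches lines such as:
#   (XEN) AMD-Vi: IO_PAGE_FAULT: 0000:01:00.1 d0 addr fffffffdf8000000 flags 0x8 I
FAULT = re.compile(
    r"\(XEN\) AMD-Vi: IO_PAGE_FAULT: "
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"d(?P<domain>\d+) addr (?P<addr>[0-9a-f]+) flags (?P<flags>0x[0-9a-f]+)"
)


def parse_fault(line):
    """Return the fields of one IO_PAGE_FAULT line, or None if it does not match."""
    m = FAULT.search(line)
    if m is None:
        return None
    return {
        "device": m.group("device"),
        "domain": int(m.group("domain")),  # assumption: "d0" means domain 0
        "addr": int(m.group("addr"), 16),
        "flags": int(m.group("flags"), 16),
    }


def distinct_faults(log_text):
    """Count how often each (device, domain, addr, flags) combination occurs."""
    counts = Counter()
    for line in log_text.splitlines():
        fault = parse_fault(line)
        if fault is not None:
            counts[tuple(fault.values())] += 1
    return counts


line = "(XEN) AMD-Vi: IO_PAGE_FAULT: 0000:01:00.1 d0 addr fffffffdf8000000 flags 0x8 I"
fault = parse_fault(line)
summary = distinct_faults(line + "\n" + line + "\n")
print(fault["device"], hex(fault["addr"]), len(summary))
```

If `distinct_faults` returns a single entry for the whole log, the claim that every line carries the same information holds.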



I'm tagging this bug 'moreinfo' now, since it will depend on your
availability and abilities to work on it to have it advance.

Have fun,
Hans van Kranenburg


Thank you for dealing with it!


