Hi Alex,

On 2021/11/10 7:56, Alex Williamson wrote:

Hi Baolu,

Have you looked into this?

I am looking at this.

I'm able to reproduce by starting and
destroying an assigned device VM several times.  It seems like it came
in with Joerg's pull request for the v5.15 merge window.  Bisecting
lands me on 3f34f1259776 where intel-iommu added map/unmap_pages
support, but I'm not convinced that isn't an artifact that the regular
map/unmap calls had been simplified to only be used for single
pages by that point.  If I mask the map/unmap_pages callbacks and use
map/unmap with (pgsize * size) and restore the previous pgsize_bitmap,
I can generate the same faults.  So maybe the root issue was introduced
somewhere else, or perhaps it is a latent bug in clearing of pte ranges
as Ajay proposes below.  In any case, I think there's a real issue
here.  Thanks,

I am trying to reproduce this issue with my local setup. I will come
back again after I have more details.

Best regards,
baolu


Alex

On Tue, 12 Oct 2021 19:26:53 +0530
Ajay Garg <[email protected]> wrote:

=== Issue ===

Kernel log flooding is seen when an x86_64 L1 guest (Ubuntu-21) is booted in
qemu/kvm on an x86_64 host (Ubuntu-21) with a host PCI device attached.

Logs of the following kind, along with stack traces, cause the flood:

......
  DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
  DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
  DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
  DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
  DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
......
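
The values in these messages can be read with a small decoder. The sketch
below assumes the VT-d second-level PTE layout in which bit 0 grants read,
bit 1 grants write, and bits 12 and up hold the 4KiB page frame address. The
struct and function names are hypothetical and exist only for illustration,
and high attribute bits are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not a kernel structure. */
struct pte_fields {
	uint64_t pfn;   /* host page frame number (value >> 12) */
	bool read;      /* bit 0: DMA read permitted (assumed layout) */
	bool write;     /* bit 1: DMA write permitted (assumed layout) */
};

/* Split a raw PTE value, as printed in the log, into its basic fields. */
static struct pte_fields decode_dma_pte(uint64_t val)
{
	struct pte_fields f;

	f.pfn = val >> 12;
	f.read = (val & 0x1) != 0;
	f.write = (val & 0x2) != 0;
	return f;
}
```

Under these assumptions, 0x3f6ec003 decodes to host PFN 0x3f6ec with read and
write both set. In every message above the existing value and the new value
are identical, so each complaint concerns re-installing the same mapping.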



=== Current Behaviour, leading to the issue ===

Currently, when we do a dma-unmapping, we unmap/unlink the mappings, but
the pte-entries are not cleared.

Thus, the following sequence floods the kernel logs:

i)
A dma-unmapping makes the real/leaf-level pte-slot invalid, but the
pte-content itself is not cleared.

ii)
Now, during some later dma-mapping procedure, as the pte-slot is about
to hold a new pte-value, the intel-iommu checks if a prior
pte-entry exists in the pte-slot. If it exists, it logs a kernel-error,
along with a corresponding stacktrace.

iii)
Step ii) repeats for every affected page, and the kernel logs are flooded.
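
Steps i) to iii) can be modelled in user space. This is a toy sketch of the
sequence as described in this message, not the driver's code: the table, the
function names, and the message text are all hypothetical, and the "shallow"
unmap simply encodes the assumption that slot contents survive an unmap.

```c
#include <stdint.h>
#include <stdio.h>

#define TOY_SLOTS 8	/* hypothetical table size */

static uint64_t toy_table[TOY_SLOTS];	/* one leaf PTE slot per vPFN */

/*
 * Install a PTE only if the slot is empty. A non-empty slot is reported
 * and left untouched. Returns 0 on success, -1 on conflict or bad index.
 */
static int toy_map(unsigned long vpfn, uint64_t pteval)
{
	uint64_t old;

	if (vpfn >= TOY_SLOTS)
		return -1;

	old = toy_table[vpfn];
	if (old != 0) {
		fprintf(stderr,
			"toy: PTE for vPFN 0x%lx already set (to %llx not %llx)\n",
			vpfn, (unsigned long long)old,
			(unsigned long long)pteval);
		return -1;
	}
	toy_table[vpfn] = pteval;
	return 0;
}

/* Step i) as described: the mapping is dropped, the slot is not zeroed. */
static void toy_unmap_shallow(unsigned long vpfn)
{
	(void)vpfn;	/* intentionally leaves toy_table[vpfn] as it was */
}
```

Calling toy_map(3, 0x3f6ec003), then toy_unmap_shallow(3), then
toy_map(3, 0x3f6ec003) again makes the second map fail with one message per
page, which corresponds to step ii).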



=== Fix ===

We ensure that as part of a dma-unmapping, each (unmapped) pte-slot
is also cleared of its value/content (at the leaf-level, where the
actual iova => pfn mapping is stored).

This completes a "deep" dma-unmapping.
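
The intent of the fix can be sketched in user space as follows. This is a
minimal model under assumptions, not the kernel's dma_pte_clear_range(): the
flat array and both helper names are hypothetical, and the range is treated
as inclusive on both ends to mirror the start_pfn/last_pfn arguments used in
the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 16	/* hypothetical table size */

static uint64_t toy_ptes[TOY_SLOTS];	/* one leaf PTE slot per vPFN */

/* Zero every leaf slot in [start_pfn, last_pfn], both ends inclusive. */
static void toy_pte_clear_range(unsigned long start_pfn,
				unsigned long last_pfn)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn <= last_pfn && pfn < TOY_SLOTS; pfn++)
		toy_ptes[pfn] = 0;
}

/* Count slots in the same inclusive range that still hold a value. */
static size_t toy_count_set(unsigned long start_pfn, unsigned long last_pfn)
{
	unsigned long pfn;
	size_t n = 0;

	for (pfn = start_pfn; pfn <= last_pfn && pfn < TOY_SLOTS; pfn++)
		if (toy_ptes[pfn] != 0)
			n++;
	return n;
}
```

After toy_pte_clear_range() runs over an unmapped range, toy_count_set()
returns 0 for that range, so a later mapping of the same vPFNs would find
empty slots. Slots outside the range keep their values.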



Signed-off-by: Ajay Garg <[email protected]>
---
  drivers/iommu/intel/iommu.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59ae28e6..485a8ea71394 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5090,6 +5090,8 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
        gather->freelist = domain_unmap(dmar_domain, start_pfn,
                                        last_pfn, gather->freelist);
+       dma_pte_clear_range(dmar_domain, start_pfn, last_pfn);
+
        if (dmar_domain->max_addr == iova + size)
                dmar_domain->max_addr = iova;

_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu
