On 2017年03月30日 19:06, Liu, Yi L wrote:
-----Original Message-----
From: Liu, Yi L
Sent: Monday, March 27, 2017 5:22 PM
To: Peter Xu <[email protected]>
Cc: [email protected]; Lan, Tianyu <[email protected]>; Tian, Kevin
<[email protected]>; [email protected]; [email protected];
[email protected]; [email protected]; David Gibson
<[email protected]>; [email protected]
Subject: RE: [Qemu-devel] [PATCH v7 14/17] memory: add
MemoryRegionIOMMUOps.replay() callback

-----Original Message-----
From: Peter Xu [mailto:[email protected]]
Sent: Monday, March 27, 2017 5:12 PM
To: Liu, Yi L <[email protected]>
Cc: [email protected]; Lan, Tianyu <[email protected]>;
Tian, Kevin <[email protected]>; [email protected];
[email protected]; [email protected]; [email protected]; David
Gibson <[email protected]>; [email protected]
Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add
MemoryRegionIOMMUOps.replay() callback

On Mon, Mar 27, 2017 at 08:35:05AM +0000, Liu, Yi L wrote:
-----Original Message-----
From: Qemu-devel
[mailto:[email protected]] On
Behalf Of Peter Xu
Sent: Tuesday, February 7, 2017 4:28 PM
To: [email protected]
Cc: Lan, Tianyu <[email protected]>; Tian, Kevin
<[email protected]>; [email protected]; [email protected];
[email protected]; [email protected];
[email protected]; [email protected]; David Gibson
<[email protected]>
Subject: [Qemu-devel] [PATCH v7 14/17] memory: add
MemoryRegionIOMMUOps.replay() callback

Originally we have one memory_region_iommu_replay() function,
which is the default behavior to replay the translations of the
whole IOMMU region. However, on some platform like x86, we may
want our own
replay logic for IOMMU regions.
This patch add one more hook for IOMMUOps for the callback, and
it'll override the default if set.

Signed-off-by: Peter Xu <[email protected]>
---
  include/exec/memory.h | 2 ++
  memory.c              | 6 ++++++
  2 files changed, 8 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h index
0767888..30b2a74 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -191,6 +191,8 @@ struct MemoryRegionIOMMUOps {
      void (*notify_flag_changed)(MemoryRegion *iommu,
                                  IOMMUNotifierFlag old_flags,
                                  IOMMUNotifierFlag new_flags);
+    /* Set this up to provide customized IOMMU replay function */
+    void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
  };

  typedef struct CoalescedMemoryRange CoalescedMemoryRange; diff
--git a/memory.c b/memory.c index 7a4f2f9..9c253cc 100644
--- a/memory.c
+++ b/memory.c
@@ -1630,6 +1630,12 @@ void
memory_region_iommu_replay(MemoryRegion
*mr, IOMMUNotifier *n,
      hwaddr addr, granularity;
      IOMMUTLBEntry iotlb;
+    /* If the IOMMU has its own replay callback, override */
+    if (mr->iommu_ops->replay) {
+        mr->iommu_ops->replay(mr, n);
+        return;
+    }
Hi Alex, Peter,

Will all the other vendors(e.g. PPC, s390, ARM) add their own replay
callback as well? I guess it depends on whether the original replay
algorithm work well for them? Do you have such knowledge?
I guess so. At least for VT-d we had this callback since the default
replay mechanism did not work well on x86 due to its extremely large
memory region size. Thanks,
thx. that would make sense.
Peter,

Just come to mind that there may be a corner case here.

Intel VT-d actually has a "pt" mode which allows device use physical address
even when VT-d is enabled. In kernel, there is a iommu_identity_mapping.
If a device is in this map, then it would use "pt" mode. So that IOMMU driver
would not build second-level page table for it.

Yes, but qemu does not support ECAP_PT now, so guest will still have a page table in this case.


Back to the virtual IOVA implementation, if an assigned device is in the
iommu_identity_mapping(e.g. VGA controller), it uses GPA directly to do DMA.
So it demands a GPA->HPA mapping in host. However, the iommu->ops.replay
is not able to build it when guest SL page table is empty.

So I think building an entire guest PA->HPA mapping before guest kernel boot
would be recommended. Any thoughts?

We plan to add PT in 2.10, a possible rough idea is disabled iommu dmar region and use another region without iommu_ops. Then vfio_listener_region_add() will just do the correct mappings.

Thanks


Regards,
Yi L


Reply via email to