> -----Original Message----- > From: Jason Wang [mailto:[email protected]] > Sent: Friday, March 31, 2017 3:17 PM > To: Liu, Yi L <[email protected]>; 'Peter Xu' <[email protected]> > Cc: Lan, Tianyu <[email protected]>; Tian, Kevin <[email protected]>; > '[email protected]' <[email protected]>; '[email protected]' > <[email protected]>; '[email protected]' <[email protected]>; 'qemu- > [email protected]' <[email protected]>; '[email protected]' > <[email protected]>; 'David Gibson' <[email protected]> > Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add > MemoryRegionIOMMUOps.replay() callback > > > > On 2017年03月31日 13:34, Liu, Yi L wrote: > >> -----Original Message----- > >> From: Jason Wang [mailto:[email protected]] > >> Sent: Thursday, March 30, 2017 7:58 PM > >> To: Liu, Yi L <[email protected]>; 'Peter Xu' <[email protected]> > >> Cc: '[email protected]' <[email protected]>; Lan, > >> Tianyu <[email protected]>; Tian, Kevin <[email protected]>; > '[email protected]' > >> <[email protected]>; '[email protected]' <[email protected]>; > >> '[email protected]' <[email protected]>; 'David Gibson' > >> <[email protected]>; '[email protected]' <qemu- > >> [email protected]> > >> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add > >> MemoryRegionIOMMUOps.replay() callback > >> > >> > >> > >> On 2017年03月30日 19:06, Liu, Yi L wrote: > >>>> -----Original Message----- > >>>> From: Liu, Yi L > >>>> Sent: Monday, March 27, 2017 5:22 PM > >>>> To: Peter Xu <[email protected]> > >>>> Cc: [email protected]; Lan, Tianyu <[email protected]>; > >>>> Tian, Kevin <[email protected]>; [email protected]; > >>>> [email protected]; [email protected]; [email protected]; > >>>> David Gibson <[email protected]>; [email protected] > >>>> Subject: RE: [Qemu-devel] [PATCH v7 14/17] memory: add > >>>> MemoryRegionIOMMUOps.replay() callback > >>>> > >>>>> -----Original Message----- > >>>>> From: Peter Xu [mailto:[email protected]] > >>>>> Sent: Monday, March 27, 2017 5:12 PM > >>>>> To: Liu, Yi L <[email protected]> > >>>>> Cc: [email protected]; Lan, Tianyu > >>>>> <[email protected]>; Tian, Kevin <[email protected]>; > >>>>> [email protected]; [email protected]; [email protected]; > >>>>> [email protected]; David Gibson <[email protected]>; > >>>>> [email protected] > >>>>> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add > >>>>> MemoryRegionIOMMUOps.replay() callback > >>>>> > >>>>> On Mon, Mar 27, 2017 at 08:35:05AM +0000, Liu, Yi L wrote: > >>>>>>> -----Original Message----- > >>>>>>> From: Qemu-devel > >>>>>>> [mailto:[email protected]] On > >>>>>>> Behalf Of Peter Xu > >>>>>>> Sent: Tuesday, February 7, 2017 4:28 PM > >>>>>>> To: [email protected] > >>>>>>> Cc: Lan, Tianyu <[email protected]>; Tian, Kevin > >>>>>>> <[email protected]>; [email protected]; [email protected]; > >>>>>>> [email protected]; [email protected]; > >>>>>>> [email protected]; [email protected]; David Gibson > >>>>>>> <[email protected]> > >>>>>>> Subject: [Qemu-devel] [PATCH v7 14/17] memory: add > >>>>>>> MemoryRegionIOMMUOps.replay() callback > >>>>>>> > >>>>>>> Originally we have one memory_region_iommu_replay() function, > >>>>>>> which is the default behavior to replay the translations of the > >>>>>>> whole IOMMU region. However, on some platform like x86, we may > >>>>>>> want our own > >>>>> replay logic for IOMMU regions. > >>>>>>> This patch add one more hook for IOMMUOps for the callback, and > >>>>>>> it'll override the default if set. > >>>>>>> > >>>>>>> Signed-off-by: Peter Xu <[email protected]> > >>>>>>> --- > >>>>>>> include/exec/memory.h | 2 ++ > >>>>>>> memory.c | 6 ++++++ > >>>>>>> 2 files changed, 8 insertions(+) > >>>>>>> > >>>>>>> diff --git a/include/exec/memory.h b/include/exec/memory.h index > >>>>>>> 0767888..30b2a74 100644 > >>>>>>> --- a/include/exec/memory.h > >>>>>>> +++ b/include/exec/memory.h > >>>>>>> @@ -191,6 +191,8 @@ struct MemoryRegionIOMMUOps { > >>>>>>> void (*notify_flag_changed)(MemoryRegion *iommu, > >>>>>>> IOMMUNotifierFlag old_flags, > >>>>>>> IOMMUNotifierFlag new_flags); > >>>>>>> + /* Set this up to provide customized IOMMU replay function */ > >>>>>>> + void (*replay)(MemoryRegion *iommu, IOMMUNotifier > >>>>>>> + *notifier); > >>>>>>> }; > >>>>>>> > >>>>>>> typedef struct CoalescedMemoryRange CoalescedMemoryRange; > >>>>>>> diff --git a/memory.c b/memory.c index 7a4f2f9..9c253cc 100644 > >>>>>>> --- a/memory.c > >>>>>>> +++ b/memory.c > >>>>>>> @@ -1630,6 +1630,12 @@ void > >>>>>>> memory_region_iommu_replay(MemoryRegion > >>>>>>> *mr, IOMMUNotifier *n, > >>>>>>> hwaddr addr, granularity; > >>>>>>> IOMMUTLBEntry iotlb; > >>>>>>> + /* If the IOMMU has its own replay callback, override */ > >>>>>>> + if (mr->iommu_ops->replay) { > >>>>>>> + mr->iommu_ops->replay(mr, n); > >>>>>>> + return; > >>>>>>> + } > >>>>>> Hi Alex, Peter, > >>>>>> > >>>>>> Will all the other vendors(e.g. PPC, s390, ARM) add their own > >>>>>> replay callback as well? I guess it depends on whether the > >>>>>> original replay algorithm work well for them? Do you have such > >>>>>> knowledge? > >>>>> I guess so. At least for VT-d we had this callback since the > >>>>> default replay mechanism did not work well on x86 due to its > >>>>> extremely large memory region size. Thanks, > >>>> thx. that would make sense. > >>> Peter, > >>> > >>> Just come to mind that there may be a corner case here. > >>> > >>> Intel VT-d actually has a "pt" mode which allows device use physical > >>> address even when VT-d is enabled. In kernel, there is a > iommu_identity_mapping. > >>> If a device is in this map, then it would use "pt" mode. So that > >>> IOMMU driver would not build second-level page table for it. > >> Yes, but qemu does not support ECAP_PT now, so guest will still have > >> a page table in this case. > > That's true. Without ECAP_PT, IOMMU driver would create a 1:1 map. So > > this solution can work well even a device is in identify_map. > > > >>> Back to the virtual IOVA implementation, if an assigned device is in > >>> the iommu_identity_mapping(e.g. VGA controller), it uses GPA directly to > >>> do > DMA. > >>> So it demands a GPA->HPA mapping in host. However, the > >>> iommu->ops.replay is not able to build it when guest SL page table is > >>> empty. > >>> > >>> So I think building an entire guest PA->HPA mapping before guest > >>> kernel boot would be recommended. Any thoughts? > >> We plan to add PT in 2.10, a possible rough idea is disabled iommu > >> dmar region and use another region without iommu_ops. Then > >> vfio_listener_region_add() will just do the correct mappings. > > Good to know it. Actually, I also need to expose ECAP_PT for vSVM. So > > just comes to realize that the current replay solution may not work well > > when I > expose ECAP_PT to guest. > > I also have a rough idea here. The current listener in container > > listens to address space named with devfn if virtual VTd is added. How > > about adding one more listener to listen memory address space. So that the > listener can build entire guest PA->HPA mapping. > > This is only needed for PT. So looks like current code is sufficient to do > this I think. > See the else part of if (memory_region_is_iommu()) of > vfio_listener_region_add().
Jason, when the listener listen to device address space, the "else part" may not work even we set the mr->iommu_ops = NULL. The mr would be a non-ram region when the time region_add is called since it is actually listen to changes from device address space. Regards, Yi L
