> -----Original Message----- > From: Jason Wang [mailto:[email protected]] > Sent: Saturday, April 1, 2017 1:01 PM > To: Liu, Yi L <[email protected]>; 'Peter Xu' <[email protected]> > Cc: Lan, Tianyu <[email protected]>; Tian, Kevin <[email protected]>; > '[email protected]' <[email protected]>; '[email protected]' > <[email protected]>; '[email protected]' <[email protected]>; 'qemu- > [email protected]' <[email protected]>; '[email protected]' > <[email protected]>; 'David Gibson' <[email protected]> > Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add > MemoryRegionIOMMUOps.replay() callback > > > > On 2017年03月31日 15:30, Liu, Yi L wrote: > >> -----Original Message----- > >> From: Jason Wang [mailto:[email protected]] > >> Sent: Friday, March 31, 2017 3:17 PM > >> To: Liu, Yi L <[email protected]>; 'Peter Xu' <[email protected]> > >> Cc: Lan, Tianyu <[email protected]>; Tian, Kevin > >> <[email protected]>; '[email protected]' <[email protected]>; > '[email protected]' > >> <[email protected]>; '[email protected]' <[email protected]>; > >> 'qemu- [email protected]' <[email protected]>; > '[email protected]' > >> <[email protected]>; 'David Gibson' > >> <[email protected]> > >> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add > >> MemoryRegionIOMMUOps.replay() callback > >> > >> > >> > >> On 2017年03月31日 13:34, Liu, Yi L wrote: > >>>> -----Original Message----- > >>>> From: Jason Wang [mailto:[email protected]] > >>>> Sent: Thursday, March 30, 2017 7:58 PM > >>>> To: Liu, Yi L <[email protected]>; 'Peter Xu' <[email protected]> > >>>> Cc: '[email protected]' <[email protected]>; Lan, > >>>> Tianyu <[email protected]>; Tian, Kevin <[email protected]>; > >> '[email protected]' > >>>> <[email protected]>; '[email protected]' > >>>> <[email protected]>; '[email protected]' <[email protected]>; > 'David Gibson' > >>>> <[email protected]>; '[email protected]' <qemu- > >>>> [email protected]> > >>>> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add > >>>> MemoryRegionIOMMUOps.replay() callback > >>>> > >>>> > >>>> > >>>> On 2017年03月30日 19:06, Liu, Yi L wrote: > >>>>>> -----Original Message----- > >>>>>> From: Liu, Yi L > >>>>>> Sent: Monday, March 27, 2017 5:22 PM > >>>>>> To: Peter Xu <[email protected]> > >>>>>> Cc: [email protected]; Lan, Tianyu > >>>>>> <[email protected]>; Tian, Kevin <[email protected]>; > >>>>>> [email protected]; [email protected]; [email protected]; > >>>>>> [email protected]; David Gibson <[email protected]>; > >>>>>> [email protected] > >>>>>> Subject: RE: [Qemu-devel] [PATCH v7 14/17] memory: add > >>>>>> MemoryRegionIOMMUOps.replay() callback > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Peter Xu [mailto:[email protected]] > >>>>>>> Sent: Monday, March 27, 2017 5:12 PM > >>>>>>> To: Liu, Yi L <[email protected]> > >>>>>>> Cc: [email protected]; Lan, Tianyu > >>>>>>> <[email protected]>; Tian, Kevin <[email protected]>; > >>>>>>> [email protected]; [email protected]; [email protected]; > >>>>>>> [email protected]; David Gibson <[email protected]>; > >>>>>>> [email protected] > >>>>>>> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add > >>>>>>> MemoryRegionIOMMUOps.replay() callback > >>>>>>> > >>>>>>> On Mon, Mar 27, 2017 at 08:35:05AM +0000, Liu, Yi L wrote: > >>>>>>>>> -----Original Message----- > >>>>>>>>> From: Qemu-devel > >>>>>>>>> [mailto:[email protected]] On > >>>>>>>>> Behalf Of Peter Xu > >>>>>>>>> Sent: Tuesday, February 7, 2017 4:28 PM > >>>>>>>>> To: [email protected] > >>>>>>>>> Cc: Lan, Tianyu <[email protected]>; Tian, Kevin > >>>>>>>>> <[email protected]>; [email protected]; > >>>>>>>>> [email protected]; [email protected]; > >>>>>>>>> [email protected]; [email protected]; > >>>>>>>>> [email protected]; David Gibson <[email protected]> > >>>>>>>>> Subject: [Qemu-devel] [PATCH v7 14/17] memory: add > >>>>>>>>> MemoryRegionIOMMUOps.replay() callback > >>>>>>>>> > >>>>>>>>> Originally we have one memory_region_iommu_replay() function, > >>>>>>>>> which is the default behavior to replay the translations of > >>>>>>>>> the whole IOMMU region. However, on some platform like x86, we > >>>>>>>>> may want our own > >>>>>>> replay logic for IOMMU regions. > >>>>>>>>> This patch add one more hook for IOMMUOps for the callback, > >>>>>>>>> and it'll override the default if set. > >>>>>>>>> > >>>>>>>>> Signed-off-by: Peter Xu <[email protected]> > >>>>>>>>> --- > >>>>>>>>> include/exec/memory.h | 2 ++ > >>>>>>>>> memory.c | 6 ++++++ > >>>>>>>>> 2 files changed, 8 insertions(+) > >>>>>>>>> > >>>>>>>>> diff --git a/include/exec/memory.h b/include/exec/memory.h > >>>>>>>>> index > >>>>>>>>> 0767888..30b2a74 100644 > >>>>>>>>> --- a/include/exec/memory.h > >>>>>>>>> +++ b/include/exec/memory.h > >>>>>>>>> @@ -191,6 +191,8 @@ struct MemoryRegionIOMMUOps { > >>>>>>>>> void (*notify_flag_changed)(MemoryRegion *iommu, > >>>>>>>>> IOMMUNotifierFlag old_flags, > >>>>>>>>> IOMMUNotifierFlag > >>>>>>>>> new_flags); > >>>>>>>>> + /* Set this up to provide customized IOMMU replay function */ > >>>>>>>>> + void (*replay)(MemoryRegion *iommu, IOMMUNotifier > >>>>>>>>> + *notifier); > >>>>>>>>> }; > >>>>>>>>> > >>>>>>>>> typedef struct CoalescedMemoryRange CoalescedMemoryRange; > >>>>>>>>> diff --git a/memory.c b/memory.c index 7a4f2f9..9c253cc 100644 > >>>>>>>>> --- a/memory.c > >>>>>>>>> +++ b/memory.c > >>>>>>>>> @@ -1630,6 +1630,12 @@ void > >>>>>>>>> memory_region_iommu_replay(MemoryRegion > >>>>>>>>> *mr, IOMMUNotifier *n, > >>>>>>>>> hwaddr addr, granularity; > >>>>>>>>> IOMMUTLBEntry iotlb; > >>>>>>>>> + /* If the IOMMU has its own replay callback, override */ > >>>>>>>>> + if (mr->iommu_ops->replay) { > >>>>>>>>> + mr->iommu_ops->replay(mr, n); > >>>>>>>>> + return; > >>>>>>>>> + } > >>>>>>>> Hi Alex, Peter, > >>>>>>>> > >>>>>>>> Will all the other vendors(e.g. PPC, s390, ARM) add their own > >>>>>>>> replay callback as well? I guess it depends on whether the > >>>>>>>> original replay algorithm work well for them? Do you have such > knowledge? > >>>>>>> I guess so. At least for VT-d we had this callback since the > >>>>>>> default replay mechanism did not work well on x86 due to its > >>>>>>> extremely large memory region size. Thanks, > >>>>>> thx. that would make sense. > >>>>> Peter, > >>>>> > >>>>> Just come to mind that there may be a corner case here. > >>>>> > >>>>> Intel VT-d actually has a "pt" mode which allows device use > >>>>> physical address even when VT-d is enabled. In kernel, there is a > >> iommu_identity_mapping. > >>>>> If a device is in this map, then it would use "pt" mode. So that > >>>>> IOMMU driver would not build second-level page table for it. > >>>> Yes, but qemu does not support ECAP_PT now, so guest will still > >>>> have a page table in this case. > >>> That's true. Without ECAP_PT, IOMMU driver would create a 1:1 map. > >>> So this solution can work well even a device is in identify_map. > >>> > >>>>> Back to the virtual IOVA implementation, if an assigned device is > >>>>> in the iommu_identity_mapping(e.g. VGA controller), it uses GPA > >>>>> directly to do > >> DMA. > >>>>> So it demands a GPA->HPA mapping in host. However, the > >>>>> iommu->ops.replay is not able to build it when guest SL page table is > >>>>> empty. > >>>>> > >>>>> So I think building an entire guest PA->HPA mapping before guest > >>>>> kernel boot would be recommended. Any thoughts? > >>>> We plan to add PT in 2.10, a possible rough idea is disabled iommu > >>>> dmar region and use another region without iommu_ops. Then > >>>> vfio_listener_region_add() will just do the correct mappings. > >>> Good to know it. Actually, I also need to expose ECAP_PT for vSVM. > >>> So just comes to realize that the current replay solution may not > >>> work well when I > >> expose ECAP_PT to guest. > >>> I also have a rough idea here. The current listener in container > >>> listens to address space named with devfn if virtual VTd is added. > >>> How about adding one more listener to listen memory address space. > >>> So that the > >> listener can build entire guest PA->HPA mapping. > >> > >> This is only needed for PT. So looks like current code is sufficient to do > >> this I think. > >> See the else part of if (memory_region_is_iommu()) of > >> vfio_listener_region_add(). > > Jason, when the listener listen to device address space, the "else > > part" may not work even we set the mr->iommu_ops = NULL. The mr would > > be a non-ram region when the time region_add is called since it is actually > > listen to > changes from device address space. > > > > Regards, > > Yi L > > > > See Peter's patch ("intel_iommu: allow dynamic switch of IOMMU region"). > It has > > + memory_region_init_alias(&vtd_dev_as->sys_alias, OBJECT(s), > + "vtd_sys_alias", get_system_memory(), > + 0, > memory_region_size(get_system_memory())); > > We can enable sys_alias in when PT is used which should work I think.
Great. I think it works. Thx. Regards, Yi L
