> -----Original Message-----
> From: Jan Beulich [mailto:[email protected]]
> Sent: 04 December 2018 15:17
> To: Paul Durrant <[email protected]>
> Cc: Brian Woods <[email protected]>; Suravee Suthikulpanit
> <[email protected]>; Julien Grall <[email protected]>;
> Andrew Cooper <[email protected]>; Roger Pau Monne
> <[email protected]>; Wei Liu <[email protected]>; George Dunlap
> <[email protected]>; Ian Jackson <[email protected]>; Kevin
> Tian <[email protected]>; Stefano Stabellini <[email protected]>;
> xen-devel <[email protected]>; Konrad Rzeszutek Wilk
> <[email protected]>; Tim (Xen.org) <[email protected]>
> Subject: Re: [PATCH v2 3/4] iommu: elide flushing for higher order
> map/unmap operations
>
> >>> On 03.12.18 at 18:40, <[email protected]> wrote:
> > This patch removes any implicit flushing that occurs in the
> > implementation of map and unmap operations and adds new
> > iommu_map/unmap() wrapper functions. To maintain the semantics of
> > the iommu_legacy_map/unmap() wrapper functions, these are modified
> > to call the new wrapper functions and then perform an explicit
> > flush operation.
> >
> > Because VT-d currently performs two different types of flush
> > dependent upon whether a PTE is being modified versus merely added
> > (i.e. replacing a non-present PTE), 'iommu flush flags' are defined
> > by this patch and the iommu_ops map_page() and unmap_page() methods
> > are modified to OR the type of flush necessary for the PTE that has
> > been populated or depopulated into an accumulated flags value. The
> > accumulated value can then be passed into the explicit flush
> > operation.
> >
> > The ARM SMMU implementations of map_page() and unmap_page()
> > currently perform no implicit flushing and therefore the modified
> > methods do not adjust the flush flags.
>
> Which, however, likely is wrong. If we mean the flushing to be initiated
> by the arch- and vendor-independent wrappers, then all map/unmap
> backends should indicate the needed kind of flush. Granted this can be
> done later, if things are otherwise correct on Arm right now.
>
> > --- a/xen/arch/arm/p2m.c
> > +++ b/xen/arch/arm/p2m.c
> > @@ -971,8 +971,17 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
> >
> > if ( need_iommu_pt_sync(p2m->domain) &&
> > (lpae_is_valid(orig_pte) || lpae_is_valid(*entry)) )
> > + {
> > + unsigned int flush_flags = 0;
> > +
> > + if ( lpae_is_valid(orig_pte) )
> > + flush_flags |= IOMMU_FLUSHF_modified;
> > + if ( lpae_is_valid(*entry) )
> > + flush_flags |= IOMMU_FLUSHF_added;
>
> Shouldn't this be "else if" with the meaning assigned to both
> types? From an abstract perspective I'd also expect that for
> a single mapping no more than one of the flags can come
> back set (through the iommu_ops interface).
That's not how I see it. My rationale is:
- present PTE made non-present, or modified -> IOMMU_FLUSHF_modified
- new PTE value is present -> IOMMU_FLUSHF_added
So, a single op can set any combination of bits and thus the above code does
not use 'else if'.
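To make that rationale concrete, here is a minimal self-contained sketch; the helper name and the locally re-defined flag values are illustrative only, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>

#define IOMMU_FLUSHF_added    (1u << 0)
#define IOMMU_FLUSHF_modified (1u << 1)

/*
 * Illustrative helper: the two conditions are tested independently,
 * so replacing one present PTE with another sets BOTH flags, which
 * is why the code quoted above does not use 'else if'.
 */
static unsigned int pte_flush_flags(bool old_valid, bool new_valid)
{
    unsigned int flags = 0;

    if ( old_valid )      /* present PTE removed or overwritten */
        flags |= IOMMU_FLUSHF_modified;
    if ( new_valid )      /* new PTE value is present */
        flags |= IOMMU_FLUSHF_added;

    return flags;
}
```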
>
> > @@ -84,7 +86,7 @@ static bool set_iommu_pde_present(uint32_t *pde, unsigned long next_mfn,
> >
> > if ( maddr_old != maddr_next || iw != old_w || ir != old_r ||
> > old_level != next_level )
> > - need_flush = true;
> > + flush_flags = IOMMU_FLUSHF_modified;
>
> Why uniformly "modified"?
Because the AMD IOMMU does require flushing for a non-present -> present
transition AFAICT. The old code certainly implies this.
>
> > @@ -645,11 +648,13 @@ static unsigned long flush_count(unsigned long dfn, unsigned int page_count,
> > }
> >
> > int amd_iommu_flush_iotlb_pages(struct domain *d, dfn_t dfn,
> > - unsigned int page_count)
> > + unsigned int page_count,
> > + unsigned int flush_flags)
> > {
> > unsigned long dfn_l = dfn_x(dfn);
> >
> > ASSERT(page_count && !dfn_eq(dfn, INVALID_DFN));
> > + ASSERT(flush_flags & IOMMU_FLUSHF_modified);
>
> Is this valid? What if a map operation solely re-established what
> was already there? Aiui in that case set_iommu_pde_present()
> would always return zero. Or take this (seeing that the generic
> wrapper has a zero check for the flush flags):
Yes, the ASSERT is there because this should never be called unless flush_flags
!= 0 (ensured by the wrapper) and the map code should only ever set
IOMMU_FLUSHF_modified.
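A rough sketch of the calling convention being described, assuming hypothetical function names (the real wrapper and backend are in the patch, not here):

```c
#include <assert.h>

#define IOMMU_FLUSHF_modified (1u << 1)

/* Stand-in for the per-vendor flush backend. */
static int backend_flush(unsigned int flush_flags)
{
    /* Backend invariant: never reached with an empty flag set. */
    assert(flush_flags != 0);
    return 0;
}

/*
 * Illustrative wrapper: the flush backend is only invoked when the
 * map/unmap path accumulated at least one flush flag, so the
 * backend-side ASSERT() cannot fire through this path.
 */
static int flush_if_needed(unsigned int flush_flags)
{
    if ( !flush_flags )
        return 0;    /* nothing changed; no flush required */

    return backend_flush(flush_flags);
}
```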
>
> > @@ -692,6 +697,7 @@ int amd_iommu_reserve_domain_unity_map(struct domain *domain,
> > unsigned long npages, i;
> > unsigned long gfn;
> > unsigned int flags = !!ir;
> > + unsigned int flush_flags = 0;
> > int rt = 0;
> >
> > if ( iw )
> > @@ -703,11 +709,21 @@ int amd_iommu_reserve_domain_unity_map(struct domain *domain,
> > {
> > unsigned long frame = gfn + i;
> >
> > -        rt = amd_iommu_map_page(domain, _dfn(frame), _mfn(frame), flags);
> > +        rt = amd_iommu_map_page(domain, _dfn(frame), _mfn(frame), flags,
> > +                                &flush_flags);
> > if ( rt != 0 )
> > - return rt;
> > + break;
> > }
> > - return 0;
> > +
> > + /*
> > + * The underlying implementation is void so the return value is
> > + * meaningless and can hence be ignored.
> > + */
> > + while ( amd_iommu_flush_iotlb_pages(domain, _dfn(gfn), npages,
> > + flush_flags) )
> > + break;
>
> Nothing here guarantees flush_flags to be non-zero.
Good point. I'll add a check.
>
> > @@ -235,6 +236,9 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
> > process_pending_softirqs();
> > }
> >
> > + while ( !flush_flags && iommu_flush_all(d) )
> > + break;
>
> Is there a comment missing here explaining the seemingly odd
> loop?
I'm merely using the construct you suggested, but I can add a comment.
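For reference, the while/break form is the usual idiom for deliberately discarding a __must_check return value. A stand-alone illustration, with a hypothetical stub in place of the real flush:

```c
#include <assert.h>

static int flush_calls;

/* Hypothetical stand-in for a __must_check flush that reports an error. */
static int flush_all_stub(void)
{
    ++flush_calls;
    return -1;
}

/*
 * The loop condition is evaluated exactly once: the return value is
 * "checked" (satisfying __must_check) and then intentionally ignored,
 * since nothing useful can be done with it at this point.
 */
static void consume_flush_result(unsigned int flush_flags)
{
    while ( !flush_flags && flush_all_stub() )
        break;
}
```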
>
> > @@ -381,6 +402,17 @@ int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned int page_order)
> > return rc;
> > }
> >
> > +int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned int page_order)
> > +{
> > + unsigned int flush_flags = 0;
> > + int rc = iommu_unmap(d, dfn, page_order, &flush_flags);
> > +
> > + if ( !rc )
> > + rc = iommu_flush(d, dfn, (1u << page_order), flush_flags);
>
> No iommu_dont_flush_iotlb check needed here?
I thought the old VT-d unmap code ignored it, but I see it didn't, so yes, I do
need to add the check.
>
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -633,11 +633,14 @@ static int __must_check iommu_flush_iotlb(struct domain *d, dfn_t dfn,
> >
> >  static int __must_check iommu_flush_iotlb_pages(struct domain *d,
> >                                                  dfn_t dfn,
> > -                                                unsigned int page_count)
> > +                                                unsigned int page_count,
> > +                                                unsigned int flush_flags)
> >  {
> >      ASSERT(page_count && !dfn_eq(dfn, INVALID_DFN));
> > +    ASSERT(flush_flags);
> >
> > -    return iommu_flush_iotlb(d, dfn, 1, page_count);
> > +    return iommu_flush_iotlb(d, dfn, flush_flags & IOMMU_FLUSHF_modified,
> > +                             page_count);
>
> Why the restriction to "modified"?
The parameter is a bool which should be true if an existing PTE was modified
and false otherwise. I can make this !!(flush_flags & IOMMU_FLUSHF_modified) if
you prefer.
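The !! form just normalizes the masked value to a proper boolean; a small illustration (flag values re-defined locally, helper name mine):

```c
#include <assert.h>
#include <stdbool.h>

#define IOMMU_FLUSHF_added    (1u << 0)
#define IOMMU_FLUSHF_modified (1u << 1)

/*
 * !! collapses any non-zero result of the mask to 1, so the value
 * passed for a bool parameter is always exactly 0 or 1 rather than
 * whichever bit happened to be set.
 */
static bool pte_was_modified(unsigned int flush_flags)
{
    return !!(flush_flags & IOMMU_FLUSHF_modified);
}
```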
>
> > @@ -1825,15 +1825,18 @@ static int __must_check intel_iommu_map_page(struct domain *d,
> > spin_unlock(&hd->arch.mapping_lock);
> > unmap_vtd_domain_page(page);
> >
> > - if ( !this_cpu(iommu_dont_flush_iotlb) )
> > - rc = iommu_flush_iotlb(d, dfn, dma_pte_present(old), 1);
> > + *flush_flags |= IOMMU_FLUSHF_added;
> > + if ( dma_pte_present(old) )
> > + *flush_flags |= IOMMU_FLUSHF_modified;
>
> See my earlier comment as to only one of them to get set for an
> individual mapping.
>
> > @@ -62,14 +61,15 @@ int arch_iommu_populate_page_table(struct domain *d)
> > {
> > unsigned long mfn = mfn_x(page_to_mfn(page));
> > unsigned long gfn = mfn_to_gmfn(d, mfn);
> > + unsigned int flush_flags = 0;
> >
> > if ( gfn != gfn_x(INVALID_GFN) )
> > {
> > ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
> > BUG_ON(SHARED_M2P(gfn));
> > - rc = iommu_legacy_map(d, _dfn(gfn), _mfn(mfn),
> > - PAGE_ORDER_4K, IOMMUF_readable |
> > - IOMMUF_writable);
> > + rc = iommu_map(d, _dfn(gfn), _mfn(mfn),
> > + PAGE_ORDER_4K, IOMMUF_readable |
> > + IOMMUF_writable, &flush_flags);
> > }
> > if ( rc )
> > {
> > @@ -103,7 +103,6 @@ int arch_iommu_populate_page_table(struct domain *d)
> > }
> >
> > spin_unlock(&d->page_alloc_lock);
> > - this_cpu(iommu_dont_flush_iotlb) = 0;
> >
> > if ( !rc )
> > rc = iommu_flush_all(d);
>
> Would be nice to have a comment here clarifying why flush_flags
> doesn't get used.
Ok.
>
> > @@ -249,6 +251,9 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> > if (!(i & 0xfffff))
> > process_pending_softirqs();
> > }
> > +
> > + if ( !flush_flags && iommu_flush_all(d) )
> > + return;
> > }
>
> Again please attach a brief comment explaining the seemingly
> strange construct.
>
Ok.
> > --- a/xen/include/xen/iommu.h
> > +++ b/xen/include/xen/iommu.h
> > @@ -93,6 +93,22 @@ void iommu_teardown(struct domain *d);
> > #define _IOMMUF_writable 1
> > #define IOMMUF_writable (1u<<_IOMMUF_writable)
> >
> > +enum
> > +{
>
> Brace on the same line as "enum" please, just like for struct/union. When
> they're named this helps finding the place where a certain type gets
> (fully) declared.
Ok.
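For illustration, the requested style would look like the sketch below; the member names follow the _IOMMUF_* pattern already used in the header, but are my guess at the final form:

```c
#include <assert.h>

/* Brace on the same line as the keyword, matching struct/union style. */
enum {
    _IOMMU_FLUSHF_added,
    _IOMMU_FLUSHF_modified,
};

#define IOMMU_FLUSHF_added    (1u << _IOMMU_FLUSHF_added)
#define IOMMU_FLUSHF_modified (1u << _IOMMU_FLUSHF_modified)
```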
Paul
>
> Jan
_______________________________________________
Xen-devel mailing list
[email protected]
https://lists.xenproject.org/mailman/listinfo/xen-devel