On Fri, 2022-03-18 at 14:27 -0300, Jason Gunthorpe wrote:
> The top of the data structure provides an IO Address Space (IOAS) that is
> similar to a VFIO container. The IOAS allows map/unmap of memory into
> ranges of IOVA called iopt_areas. Domains and in-kernel users (like VFIO
> mdevs) can be attached to the IOAS to access the PFNs that those IOVA
> areas cover.
>
> The IO Address Space (IOAS) data structure is composed of:
> - struct io_pagetable holding the IOVA map
> - struct iopt_areas representing populated portions of IOVA
> - struct iopt_pages representing the storage of PFNs
> - struct iommu_domain representing the IO page table in the system IOMMU
> - struct iopt_pages_user representing in-kernel users of PFNs (i.e. VFIO
> mdevs)
> - struct xarray pinned_pfns holding a list of pages pinned by in-kernel
> users
>
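For other readers, the ownership picture I take away from this list, as a
sketch. The struct names come from the list above and the node/pages_node
members from the iopt_pages definition quoted further down; everything else
here is my guess at how the pieces point at each other:

        struct io_pagetable {                   /* the IOAS */
                struct rb_root_cached area_itree;  /* iopt_area's keyed by IOVA */
        };

        struct iopt_area {                      /* one populated IOVA range */
                struct interval_tree_node node;       /* in io_pagetable */
                struct interval_tree_node pages_node; /* in iopt_pages::domains_itree */
                struct iopt_pages *pages;             /* shared PFN storage */
        };

        struct iopt_pages_user {                /* in-kernel user, e.g. a VFIO mdev */
                struct interval_tree_node node; /* in iopt_pages::users_itree */
        };
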
> This patch introduces the lowest part of the data structure - the movement
> of PFNs in a tiered storage scheme:
> 1) iopt_pages::pinned_pfns xarray
> 2) An iommu_domain
> 3) The origin of the PFNs, i.e. the userspace pointer
>
> PFNs have to be copied between all combinations of tiers, depending on the
> configuration.
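Restating the three tiers for myself (the enum is purely illustrative, not
from the patch; the two function names are the existing kernel APIs I would
expect each tier to go through):

        enum pfn_tier {
                PFN_TIER_XARRAY,        /* iopt_pages::pinned_pfns */
                PFN_TIER_DOMAIN,        /* read back via iommu_iova_to_phys() */
                PFN_TIER_USER,          /* pin_user_pages_remote() on uptr */
        };

So e.g. attaching a second domain could presumably source PFNs from an
already populated domain instead of re-faulting the userspace VA.
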
>
> The interface is an iterator called a 'pfn_reader' which determines which
> tier each PFN is stored in and loads it into a list of PFNs held in a struct
> pfn_batch.
>
> Each step of the iterator fills the pfn_batch; the caller can then use the
> pfn_batch to send the PFNs to the required destination. Repeating this loop
> reads all the PFNs in an IOVA range.
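So the caller-side usage would be something like the below? Every function
name here is my guess from this paragraph, since the body of pages.c is
trimmed from this reply:

        struct pfn_reader pfns;
        int rc;

        rc = pfn_reader_first(&pfns, pages, start_index, last_index);
        if (rc)
                return rc;
        while (!pfn_reader_done(&pfns)) {
                /*
                 * pfns.batch now holds a run of PFNs; push it to the
                 * destination tier, e.g. iommu_map() for a domain or
                 * xa_store() for pinned_pfns.
                 */
                rc = pfn_reader_next(&pfns);
                if (rc)
                        break;
        }
        pfn_reader_destroy(&pfns);
        return rc;
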
>
> The pfn_reader and pfn_batch also keep track of the pinned page accounting.
>
> While PFNs are always stored and accessed as full PAGE_SIZE units, the
> iommu_domain tier can store them with a sub-page offset/length to support
> IOMMUs with an IOPTE size smaller than PAGE_SIZE.
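A concrete case, with my own numbers: a 64K PAGE_SIZE host in front of an
IOMMU using 4K IOPTEs. An area may then begin mid-page, and the domain tier
has to map with a sub-page offset, roughly:

        unsigned long offset = start_byte % PAGE_SIZE;  /* sub-page start */
        size_t len = min_t(size_t, PAGE_SIZE - offset, area_len);

        rc = iommu_map(domain, iova, PFN_PHYS(pfn) + offset, len, prot);

while the pinned_pfns and userspace tiers keep working in whole pages.
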
>
> Signed-off-by: Jason Gunthorpe <[email protected]>
> ---
> drivers/iommu/iommufd/Makefile | 3 +-
> drivers/iommu/iommufd/io_pagetable.h | 101 ++++
> drivers/iommu/iommufd/iommufd_private.h | 20 +
> drivers/iommu/iommufd/pages.c | 723 ++++++++++++++++++++++++
> 4 files changed, 846 insertions(+), 1 deletion(-)
> create mode 100644 drivers/iommu/iommufd/io_pagetable.h
> create mode 100644 drivers/iommu/iommufd/pages.c
>
>
---8<---
> +
> +/*
> + * This holds a pinned page list for multiple areas of IO address space. The
> + * pages always originate from a linear chunk of userspace VA. Multiple
> + * io_pagetable's, through their iopt_area's, can share a single iopt_pages
> + * which avoids multi-pinning and double accounting of page consumption.
> + *
> + * indexes in this structure are measured in PAGE_SIZE units, are 0 based from
> + * the start of the uptr and extend to npages. pages are pinned dynamically
> + * according to the intervals in the users_itree and domains_itree, npages
> + * records the current number of pages pinned.
This sounds wrong, or at least badly named. If npages records the current
number of pages pinned, then what does npinned record?
> + */
> +struct iopt_pages {
> + struct kref kref;
> + struct mutex mutex;
> + size_t npages;
> + size_t npinned;
> + size_t last_npinned;
> + struct task_struct *source_task;
> + struct mm_struct *source_mm;
> + struct user_struct *source_user;
> + void __user *uptr;
> + bool writable:1;
> + bool has_cap_ipc_lock:1;
> +
> + struct xarray pinned_pfns;
> + /* Of iopt_pages_user::node */
> + struct rb_root_cached users_itree;
> + /* Of iopt_area::pages_node */
> + struct rb_root_cached domains_itree;
> +};
> +
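
Naming aside, I'm assuming pinned_pfns is a plain page-index -> pfn map, in
the same 0-based PAGE_SIZE units as the itrees, i.e. roughly:

        rc = xa_err(xa_store(&pages->pinned_pfns, index,
                             xa_mk_value(pfn), GFP_KERNEL));

        /* and later, on the read side */
        entry = xa_load(&pages->pinned_pfns, index);
        if (entry)
                pfn = xa_to_value(entry);

If that's right, a one-line comment on the member saying it is indexed by
page index from uptr would help.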
---8<---
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu