Hi Jason,

> Subject: Re: [PATCH v4 1/5] PCI/P2PDMA: Don't enforce ACS check for device
> functions of Intel GPUs
>
> On Fri, Sep 19, 2025 at 06:22:45AM +0000, Kasireddy, Vivek wrote:
> > > In this case messing with ACS is completely wrong. If the intention is
> > > to convey some kind of "private" address representing the physical
> > > VRAM then you need to use a DMABUF mechanism to do that, not deliver a
> > > P2P address that the other side cannot access.
>
> > I think using a PCI BAR Address works just fine in this case because the Xe
> > driver bound to PF on the Host can easily determine that it belongs to one
> > of the VFs and translate it into VRAM Address.
>
> That isn't how the P2P or ACS mechanism works in Linux, it is about
> the actual address used for DMA.

Right, but this is not dealing with P2P DMA access between two random, unrelated devices. Instead, this is a special situation involving a GPU PF trying to access the VRAM of a VF that it provisioned and holds a reference on (note that the backing object for the VF's VRAM is pinned by Xe on the Host as part of resource provisioning). But it gets treated as regular P2P DMA because the exporters rely on pci_p2pdma_distance() or pci_p2pdma_map_type() to determine P2P compatibility.
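To illustrate what I mean by that compatibility check, here is a minimal sketch (not the actual vfio-pci/dmabuf exporter code; the function name and device roles are just placeholders I chose for this example):

#include <linux/pci.h>
#include <linux/pci-p2pdma.h>

/*
 * Sketch only: the kind of gate an exporter applies before handing out a
 * P2P (BAR) address. Here "provider" would be the VF whose VRAM BAR backs
 * the dmabuf and "client" the PF that wants to import it. On workstations
 * and desktops without a whitelisted upstream bridge the distance comes
 * back negative for the PF <-> VF pair, so the export fails even though
 * the access never actually goes through the PCIe fabric in my use case.
 */
static bool exporter_p2p_compatible(struct pci_dev *provider,
				    struct device *client)
{
	/* A negative distance means the pair was rejected (e.g. due to ACS) */
	return pci_p2pdma_distance(provider, client, true) >= 0;
}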
In other words, I am trying to look at this problem differently: how can the PF be allowed to access a VF resource that it provisioned, particularly when the VF itself requests the PF to access it, and when a hardware path (via the PCIe fabric) is not required, not supported, or does not exist at all?

Furthermore, note that on a server system with a whitelisted PCIe upstream bridge this quirk would not be needed at all, as pci_p2pdma_map_type() would not have failed; the problem would then have been purely Xe-driver-specific, requiring only the translation logic and no further changes anywhere. But my goal is to fix this on systems such as workstations and desktops, which typically do not have whitelisted PCIe upstream bridges.

> You can't translate a dma_addr_t to anything in the Xe PF driver
> anyhow, once it goes through the IOMMU the necessary information is lost.

Well, I already tested this path (via the IOMMU, with your earlier vfio-pci + dmabuf patch that used dma_map_resource(), and also with Leon's latest version) and found that I could still do the translation in the Xe PF driver after first calling iommu_iova_to_phys().

> This is a fundamentally broken design to dma map something and
> then try to reverse engineer the dma_addr_t back to something with
> meaning.

IIUC, this is not a new or radical idea. The concept is somewhat similar to using bounce buffers to work around hardware DMA limitations, except that there are no memory copies and the CPU is not involved. And I don't see any other way to do this, because I don't believe the exporter can provide a DMA address that the importer can use directly; some translation seems unavoidable in this case.
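Concretely, the translation I am describing looks roughly like this (a sketch under my assumptions, not the actual Xe code; the VRAM BAR index and the offset handling are placeholders for what the PF driver would derive from its provisioning data):

#include <linux/iommu.h>
#include <linux/pci.h>

/*
 * Sketch only: recover the VF VRAM offset behind a dma_addr_t handed to
 * the PF. The Xe PF driver knows how the VF's VRAM was provisioned, so
 * once the IOVA is resolved back to a physical address it can check
 * whether it falls inside the VF's VRAM BAR and compute the offset.
 */
static int pf_resolve_vf_vram(struct device *pf_dev, struct pci_dev *vf_pdev,
			      dma_addr_t addr, resource_size_t *offset)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(pf_dev);
	phys_addr_t phys = addr;
	resource_size_t start, len;

	/* With an IOMMU the dma_addr_t is an IOVA; walk it back to a PA */
	if (domain)
		phys = iommu_iova_to_phys(domain, addr);

	start = pci_resource_start(vf_pdev, 2);	/* assuming VRAM is BAR 2 */
	len = pci_resource_len(vf_pdev, 2);
	if (phys < start || phys >= start + len)
		return -EINVAL;

	*offset = phys - start;	/* Xe would map this to a VRAM address */
	return 0;
}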
> > > Christian told me dmabuf has such a private address mechanism, so
> > > please figure out a way to use it..
>
> > Even if such a mechanism exists, we still need a way to prevent
> > pci_p2pdma_map_type() from failing when invoked by the exporter (vfio-pci).
> > Does it make sense to move this quirk into the exporter?
>
> When you export a private address through dmabuf the VFIO exporter
> will not call p2pdma paths when generating it.

I have cc'd Christian and Simona. Hopefully, they can help explain how the dmabuf private address mechanism can be used to address my use case. I sincerely hope that it works; otherwise, I don't see any viable path forward for what I am trying to do other than using this quirk and the translation. Note that the main reason I am pursuing this is that I am seeing at least a ~35% performance gain when running light 3D/Gfx workloads.

> > Also, AFAICS, translating a BAR Address to a VRAM Address can only be
> > done by the Xe driver bound to the PF because it has access to provisioning
> > data. In other words, vfio-pci would not be able to share any address
> > other than the BAR Address because it wouldn't know how to
> > translate it to a VRAM Address.
>
> If you have a vfio variant driver then the VF vfio driver could call
> the Xe driver to create a suitable dmabuf using the private
> addressing. This is probably what is required here if this is what you
> are trying to do.

Could this not be done via the vendor-agnostic vfio-pci (+ dmabuf) driver instead of having to use a separate VF/vfio variant driver?

> > > No, don't, it is completely wrong to mess with ACS flags for the
> > > problem you are trying to solve.
>
> > But I am not messing with any ACS flags here. I am just adding a quirk to
> > sidestep the ACS enforcement check given that the PF to VF access does
> > not involve the PCIe fabric in this case.
>
> Which is completely wrong. These are all based on fabric capability,
> not based on code in drivers to wrongly "translate" the dma_addr_t.

I am not sure why you consider the translation to be wrong in this case, given that it is done by a trusted entity (the Xe PF driver) that is bound to the GPU PF and provisioned the very resource it is trying to access. What limitations do you see with this approach? Also, the quirk added in this patch is indeed meant to address a specific case (GPU PF to VF access) in order to work around a potential hardware limitation (the lack of a direct PF to VF DMA path via the PCIe fabric). Isn't that one of the main ideas behind quirks -- addressing hardware limitations?

Thanks,
Vivek

> Jason
