This series adds TLP Processing Hints (TPH) support to the VFIO dma-buf
export path, allowing importing drivers (e.g. mlx5) to use the
exporter's steering tag (ST) when performing peer-to-peer (P2P) DMA into a
VFIO-owned device.
There is no separate in-tree vendor kernel driver for the target device:
it is bound to vfio-pci, the in-tree driver, and managed from userspace
via VFIO passthrough. That is why the ST has to flow
through a uAPI: userspace owns the device and its ST table, so it is the
entity that can configure a meaningful value for a given dma-buf. The
kernel-visible participants are still in-tree: vfio-pci exports the
dma-buf and mlx5 imports it.
As for the effect of the ST: the endpoint's PCIe ingress block uses it
as an in-band instruction for the incoming P2P TLP. It selects a target
cache partition and, on writes, an in-flight operation to apply to the
data before it lands. The dma-buf callback keeps this opaque to the
framework; only the producer (userspace owner of the VFIO device)
and the consumer (endpoint block) need to interpret the value. The
dma-buf get_pci_tph callback itself is optional, but workloads that
depend on the endpoint's in-flight operation need it, because the
fallback (TLPs without the exporter-provided ST) does not produce the
same result.
The dma-buf hook is intentionally generic and discoverable rather than
a private side channel. The exporter owns the completing address
space for the dma-buf and decides whether it can provide a meaningful
ST/PH tuple for that completer; the dma-buf core keeps the tuple opaque,
and importers merely request the namespace they support and place the
returned value on generated TLPs. Exporters that cannot derive a
meaningful tuple simply return -EOPNOTSUPP.
Patch 1 adds small PCI/TPH type helpers so drivers can query the enabled
TPH requester mode and the device's TPH Completer Supported field
without reaching into pci_dev internals (and so callers in
CONFIG_PCIE_TPH=n builds get a clean fallback).
Patch 2 adds the optional dma_buf_ops::get_pci_tph callback plus the
dma_buf_get_pci_tph() importer wrapper so importers can fetch TPH
metadata from an exporter under dmabuf->resv.
Patch 3 implements get_pci_tph in vfio-pci and adds the new uAPI
(VFIO_DEVICE_FEATURE_DMA_BUF_TPH) for userspace to attach the metadata.
Patch 4 wires up the mlx5 RDMA driver as a consumer.
Build-tested with both CONFIG_PCIE_TPH=y and CONFIG_PCIE_TPH=n.
Functional validation on the target topology: PCIe analyzer captures of
the P2P TLPs confirm that the ST emitted by mlx5 matches the value
configured through VFIO_DEVICE_FEATURE_DMA_BUF_TPH, and the end-to-end
P2P workload produces only results consistent with the endpoint's
ST-selected in-flight operation. For example, with userspace
configuring 8-bit ST=0xf0 and PH=2, an analyzer capture of a
peer-to-peer MWr64 shows "STP MWr64 TC=0 OHC=2 ..." followed by
"OHC-B ST=F0h PH=2 HV=1":
(TLP capture)
08000260 -> STP MWr64 TC=0 OHC=2 TS=0 Attr=0 L=8
F0000004 -> RID=4h:0h.0h EP- Tag=F0h
E0200000 -> AddrH=000020E0h
00080006 -> AddrL=06000800h
90F00000 -> OHC-B ST=F0h PH=2 HV=1 AMA=0 AV-
Depends on (submitted separately):
net/mlx5: free mlx5_st_idx_data on final dealloc
https://lore.kernel.org/linux-rdma/[email protected]
Changes since v9:
Patch 3 (vfio/pci): Address Alex Williamson's comments by annotating
the existing unlocked @revoked read with READ_ONCE() and rewriting
the DMA_BUF_TPH uAPI text around @flags, future-query semantics,
@ph, and undefined bits. No behavior change.
Patch 4 (RDMA/mlx5): Address Michael Gur's comments by renaming the
per-MR ST ref helpers to drop the misleading "frmr" infix and by
preventing mlx5r_build_frmr_key() from propagating a user-provided
kernel_vendor_key. Also make the PH encoding consistent between the FRMR
and reg_create() paths, and balance the MR-scoped ST reference across
both creation paths and their failure paths.
Previous versions:
v9: https://lore.kernel.org/dri-devel/[email protected]/
v8: https://lore.kernel.org/dri-devel/[email protected]/
v7: https://lore.kernel.org/dri-devel/[email protected]/
v6: https://lore.kernel.org/dri-devel/[email protected]/
v5: https://lore.kernel.org/dri-devel/[email protected]/
v4: https://lore.kernel.org/linux-pci/[email protected]/
v3: https://lore.kernel.org/linux-pci/[email protected]/
v2: https://lore.kernel.org/linux-pci/[email protected]/
Zhiping Zhang (4):
PCI/TPH: Add requester/completer type helpers
dma-buf: add optional get_pci_tph() callback
vfio/pci: implement get_pci_tph and DMA_BUF_TPH feature
RDMA/mlx5: get tph for p2p access when registering dma-buf mr
drivers/dma-buf/dma-buf.c | 25 ++++
drivers/infiniband/hw/mlx5/main.c | 1 +
drivers/infiniband/hw/mlx5/mr.c | 116 +++++++++++++++++-
.../net/ethernet/mellanox/mlx5/core/lib/st.c | 49 ++++++--
drivers/pci/tph.c | 45 +++++++
drivers/vfio/pci/vfio_pci_core.c | 3 +
drivers/vfio/pci/vfio_pci_dmabuf.c | 99 ++++++++++++++-
drivers/vfio/pci/vfio_pci_priv.h | 12 ++
include/linux/dma-buf.h | 22 ++++
include/linux/mlx5/driver.h | 15 +++
include/linux/pci-tph.h | 8 ++
include/uapi/linux/vfio.h | 43 +++++++
12 files changed, 422 insertions(+), 16 deletions(-)
base-commit: 97d2a397efe7752ebf9204a1cfd365afd80c3b28
--
2.53.0-Meta