Re: [PATCH v1 2/4] nouveau/dmem: HMM P2P DMA for private dev pages

2024-10-15 Thread Alistair Popple
Yonatan Maman writes: > From: Yonatan Maman > > Enabling Peer-to-Peer DMA (P2P DMA) access in GPU-centric applications > is crucial for minimizing data transfer overhead (e.g., for RDMA > use-case). > > This change aims to enable that capability for Nouveau over HMM device > private pages. P2

Re: [PATCH v1 1/4] mm/hmm: HMM API for P2P DMA to device zone pages

2024-10-15 Thread Alistair Popple
Yonatan Maman writes: > From: Yonatan Maman > > hmm_range_fault() natively triggers a page fault on device private > pages, migrating them to RAM. In some cases, such as with RDMA devices, > the migration overhead between the device (e.g., GPU) and the CPU, and > vice-versa, significantly dama

Re: [PATCH v1 1/4] mm/hmm: HMM API for P2P DMA to device zone pages

2024-10-15 Thread Christoph Hellwig
The subject does not make sense. All P2P is on ZONE_DEVICE pages. It seems like this is about device private memory? On Tue, Oct 15, 2024 at 06:23:45PM +0300, Yonatan Maman wrote: > From: Yonatan Maman > > hmm_range_fault() natively triggers a page fault on device private > pages, migrating the

Re: [PATCH v1 0/4] GPU Direct RDMA (P2P DMA) for Device Private Pages

2024-10-15 Thread Christoph Hellwig
On Tue, Oct 15, 2024 at 06:23:44PM +0300, Yonatan Maman wrote: > From: Yonatan Maman > > This patch series aims to enable Peer-to-Peer (P2P) DMA access in > GPU-centric applications that utilize RDMA and private device pages. This > enhancement is crucial for minimizing data transfer overhead by

Re: [REGRESSION] GM20B pmu timeout

2024-10-15 Thread Linux regression tracking (Thorsten Leemhuis)
Hi, Thorsten here, the Linux kernel's regression tracker. On 10.10.24 15:32, Diogo Ivo wrote: > > Somewhere between 6.11-rc4 and 6.11-rc5 the following error message is > displayed > when trying to initialize a nvc0_screen on the Tegra X1's GM20B: > > [ 34.431210] nouveau 5700.gpu: pmu:hpq:

Re: [RFC 04/29] nvkm/vgpu: set the VF partition count when NVIDIA vGPU is enabled

2024-10-15 Thread Jason Gunthorpe
On Tue, Oct 15, 2024 at 03:19:33PM +, Zhi Wang wrote: > The FW needs to pre-calculate the reserved video memory for its own use, > which includes the size of metadata of max-supported vGPUs. It needs to > be decided at the FW loading time. We can always set it to the max > number and the tr

[PATCH v1 4/4] RDMA/mlx5: Enabling ATS for ODP memory

2024-10-15 Thread Yonatan Maman
From: Yonatan Maman ATS (Address Translation Services) is mainly used to optimize PCI Peer-to-Peer transfers and prevent bus failures. This change uses ATS for ODP memory, to optimize DMA P2P for ODP memory (e.g., DMA P2P for private device pages - ODP memory). Signed-off-by: Yonatan M

[PATCH v1 3/4] IB/core: P2P DMA for device private pages

2024-10-15 Thread Yonatan Maman
From: Yonatan Maman Add Peer-to-Peer (P2P) DMA request for hmm_range_fault calling, utilizing capabilities introduced in mm/hmm. By setting range.default_flags to HMM_PFN_REQ_FAULT | HMM_PFN_REQ_TRY_P2P, HMM attempts to initiate P2P DMA connections for device private pages (instead of page fault

[PATCH v1 2/4] nouveau/dmem: HMM P2P DMA for private dev pages

2024-10-15 Thread Yonatan Maman
From: Yonatan Maman Enabling Peer-to-Peer DMA (P2P DMA) access in GPU-centric applications is crucial for minimizing data transfer overhead (e.g., for RDMA use-case). This change aims to enable that capability for Nouveau over HMM device private pages. P2P DMA for private device pages allows th

[PATCH v1 1/4] mm/hmm: HMM API for P2P DMA to device zone pages

2024-10-15 Thread Yonatan Maman
From: Yonatan Maman hmm_range_fault() natively triggers a page fault on device private pages, migrating them to RAM. In some cases, such as with RDMA devices, the migration overhead between the device (e.g., GPU) and the CPU, and vice-versa, significantly damages performance. Thus, enabling Peer-

[PATCH v1 0/4] GPU Direct RDMA (P2P DMA) for Device Private Pages

2024-10-15 Thread Yonatan Maman
From: Yonatan Maman This patch series aims to enable Peer-to-Peer (P2P) DMA access in GPU-centric applications that utilize RDMA and private device pages. This enhancement is crucial for minimizing data transfer overhead by allowing the GPU to directly expose device private page data to devices s

Re: [RFC 04/29] nvkm/vgpu: set the VF partition count when NVIDIA vGPU is enabled

2024-10-15 Thread Zhi Wang
On 15/10/2024 15.20, Jason Gunthorpe wrote: > On Sun, Oct 13, 2024 at 06:54:32PM +, Zhi Wang wrote: >> On 27/09/2024 1.51, Jason Gunthorpe wrote: >>> On Sun, Sep 22, 2024 at 05:49:26AM -0700, Zhi Wang wrote: GSP firmware needs to know the number of max-supported vGPUs when initializat

Re: [RFC 18/29] nvkm/vgpu: introduce pci_driver.sriov_configure() in nvkm

2024-10-15 Thread Zhi Wang
On 15/10/2024 15.27, Jason Gunthorpe wrote: > On Mon, Oct 14, 2024 at 08:32:03AM +, Zhi Wang wrote: > >> Turning on the SRIOV feature is just a part of the process enabling a >> vGPU. The VF is not instantly usable before a vGPU type is chosen via >> another userspace interface (e.g. fwctl). >

Re: [RFC 18/29] nvkm/vgpu: introduce pci_driver.sriov_configure() in nvkm

2024-10-15 Thread Jason Gunthorpe
On Mon, Oct 14, 2024 at 08:32:03AM +, Zhi Wang wrote: > Turning on the SRIOV feature is just a part of the process enabling a > vGPU. The VF is not instantly usable before a vGPU type is chosen via > another userspace interface (e.g. fwctl). That's OK, that has become pretty normal now that

Re: [RFC 06/29] nvkm/vgpu: set RMSetSriovMode when NVIDIA vGPU is enabled

2024-10-15 Thread Jason Gunthorpe
On Mon, Oct 14, 2024 at 07:38:03AM +, Zhi Wang wrote: > On 27/09/2024 1.53, Jason Gunthorpe wrote: > > On Sun, Sep 22, 2024 at 05:49:28AM -0700, Zhi Wang wrote: > >> The registry object "RMSetSriovMode" is required to be set when vGPU is > >> enabled. > >> > >> Set "RMSetSriovMode" to 1 when nv

Re: [RFC 04/29] nvkm/vgpu: set the VF partition count when NVIDIA vGPU is enabled

2024-10-15 Thread Jason Gunthorpe
On Sun, Oct 13, 2024 at 06:54:32PM +, Zhi Wang wrote: > On 27/09/2024 1.51, Jason Gunthorpe wrote: > > On Sun, Sep 22, 2024 at 05:49:26AM -0700, Zhi Wang wrote: > >> GSP firmware needs to know the number of max-supported vGPUs when > >> initialization. > >> > >> The field of VF partition count