Am 17.04.25 um 15:33 schrieb Felix Kuehling:
> Pinning of VRAM is for peer devices that don't support dynamic attachment
> and move notifiers. But it requires that all such peer devices are able to
> access VRAM via PCIe P2P. Any device without P2P access requires migration
> to GTT, which fails if the memory is already pinned for another peer
> device.
>
> Sharing between GPUs should not require pinning in VRAM. However, if
> DMABUF_MOVE_NOTIFY is disabled in the kernel build, even DMABufs shared
> between GPUs must be pinned, which can lead to failures and functional
> regressions on systems where some peer GPUs are not P2P accessible.
>
> Disable VRAM pinning if move notifiers are disabled in the kernel build
> to fix regressions when sharing BOs between GPUs.
>
> v2: ensure domains is not 0, add GTT if necessary
>
> Signed-off-by: Felix Kuehling <[email protected]>
> Tested-by: Hao (Claire) Zhou <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> index 667080cc9ae1..3c2c32a252d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> @@ -81,17 +81,26 @@ static int amdgpu_dma_buf_pin(struct dma_buf_attachment 
> *attach)
>  
>       dma_resv_assert_held(dmabuf->resv);
>  
> -     /*
> -      * Try pinning into VRAM to allow P2P with RDMA NICs without ODP
> +     /* Try pinning into VRAM to allow P2P with RDMA NICs without ODP
>        * support if all attachments can do P2P. If any attachment can't do
>        * P2P just pin into GTT instead.
> +      *
> +      * To avoid with conflicting pinnings between GPUs and RDMA when move
> +      * notifiers are disabled, only allow pinning in VRAM when move
> +      * notiers are enabled.
>        */
> -     list_for_each_entry(attach, &dmabuf->attachments, node)
> -             if (!attach->peer2peer)
> -                     domains &= ~AMDGPU_GEM_DOMAIN_VRAM;
> +     if (!IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)) {
> +             domains &= ~AMDGPU_GEM_DOMAIN_VRAM;
> +     } else {
> +             list_for_each_entry(attach, &dmabuf->attachments, node)
> +                     if (!attach->peer2peer)
> +                             domains &= ~AMDGPU_GEM_DOMAIN_VRAM;
> +     }
>  
>       if (domains & AMDGPU_GEM_DOMAIN_VRAM)
>               bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
> +     else if (!domains)
> +             domains = AMDGPU_GEM_DOMAIN_GTT;

Please drop that.

We should probably use allowed_domains instead of preferred_domains and return 
an error if none of them work.

Regards,
Christian.

>  
>       return amdgpu_bo_pin(bo, domains);
>  }

Reply via email to