Am 17.04.25 um 15:33 schrieb Felix Kuehling: > Pinning of VRAM is for peer devices that don't support dynamic attachment > and move notifiers. But it requires that all such peer devices are able to > access VRAM via PCIe P2P. Any device without P2P access requires migration > to GTT, which fails if the memory is already pinned for another peer > device. > > Sharing between GPUs should not require pinning in VRAM. However, if > DMABUF_MOVE_NOTIFY is disabled in the kernel build, even DMABufs shared > between GPUs must be pinned, which can lead to failures and functional > regressions on systems where some peer GPUs are not P2P accessible. > > Disable VRAM pinning if move notifiers are disabled in the kernel build > to fix regressions when sharing BOs between GPUs. > > v2: ensure domains is not 0, add GTT if necessary > > Signed-off-by: Felix Kuehling <[email protected]> > Tested-by: Hao (Claire) Zhou <[email protected]> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 19 ++++++++++++++----- > 1 file changed, 14 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c > index 667080cc9ae1..3c2c32a252d4 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c > @@ -81,17 +81,26 @@ static int amdgpu_dma_buf_pin(struct dma_buf_attachment > *attach) > > dma_resv_assert_held(dmabuf->resv); > > - /* > - * Try pinning into VRAM to allow P2P with RDMA NICs without ODP > + /* Try pinning into VRAM to allow P2P with RDMA NICs without ODP > * support if all attachments can do P2P. If any attachment can't do > * P2P just pin into GTT instead. > + * > + * To avoid with conflicting pinnings between GPUs and RDMA when move > + * notifiers are disabled, only allow pinning in VRAM when move > + * notiers are enabled. > */ > - list_for_each_entry(attach, &dmabuf->attachments, node) > - if (!attach->peer2peer) > - domains &= ~AMDGPU_GEM_DOMAIN_VRAM; > + if (!IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)) { > + domains &= ~AMDGPU_GEM_DOMAIN_VRAM; > + } else { > + list_for_each_entry(attach, &dmabuf->attachments, node) > + if (!attach->peer2peer) > + domains &= ~AMDGPU_GEM_DOMAIN_VRAM; > + } > > if (domains & AMDGPU_GEM_DOMAIN_VRAM) > bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; > + else if (!domains) > + domains = AMDGPU_GEM_DOMAIN_GTT;
Please drop that. We should probably use allowed_domains instead of preferred_domains and return an error if none of them work. Regards, Christian. > > return amdgpu_bo_pin(bo, domains); > }
