On 08.08.25 05:14, Heng Zhou wrote:
> If a amdgpu_bo_va is fpriv->prt_va, the bo of this one is always NULL.
> So, such kind of amdgpu_bo_va should be updated separately before
> amdgpu_vm_handle_moved.
> 
> Signed-off-by: Heng Zhou <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 6 ++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c           | 8 ++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h           | 1 +
>  3 files changed, 15 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 37d8a7034a7e..e795b2970620 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -2970,6 +2970,12 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void 
> *info, struct dma_fence __rcu *
>               struct amdgpu_device *adev = amdgpu_ttm_adev(
>                       peer_vm->root.bo->tbo.bdev);
>  
> +             ret = amdgpu_vm_handle_prt_moved(adev,peer_vm);
> +             if (ret) {
> +                     pr_debug("Memory eviction: handle PRT moved failed. Try 
> again\n");
> +                     goto validate_map_fail;
> +             }
> +
>               ret = amdgpu_vm_handle_moved(adev, peer_vm, &exec.ticket);
>               if (ret) {
>                       pr_debug("Memory eviction: handle moved failed. Try 
> again\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 283dd44f04b0..2c2a93f22ba0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1569,6 +1569,14 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>  
>  }
>  
> +int amdgpu_vm_handle_prt_moved(struct amdgpu_device *adev, struct amdgpu_vm 
> *vm)
> +{
> +     struct amdgpu_fpriv *fpriv;
> +
> +     fpriv = container_of(vm, struct amdgpu_fpriv, vm);

That's an absolute no-go from inside the VM code.

You could maybe do that in the KFD code, but Felix and Philip need to decide 
that.

Regards,
Christian.

> +     return amdgpu_vm_bo_update(adev, fpriv->prt_va, false);
> +}
> +
>  /**
>   * amdgpu_vm_handle_moved - handle moved BOs in the PT
>   *
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index f9549f6b3d1f..853a66f9fd2e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -504,6 +504,7 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
>  int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>                         struct amdgpu_vm *vm,
>                         struct dma_fence **fence);
> +int amdgpu_vm_handle_prt_moved(struct amdgpu_device *adev, struct amdgpu_vm 
> *vm);
>  int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
>                          struct amdgpu_vm *vm,
>                          struct ww_acquire_ctx *ticket);

Reply via email to