On Wed, Sep 17, 2025 at 10:37 PM Liang, Prike <prike.li...@amd.com> wrote: > > [Public] > > Regards, > Prike > > > -----Original Message----- > > From: Alex Deucher <alexdeuc...@gmail.com> > > Sent: Wednesday, September 17, 2025 10:10 PM > > To: Liang, Prike <prike.li...@amd.com> > > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander > > <alexander.deuc...@amd.com>; Koenig, Christian <christian.koe...@amd.com> > > Subject: Re: [PATCH v2 9/9] drm/amdgpu: validate userq va for GEM unmap > > > > On Fri, Sep 12, 2025 at 2:04 AM Prike Liang <prike.li...@amd.com> wrote: > > > > > > When an user unmaps a userq VA, the driver must ensure the queue has > > > no in-flight jobs. If there is pending work, the kernel should wait > > > for the attached eviction (bookkeeping) fence to signal before > > > deleting the mapping. > > > > > > Suggested-by: Christian König <christian.koe...@amd.com> > > > Signed-off-by: Prike Liang <prike.li...@amd.com> > > > --- > > > drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 29 > > > +++++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h | > > 3 +++ > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 11 +++++++++ > > > 3 files changed, 43 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c > > > index a13f186f0a8a..e14dcdcfe36e 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c > > > @@ -1182,3 +1182,32 @@ int > > amdgpu_userq_start_sched_for_enforce_isolation(struct amdgpu_device *adev, > > > mutex_unlock(&adev->userq_mutex); > > > return ret; > > > } > > > + > > > +int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev, > > > + struct amdgpu_bo_va_mapping > > > *mapping, > > > + uint64_t saddr) { > > > + u32 ip_mask = amdgpu_userq_get_supported_ip_mask(adev); > > > + struct amdgpu_bo_va *bo_va = mapping->bo_va; > > > + struct dma_resv *resv = bo_va->base.bo->tbo.base.resv; > > > + int ret; > > > + > > > + if (!ip_mask) > > > + return 0; > > > + > > > + dev_warn(adev->dev, "now unmapping a vital queue va:%llx\n", > > > + saddr); > > > > dev_warn_once() so we don't spam the logs. > It seems unlikely to run into this case, but I'm okay to change this to > dev_warn_once() or dev_warn_ratelimited().
Unlikely for a proper application, but you could write a bad application which would cause the kernel log to get spammed. Alex > > > > > + /** > > > + * The userq VA mapping reservation should include the eviction > > > fence, if the > > eviction fence > > > + * can't signal successfully during unmapping, then driver will > > > warn to flag > > this improper unmap > > > + * of the userq VA. > > > + * Note: The eviction fence may be attached to different BOs, and > > > this unmap > > is only for one kind > > > + * of userq VAs, so at this point suppose the eviction fence is > > > always > > unsignaled. > > > + */ > > > + if (!dma_resv_test_signaled(resv, DMA_RESV_USAGE_BOOKKEEP)) { > > > + ret = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP, > > true, MAX_SCHEDULE_TIMEOUT); > > > + if (ret <= 0) > > > + return -EBUSY; > > > + } > > > + > > > + return 0; > > > +} > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h > > > index 8cd307be7256..c9a41876f10e 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h > > > @@ -157,4 +157,7 @@ int amdgpu_userq_buffer_va_put(struct amdgpu_vm > > > *vm, u64 addr); int amdgpu_userq_buffer_vas_put(struct amdgpu_device > > > *adev, > > > struct amdgpu_usermode_queue *queue); > > > bool amdgpu_userq_buffer_vas_mapped(struct amdgpu_usermode_queue > > > *queue); > > > +int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev, > > > + struct amdgpu_bo_va_mapping > > > *mapping, > > > + uint64_t saddr); > > > #endif > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > index bd12d8ff15a4..ccde1f040cef 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > @@ -1941,6 +1941,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device > > *adev, > > > struct amdgpu_bo_va_mapping *mapping; > > > struct amdgpu_vm *vm = bo_va->base.vm; > > > bool valid = true; > > > + int r; > > > > > > saddr /= AMDGPU_GPU_PAGE_SIZE; > > > > > > @@ -1961,6 +1962,16 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device > > *adev, > > > return -ENOENT; > > > } > > > > > > + /* It's unlikely to happen that the mapping userq hasn't been > > > idled > > > + * during user requests GEM unmap IOCTL except for forcing the > > > unmap > > > + * from user space. > > > + */ > > > + if (unlikely(atomic_read(&bo_va->userq_va_mapped) > 0)) { > > > + r = amdgpu_userq_gem_va_unmap_validate(adev, mapping, > > > saddr); > > > + if (unlikely(r == -EBUSY)) > > > + dev_warn(adev->dev, "Here should be an > > > + improper unmap request from user space\n"); > > > > dev_warn_once(). > > > > This looks good to me, but it would be good to get Christian's input as > > well. > > > > Alex > > > > > + } > > > + > > > list_del(&mapping->list); > > > amdgpu_vm_it_remove(mapping, &vm->va); > > > mapping->bo_va = NULL; > > > -- > > > 2.34.1 > > >