On 01.07.25 10:18, Zhang, GuoQing (Sam) wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > > Hi Christian, > > > > Thank you for the feedback. > > > > For “return ret < 0 ? ret : 0;”, it is equivalent to “return ret;” since ret > is always <= 0 after the loop.
No it isn't. ttm_global_swapout() returns the number of pages swapped out and only a negative error code if something went wrong. And it's probably not a good idea to return that from the new function. Regards, Christian. > > > > For all other comments, I will revise the patch accordingly in v2. > > > > Regards > > Sam > > > > > > *From: *Koenig, Christian <[email protected]> > *Date: *Monday, June 30, 2025 at 19:54 > *To: *Zhang, GuoQing (Sam) <[email protected]>, [email protected] > <[email protected]>, [email protected] <[email protected]>, > [email protected] <[email protected]>, Deucher, Alexander > <[email protected]>, Limonciello, Mario <[email protected]>, > Lazar, Lijo <[email protected]> > *Cc: *Zhao, Victor <[email protected]>, Chang, HaiJun > <[email protected]>, Ma, Qing (Mark) <[email protected]>, > [email protected] <[email protected]>, > [email protected] <[email protected]>, > [email protected] <[email protected]>, > [email protected] <[email protected]> > *Subject: *Re: [PATCH 1/3] drm/amdgpu: move GTT to SHM after eviction for > hibernation > > On 30.06.25 12:41, Samuel Zhang wrote: >> When hibernate with data center dGPUs, huge number of VRAM BOs evicted >> to GTT and takes too much system memory. This will cause hibernation >> fail due to insufficient memory for creating the hibernation image. >> >> Move GTT BOs to shmem in KMD, then shmem to swap disk in kernel >> hibernation code to make room for hibernation image. > > This should probably be two patches, one for TTM and then an amdgpu patch to > forward the event. > >> >> Signed-off-by: Samuel Zhang <[email protected]> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 ++++++++++++- >> drivers/gpu/drm/ttm/ttm_resource.c | 18 ++++++++++++++++++ >> include/drm/ttm/ttm_resource.h | 1 + >> 3 files changed, 31 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >> index 4d57269c9ca8..5aede907a591 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >> @@ -2889,6 +2889,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, >> int amdgpu_ttm_evict_resources(struct amdgpu_device *adev, int mem_type) >> { >> struct ttm_resource_manager *man; >> + int r; >> >> switch (mem_type) { >> case TTM_PL_VRAM: >> @@ -2903,7 +2904,17 @@ int amdgpu_ttm_evict_resources(struct amdgpu_device >> *adev, int mem_type) >> return -EINVAL; >> } >> >> - return ttm_resource_manager_evict_all(&adev->mman.bdev, man); >> + r = ttm_resource_manager_evict_all(&adev->mman.bdev, man); >> + if (r) { >> + DRM_ERROR("Failed to evict memory type %d\n", mem_type); >> + return r; >> + } >> + if (adev->in_s4 && mem_type == TTM_PL_VRAM) { >> + r = ttm_resource_manager_swapout(); >> + if (r) >> + DRM_ERROR("Failed to swap out, %d\n", r); >> + } >> + return r; >> } >> >> #if defined(CONFIG_DEBUG_FS) >> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c >> b/drivers/gpu/drm/ttm/ttm_resource.c >> index fd41b56e2c66..07b1f5a5afc2 100644 >> --- a/drivers/gpu/drm/ttm/ttm_resource.c >> +++ b/drivers/gpu/drm/ttm/ttm_resource.c >> @@ -534,6 +534,24 @@ void ttm_resource_manager_init(struct >> ttm_resource_manager *man, >> } >> EXPORT_SYMBOL(ttm_resource_manager_init); >> >> +int ttm_resource_manager_swapout(void) > > This needs documentation, better placement and a better name. > > First of all put it into ttm_device.c instead of the resource manager. > > Then call it something like ttm_device_prepare_hibernation or similar. > > >> +{ >> + struct ttm_operation_ctx ctx = { >> + .interruptible = false, >> + .no_wait_gpu = false, >> + .force_alloc = true >> + }; >> + int ret; >> + >> + while (true) { > > Make that: > > do { > ret = ... > } while (ret > 0); > >> + ret = ttm_global_swapout(&ctx, GFP_KERNEL); >> + if (ret <= 0) >> + break; >> + } >> + return ret; > > It's rather pointless to return the number of swapped out pages. > > Make that "return ret < 0 ? ret : 0; > > Regards, > Christian. > >> +} >> +EXPORT_SYMBOL(ttm_resource_manager_swapout); >> + >> /* >> * ttm_resource_manager_evict_all >> * >> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h >> index b873be9597e2..46181758068e 100644 >> --- a/include/drm/ttm/ttm_resource.h >> +++ b/include/drm/ttm/ttm_resource.h >> @@ -463,6 +463,7 @@ void ttm_resource_manager_init(struct >> ttm_resource_manager *man, >> >> int ttm_resource_manager_evict_all(struct ttm_device *bdev, >> struct ttm_resource_manager *man); >> +int ttm_resource_manager_swapout(void); >> >> uint64_t ttm_resource_manager_usage(struct ttm_resource_manager *man); >> void ttm_resource_manager_debug(struct ttm_resource_manager *man, >
