[AMD Official Use Only - AMD Internal Distribution Only]

Ping @Lazar, Lijo, @Koenig, Christian…
Kindly please review the updated patch in advance so that we can discuss your suggestions in tomorrow's meeting. Thanks for your great support.

Rgds/Owen

From: Deng, Emily <[email protected]>
Sent: Monday, May 26, 2025 9:56 AM
To: Zhang, GuoQing (Sam) <[email protected]>; Koenig, Christian <[email protected]>; Lazar, Lijo <[email protected]>; Deucher, Alexander <[email protected]>
Cc: Zhao, Victor <[email protected]>; Chang, HaiJun <[email protected]>; Zhang, GuoQing (Sam) <[email protected]>; Zhang, Owen(SRDC) <[email protected]>; Ma, Qing (Mark) <[email protected]>; [email protected]
Subject: RE: [PATCH v8 0/4] enable xgmi node migration support for hibernate on SRIOV

[AMD Official Use Only - AMD Internal Distribution Only]

@Koenig, Christian and @Lazar, Lijo

Could you help review these changes again?

Best wishes
Emily Deng

>-----Original Message-----
>From: Samuel Zhang <[email protected]>
>Sent: Thursday, May 22, 2025 6:41 PM
>To: [email protected]
>Cc: Zhao, Victor <[email protected]>; Chang, HaiJun <[email protected]>;
>Zhang, GuoQing (Sam) <[email protected]>; Koenig, Christian <[email protected]>;
>Deucher, Alexander <[email protected]>; Zhang, Owen(SRDC) <[email protected]>;
>Ma, Qing (Mark) <[email protected]>; Lazar, Lijo <[email protected]>;
>Deng, Emily <[email protected]>
>Subject: [PATCH v8 0/4] enable xgmi node migration support for hibernate on SRIOV
>
>In SRIOV and VM environments, a customer may need to switch to new vGPU
>indexes after hibernating and then resuming the VM. For GPUs with XGMI,
>`vram_start` changes in this case, so the FB aperture gpu addresses of VRAM
>BOs also change. These gpu addresses need to be updated on resume, but they
>are spread all over the KMD codebase, and updating each of them is
>error-prone and not acceptable.
>
>The solution is to use the pdb0 page table to cover both vram and gart
>memory and to use pdb0 virtual gpu addresses instead. When the gpu indexes
>change, the virtual gpu addresses do not change.
>
>For psp and smu, pdb0's gpu address does not work, so the original FB
>aperture gpu address is used instead. These addresses need to be updated on
>resume with changed vGPUs.
>
>v2:
>- remove physical_node_id_changed
>- set vram_start to 0 to switch cached gpu addr to gart aperture
>- cleanup pdb0 patch
>v3:
>- remove gmc_v9_0_init_sw_mem_ranges() call
>- remove vram_offset member
>- add 4 refactoring patches to remove cached gpu addr
>- cleanup pdb0 patch
>v4:
>- remove gmc_v9_0_mc_init() call and `refresh` update.
>- do not set `fb_start` in mmhub_v1_8_get_fb_location() when pdb0 enabled.
>v5:
>- add amdgpu_virt_xgmi_migrate_enabled() check
>- move vram_base_offset update to pdb0 patch
>- remove 4 refactoring patches to remove cached gpu addr
>- add patch to fix IH not working issue when resume with new VF
>v6: per Lijo feedback
>- rename amdgpu_device_update_xgmi_info() to amdgpu_virt_resume()
>- merge xgmi node and vram_base_offset update, IH fix into amdgpu_virt_resume()
>- remove 2 unnecessary gpu addr update changes
>v7: per Christian feedback
>- remove pdb0_enabled and add gmc_v9_0_is_pdb0_enabled()
>- remove amdgpu_gmc_vram_location() call in amdgpu_gmc_sysvm_location()
>- remove check in mmhub_v1_8_get_fb_location() and update fb_start/fb_end on resume
>v8:
>- use cached fb_start in amdgpu_bo_fb_aper_addr()
>- remove fb_start/fb_end update in amdgpu_virt_resume() and amdgpu_gmc_sysvm_location()
>- use vram_start to set regVM_CONTEXT0_PAGE_TABLE_START_ADDR_*
>- move check to the callsite of amdgpu_virt_resume()
>- add gmc.xgmi.node_segment_size check in amdgpu_virt_xgmi_migrate_enabled()
>- rename gmc_v9_0_is_pdb0_enabled() to amdgpu_gmc_is_pdb0_enabled()
>
>Samuel Zhang (4):
>  drm/amdgpu: update xgmi info and vram_base_offset on resume
>  drm/amdgpu: update GPU addresses for SMU and PSP
>  drm/amdgpu: enable pdb0 for hibernation on SRIOV
>  drm/amdgpu: fix fence fallback timer expired error
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 38 ++++++++++++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c    | 28 ++++++++++++----
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    |  2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 20 ++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    | 23 +++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c  |  3 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  7 ++++
> drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c   |  8 +++--
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c      | 13 +++++---
> drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c    |  6 ++--
> drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  | 18 ++++++++++
> 13 files changed, 151 insertions(+), 17 deletions(-)
>
>--
>2.43.5
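
For reference, the address scheme described in the cover letter can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not the driver code: `struct toy_gmc`, its fields, and the `toy_*` helpers are all hypothetical names, and deriving `vram_start` as node index times segment size is a simplification assumed for the example. It shows why an FB aperture address moves when the vGPU index changes while a pdb0 virtual address for the same BO offset stays put.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model; these are NOT the amdgpu structures. */
struct toy_gmc {
	uint64_t vram_start; /* FB aperture base, depends on the XGMI node index */
	uint64_t vram_size;  /* size of this node's VRAM */
	uint64_t pdb0_base;  /* fixed virtual base mapped through pdb0 (assumed) */
};

/* FB aperture address of a BO: changes whenever vram_start changes. */
static uint64_t toy_fb_aper_addr(const struct toy_gmc *gmc, uint64_t offset)
{
	assert(offset < gmc->vram_size);
	return gmc->vram_start + offset;
}

/* pdb0 virtual address of the same BO: independent of vram_start. */
static uint64_t toy_pdb0_addr(const struct toy_gmc *gmc, uint64_t offset)
{
	assert(offset < gmc->vram_size);
	return gmc->pdb0_base + offset;
}

/*
 * Model a resume on a different vGPU index. Only the FB aperture base is
 * recomputed; the node_id * node_segment_size formula is an assumption made
 * for this sketch.
 */
static void toy_resume_on_node(struct toy_gmc *gmc, uint64_t node_id,
			       uint64_t node_segment_size)
{
	gmc->vram_start = node_id * node_segment_size;
}
```

In this model, code that caches `toy_pdb0_addr()` results needs no fixup after `toy_resume_on_node()`, whereas anything holding a `toy_fb_aper_addr()` result (the psp and smu case in the cover letter) has to recompute it on resume.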
