[AMD Official Use Only - AMD Internal Distribution Only]

Hi @Lazar, Lijo,

Thank you for the review and feedback. I have revised the patch list according to your feedback and sent out the v6 patch list. Please take another look. Thank you!

v6 patch list mail titles:
[PATCH v6 0/4] enable xgmi node migration support for hibernate on SRIOV.
[PATCH v6 1/4] drm/amdgpu: update xgmi info and vram_base_offset on resume
[PATCH v6 2/4] drm/amdgpu: update GPU addresses for SMU and PSP
[PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV
[PATCH v6 4/4] drm/amdgpu: fix fence fallback timer expired error

Regards
Sam

From: Lazar, Lijo <[email protected]>
Date: Friday, May 16, 2025 at 18:22
To: Zhang, GuoQing (Sam) <[email protected]>, [email protected] <[email protected]>
Cc: Zhao, Victor <[email protected]>, Chang, HaiJun <[email protected]>, Koenig, Christian <[email protected]>, Deucher, Alexander <[email protected]>, Zhang, Owen(SRDC) <[email protected]>, Ma, Qing (Mark) <[email protected]>
Subject: Re: [PATCH v5 4/4] drm/amdgpu: fix fence fallback timer expired error

On 5/12/2025 12:11 PM, Samuel Zhang wrote:
> IH is not working after switching a new gpu index for the first time.
>
> The msix table in virtual machine is faked. The real msix table will be
> programmed by QEMU when guest enable/disable msix interrupt. But QEMU
> accessing VF msix table (register GFXMSIX_VECT0_ADDR_LO) is blocked
> by nBIF protection if the VF isn't in exclusive access at that time.
>
> call amdgpu_restore_msix on resume to restore msix table.
>
> Signed-off-by: Samuel Zhang <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 1 +
>  drivers/gpu/drm/amd/amdgpu/vega20_ih.c  | 4 ++++
>  3 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 0e890f2785b1..f080354efec8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -245,7 +245,7 @@ static bool amdgpu_msi_ok(struct amdgpu_device *adev)
>  	return true;
>  }
>
> -static void amdgpu_restore_msix(struct amdgpu_device *adev)
> +void amdgpu_restore_msix(struct amdgpu_device *adev)
>  {
>  	u16 ctrl;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> index aef5c216b191..f52bd7e6d988 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> @@ -149,5 +149,6 @@ void amdgpu_irq_gpu_reset_resume_helper(struct
> amdgpu_device *adev);
>  int amdgpu_irq_add_domain(struct amdgpu_device *adev);
>  void amdgpu_irq_remove_domain(struct amdgpu_device *adev);
>  unsigned amdgpu_irq_create_mapping(struct amdgpu_device *adev, unsigned
> src_id);
> +void amdgpu_restore_msix(struct amdgpu_device *adev);
>
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
> b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
> index faa0dd75dd6d..53c253102449 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
> @@ -648,6 +648,10 @@ static int vega20_ih_suspend(struct amdgpu_ip_block
> *ip_block)
>
>  static int vega20_ih_resume(struct amdgpu_ip_block *ip_block)
>  {
> +	struct amdgpu_device *adev = ip_block->adev;
> +
> +	if (amdgpu_sriov_vf(adev))
> +		amdgpu_restore_msix(adev);

You may consider consolidating these under amdgpu_device_resume() -> amdgpu_virt_resume_after_migration()

amdgpu_virt_resume_after_migration()
{
	virt_update_xgmi_info
	virt_vram_offset_update
	restore_msix
}

Thanks,
Lijo

>  	return vega20_ih_hw_init(ip_block);
>  }
>
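
For readers following the thread, the consolidation suggested in the review could look roughly like the sketch below. It is a user-space mock rather than kernel code: struct mock_device, the step enum, and the three stub helpers are hypothetical stand-ins named after the shorthand in the review comment, and the real amdgpu functions may have different names, signatures, and error handling. The sketch only shows the idea of running the three post-migration fix-ups in one place, in order, and only for an SRIOV VF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Steps named after the shorthand used in the review comment. */
enum resume_step {
	STEP_UPDATE_XGMI_INFO,
	STEP_VRAM_OFFSET_UPDATE,
	STEP_RESTORE_MSIX,
};

/* Hypothetical stand-in for struct amdgpu_device (assumption, not the real type). */
struct mock_device {
	bool is_sriov_vf;          /* stands in for amdgpu_sriov_vf(adev) */
	enum resume_step trace[3]; /* order in which the steps ran */
	size_t trace_len;
};

static void record_step(struct mock_device *adev, enum resume_step step)
{
	assert(adev->trace_len < 3);
	adev->trace[adev->trace_len++] = step;
}

/* Stub helpers: the real ones would reprogram hardware state. */
static int virt_update_xgmi_info(struct mock_device *adev)
{
	record_step(adev, STEP_UPDATE_XGMI_INFO);
	return 0;
}

static int virt_vram_offset_update(struct mock_device *adev)
{
	record_step(adev, STEP_VRAM_OFFSET_UPDATE);
	return 0;
}

static void restore_msix(struct mock_device *adev)
{
	record_step(adev, STEP_RESTORE_MSIX);
}

/*
 * Single entry point for all post-migration fix-ups, so that the IH
 * resume path no longer needs its own SRIOV special case.
 */
int virt_resume_after_migration(struct mock_device *adev)
{
	int r;

	assert(adev != NULL);

	if (!adev->is_sriov_vf)
		return 0; /* nothing to do on bare metal */

	r = virt_update_xgmi_info(adev);
	if (r)
		return r;

	r = virt_vram_offset_update(adev);
	if (r)
		return r;

	restore_msix(adev);
	return 0;
}
```

In this arrangement the device resume path would call the helper once, and vega20_ih_resume() would stay as it was before the patch.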
