FWIW, it looks like all versions of VCN need the same fix.  There have
been reports of suspend failing when VCN is in use on other newer APUs
as well.

Alex

On Tue, Dec 14, 2021 at 12:59 AM Quan, Evan <[email protected]> wrote:
>
> [AMD Official Use Only]
>
>
>
>
>
>
>
> From: Zhu, James <[email protected]>
> Sent: Monday, December 13, 2021 9:39 PM
> To: Gong, Curry <[email protected]>; Zhu, James <[email protected]>; 
> [email protected]
> Cc: Liu, Leo <[email protected]>; Quan, Evan <[email protected]>; Deucher, 
> Alexander <[email protected]>
> Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, 
> powergating is explicitly enabled
>
>
>
> Hi Curry, Evan,
>
> It seems vcn1.0 power gate sequence are special.
>
> if the original solution is thread safe, then the following solution will be 
> thread safe also.
>
> static int vcn_v1_0_hw_fini(void *handle)
> {
>     struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
>     cancel_delayed_work_sync(&adev->vcn.idle_work);
>
>     if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
>         (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
>          RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
> +        if (adev->pm.dpm_enabled)
> +            amdgpu_dpm_enable_uvd(adev, false);
> +        else
> +            vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>
> [Quan, Evan] Considering adev->pm.dpm_enabled is always true, 
> “vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); “ will become dead 
> code.
>
> Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the 
> suspend/resume discussed here). So, it seems quite risky compared with 
> Curry’s original patch.
>
> Before we can come up with a better idea, I would suggest to land Curry’s 
> original patch as a quick fix.
>
>
>
> BR
>
> Evan
>
>
>     }
>
> Best Regards!
>
> James
>
> On 2021-12-13 3:55 a.m., Gong, Curry wrote:
>
> [AMD Official Use Only]
>
>
>
> Hi James:
>
>
>
> With the following patch, an error will be reported when the driver is loaded
>
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
> @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev)
>
>         else
>
>                 r = vcn_v1_0_stop_spg_mode(adev);
>
>
>
> c
>
>         return r;
>
> }
>
>
>
>
>
> $ dmesg
>
> [  363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds.
>
> [  363.181150]       Tainted: G           OE     5.11.0-41-generic 
> #45~20.04.1-Ubuntu
>
> [  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
>
> [  363.181266] task:kworker/3:2     state:D stack:    0 pid:  223 ppid:     2 
> flags:0x00004000
>
> [  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
>
> [  363.181612] Call Trace:
>
> [  363.181618]  __schedule+0x44c/0x8a0
>
> [  363.181627]  schedule+0x4f/0xc0
>
> [  363.181631]  schedule_preempt_disabled+0xe/0x10
>
> [  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
>
> [  363.181643]  __mutex_lock_slowpath+0x13/0x20
>
> [  363.181648]  mutex_lock+0x32/0x40
>
> [  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu]
>
> [  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
>
> [  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
>
> [  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu]
>
> [  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
>
> [  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
>
> [  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu]
>
> [  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
>
> [  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
>
> [  363.184667]  process_one_work+0x220/0x3c0
>
> [  363.184674]  worker_thread+0x4d/0x3f0
>
> [  363.184677]  ? process_one_work+0x3c0/0x3c0
>
> [  363.184680]  kthread+0x12b/0x150
>
> [  363.184685]  ? set_kthread_struct+0x40/0x40
>
> [  363.184690]  ret_from_fork+0x22/0x30
>
> [  363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds.
>
> [  363.184739]       Tainted: G           OE     5.11.0-41-generic 
> #45~20.04.1-Ubuntu
>
> [  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
>
> [  363.184825] task:kworker/2:2     state:D stack:    0 pid:  233 ppid:     2 
> flags:0x00004000
>
> [  363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler 
> [amdgpu]
>
> [  363.185085] Call Trace:
>
> [  363.185087]  __schedule+0x44c/0x8a0
>
> [  363.185092]  schedule+0x4f/0xc0
>
> [  363.185095]  schedule_timeout+0x202/0x290
>
> [  363.185099]  ? sched_clock_cpu+0x11/0xb0
>
> [  363.185105]  wait_for_completion+0x94/0x100
>
> [  363.185110]  __flush_work+0x12a/0x1e0
>
> [  363.185113]  ? worker_detach_from_pool+0xc0/0xc0
>
> [  363.185119]  __cancel_work_timer+0x10e/0x190
>
> [  363.185123]  cancel_delayed_work_sync+0x13/0x20
>
> [  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]
>
> [  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]
>
> [  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]
>
> [  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]
>
> [  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]
>
> [  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]
>
> [  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]
>
> [  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]
>
> [  363.187206]  process_one_work+0x220/0x3c0
>
> [  363.187210]  worker_thread+0x4d/0x3f0
>
> [  363.187214]  ? process_one_work+0x3c0/0x3c0
>
> [  363.187217]  kthread+0x12b/0x150
>
> [  363.187221]  ? set_kthread_struct+0x40/0x40
>
> [  363.187226]  ret_from_fork+0x22/0x30
>
>
>
>
>
> BR
>
> Curry Gong
>
> From: Zhu, James <[email protected]>
> Sent: Saturday, December 11, 2021 5:07 AM
> To: Gong, Curry <[email protected]>; [email protected]
> Cc: Liu, Leo <[email protected]>; Zhu, James <[email protected]>; Quan, Evan 
> <[email protected]>; Deucher, Alexander <[email protected]>
> Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, 
> powergating is explicitly enabled
>
>
>
> On 2021-12-10 6:41 a.m., chen gong wrote:
>
> Play a video on the raven (or PCO, raven2) platform, and then do the S3
>
> test. When resume, the following error will be reported:
>
>
>
> amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
>
> vcn_dec test failed (-110)
>
> [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
>
> <vcn_v1_0> failed -110
>
> amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
>
> PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
>
>
>
> [why]
>
> When playing the video: The power state flag of the vcn block is set to
>
> POWER_STATE_ON.
>
>
>
> When doing suspend: There is no change to the power state flag of the
>
> vcn block, it is still POWER_STATE_ON.
>
>
>
> When doing resume: Need to open the power gate of the vcn block and set
>
> the power state flag of the VCN block to POWER_STATE_ON.
>
> But at this time, the power state flag of the vcn block is already
>
> POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
>
> avoid duplicate powergate/ungate setting" patch will return the
>
> amdgpu_dpm_set_powergating_by_smu function directly.
>
> As a result, the gate of the power was not opened, causing the
>
> subsequent ring test to fail.
>
>
>
> [how]
>
> In the suspend function of the vcn block, explicitly change the power
>
> state flag of the vcn block to POWER_STATE_OFF.
>
>
>
> Signed-off-by: chen gong <[email protected]>
>
> ---
>
>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
>
>  1 file changed, 7 insertions(+)
>
>
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
> index d54d720..d73676b 100644
>
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
> @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
>
>  {
>
>   int r;
>
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
> + bool cancel_success;
>
> +
>
> + cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);
>
> [JZ] Can you refer to vcn_v3_0_stop , and add amdgpu_dpm_enable_uvd(adev, 
> false); to the end of vcn_v1_0_stop?
>
> See if it also can help.
>
>
>
> + if (cancel_success) {
>
> +        if (adev->pm.dpm_enabled)
>
> +                amdgpu_dpm_enable_uvd(adev, false);
>
> + }
>
>
>
>   r = vcn_v1_0_hw_fini(adev);
>
>   if (r)

Reply via email to