I didn't fully understand the question.
For the same instance, begin_use will set the power state only after cancelling any idle worker for the instance. If the idle worker is already in progress, it has to complete, since that is a cancel_sync (this is the existing logic). Basically, by the time begin_use sets the PG state, no idle worker for the same VCN instance can still be active. If the question is about a context switch to another VCN instance's begin_use, I don't think that is a problem.

Thanks,
Lijo

________________________________
From: Wu, David <[email protected]>
Sent: Thursday, August 14, 2025 11:14:26 PM
To: Lazar, Lijo <[email protected]>; Sundararaju, Sathishkumar <[email protected]>; Alex Deucher <[email protected]>
Cc: Wu, David <[email protected]>; Deucher, Alexander <[email protected]>; [email protected] <[email protected]>
Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)

amdgpu_vcn_idle_work_handler():

    if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
----------- could a context switch to amdgpu_vcn_ring_begin_use() happen here?
If it could, then AMD_PG_STATE_GATE will be set by mistake.

David

On 2025-08-14 08:54, Lazar, Lijo wrote:
> [Public]
>
> The request profile can be moved outside the pg_lock in begin_use as in the
> attached patch. It needs the set power state -> set profile order.
>
> This is the premise -
>
> Let's say there are two threads, a begin_use thread and an idle_work thread.
> begin_use and idle_work will both need the workprofile mutex to request a profile.
>
> Case 1) Idle thread gets the lock first.
>     a) Idle thread sees the vinst power state as PG_UNGATE; no harm done. It
>        exits without requesting a power profile change. The begin_use thread
>        gets the lock next, sees the profile as active and continues.
>     b) Idle thread sees the vinst power state as PG_GATE; it will set
>        workprofile_active to false and exit. When the begin_use thread gets the
>        mutex next, it is guaranteed to see workprofile_active as false, hence it
>        will request the profile.
>
> Case 2) begin_use thread gets the lock first.
>     a) The workload profile is active, so it does nothing and exits. The change
>        made by the begin_use thread to the vinst power state (state = on) is now
>        visible to the idle thread, which gets the lock next. It will do nothing
>        and exit.
>     b) The workload profile is inactive, so it requests a profile change. Again,
>        the change made by the begin_use thread to the vinst power state is now
>        visible to the idle thread, which gets the lock next. It will do nothing
>        and exit.
>
> Thanks,
> Lijo
>
> -----Original Message-----
> From: Sundararaju, Sathishkumar <[email protected]>
> Sent: Thursday, August 14, 2025 6:18 PM
> To: Lazar, Lijo <[email protected]>; Alex Deucher <[email protected]>
> Cc: Wu, David <[email protected]>; Deucher, Alexander <[email protected]>; [email protected]
> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
>
> On 8/14/2025 5:33 PM, Lazar, Lijo wrote:
>> [Public]
>>
>> There is no need for a nested lock. It only needs to follow the order
>>     set instance power_state
>>     set profile (this takes a global lock, hence the instance power state
>>     will be visible to whichever thread gets the work profile lock).
>>
>> You are seeing a nested lock just because I added the code right after the
>> power state setting.
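[A minimal sketch of the ordering described above: power the instance on first, then take the global profile mutex. The function name is made up for illustration; the fields and helpers are the ones from the patch quoted later in this thread, and this is not the attached patch itself.]

    static void vcn_begin_use_profile_sketch(struct amdgpu_vcn_inst *vcn_inst)
    {
            struct amdgpu_device *adev = vcn_inst->adev;

            /* 1) Power the instance on under its own per-instance lock. */
            mutex_lock(&vcn_inst->vcn_pg_lock);
            vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
            mutex_unlock(&vcn_inst->vcn_pg_lock);

            /* 2) Only then request the video profile under the global mutex.
             * The power-state write above is visible to whichever thread
             * takes this mutex next, so an idle worker cannot see this
             * instance as gated and release the profile underneath us.
             * (Error handling omitted in this sketch.)
             */
            mutex_lock(&adev->vcn.workload_profile_mutex);
            if (!adev->vcn.workload_profile_active) {
                    amdgpu_dpm_switch_power_profile(adev,
                                    PP_SMC_POWER_PROFILE_VIDEO, true);
                    adev->vcn.workload_profile_active = true;
            }
            mutex_unlock(&adev->vcn.workload_profile_mutex);
    }

[This is the ordering the Case 1/Case 2 analysis above relies on: the profile request happens only after the UNGATE write, under the single global workload_profile_mutex.]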
> Pasting your code from the file for reference:
>
> @@ -464,32 +509,14 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>
> -pg_lock:
>       mutex_lock(&vcn_inst->vcn_pg_lock);
>       vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
>
> +     amdgpu_vcn_get_profile(adev);
>
> vcn_pg_lock isn't released here yet, right? And in case you intend only to
> order the locks, there is still an unnecessary OFF followed by ON, but yes,
> that is acceptable.
>
> Maybe you want to move that vcn_pg_lock after amdgpu_vcn_get_profile to
> protect concurrent dpg_state access in begin_use.
>
> The concern is that this patch accesses a power_state that is protected by
> some other mutex, hoping it also reflects the right values while holding the
> power profile lock.
>
> Or
>
> I have shared a patch with a global workload_profile_mutex that simplifies
> this handling, renames pg_lock -> dpg_lock, and uses that only for dpg_state
> changes per instance.
>
> Regards,
>
> Sathish
>
>> Thanks,
>> Lijo
>>
>> -----Original Message-----
>> From: Sundararaju, Sathishkumar <[email protected]>
>> Sent: Thursday, August 14, 2025 5:23 PM
>> To: Lazar, Lijo <[email protected]>; Alex Deucher <[email protected]>
>> Cc: Wu, David <[email protected]>; Deucher, Alexander <[email protected]>; [email protected]
>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
>>
>> On 8/14/2025 3:16 PM, Lazar, Lijo wrote:
>>> [Public]
>>>
>>> I see your point now. Attached should work, I guess. Is the concern more
>>> about having to take the lock for every begin?
>> This is closer, but the thing is, IMO we shouldn't have to use two locks and
>> go into nested locking; we can do with just one global lock.
>>
>> The power_state of each instance and the global workload_profile_active are
>> inter-related; they need to be guarded together.
>>
>> Nested could work, but why nested if a single lock is enough? Nested
>> complicates it.
>>
>> Regards,
>>
>> Sathish
>>
>>> Thanks,
>>> Lijo
>>>
>>> -----Original Message-----
>>> From: amd-gfx <[email protected]> On Behalf Of Lazar, Lijo
>>> Sent: Thursday, August 14, 2025 2:55 PM
>>> To: Sundararaju, Sathishkumar <[email protected]>; Alex Deucher <[email protected]>
>>> Cc: Wu, David <[email protected]>; Deucher, Alexander <[email protected]>; [email protected]
>>> Subject: RE: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
>>>
>>> [Public]
>>>
>>> That is not required, I think. The power profile is set by an instance
>>> *after* setting itself to power on. Also, it's switched back after changing
>>> its power state to off. If the idle worker is run by another instance, it
>>> won't see inst0 as power gated and won't change the power profile.
>>>
>>> Thanks,
>>> Lijo
>>>
>>> -----Original Message-----
>>> From: Sundararaju, Sathishkumar <[email protected]>
>>> Sent: Thursday, August 14, 2025 2:41 PM
>>> To: Lazar, Lijo <[email protected]>; Alex Deucher <[email protected]>
>>> Cc: Wu, David <[email protected]>; Deucher, Alexander <[email protected]>; [email protected]
>>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
>>>
>>> Hi Lijo,
>>>
>>> On 8/14/2025 2:11 PM, Lazar, Lijo wrote:
>>>> [Public]
>>>>
>>>> We already have a per instance power state that can be tracked. What about
>>>> something like attached?
>>> This also has concurrent access of the power state:
>>> vcn.inst[i].cur_state is not protected by workload_profile_mutex
>>> everywhere, so it can still change while you are holding
>>> workload_profile_mutex and checking it.
>>>
>>> Regards,
>>>
>>> Sathish
>>>
>>>> Thanks,
>>>> Lijo
>>>>
>>>> -----Original Message-----
>>>> From: amd-gfx <[email protected]> On Behalf Of Sundararaju, Sathishkumar
>>>> Sent: Thursday, August 14, 2025 4:43 AM
>>>> To: Alex Deucher <[email protected]>
>>>> Cc: Wu, David <[email protected]>; Deucher, Alexander <[email protected]>; [email protected]
>>>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
>>>>
>>>> On 8/14/2025 3:38 AM, Alex Deucher wrote:
>>>>> On Wed, Aug 13, 2025 at 5:1 PM Sundararaju, Sathishkumar
>>>>> <[email protected]> wrote:
>>>>>> On 8/14/2025 2:33 AM, Alex Deucher wrote:
>>>>>>> On Wed, Aug 13, 2025 at 4:58 PM Sundararaju, Sathishkumar
>>>>>>> <[email protected]> wrote:
>>>>>>>> On 8/14/2025 1:35 AM, Alex Deucher wrote:
>>>>>>>>> On Wed, Aug 13, 2025 at 2:23 PM Sundararaju, Sathishkumar
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> Hi Alex, Hi David,
>>>>>>>>>>
>>>>>>>>>> I see David's concern, but his suggestion still won't solve the
>>>>>>>>>> problem, and neither does the current form. Reason:
>>>>>>>>>>
>>>>>>>>>> The emitted fence count and total submission count are fast
>>>>>>>>>> transients which frequently become 0 in between video decodes
>>>>>>>>>> (between jobs); even with the atomics and locks there can be a
>>>>>>>>>> switch of the video power profile. In the current form of the patch
>>>>>>>>>> that window is minimized, but it can still happen if stress
>>>>>>>>>> tested. But the power state of any instance becoming zero
>>>>>>>>> Can you explain how this can happen? I'm not seeing it.
>>>>>>>> Consider this situation: inst0 and inst1 are actively decoding,
>>>>>>>> inst0's decode completes, and the delayed idle work starts.
>>>>>>>> inst0's idle handler can read 0 total fences and 0 total
>>>>>>>> submission count even if inst1 is actively decoding, because that's
>>>>>>>> between jobs:
>>>>>>>>   - as begin_use increments vcn.total_submission_cnt and
>>>>>>>>     end_use decrements vcn.total_submission_cnt, that can be 0.
>>>>>>>>   - if outstanding fences are cleared and there is no new emitted
>>>>>>>>     fence between jobs, that can be 0.
>>>>>>>>   - both of the above conditions do not mean video decode
>>>>>>>>     is complete on inst1; it is actively decoding.
>>>>>>> How can there be active decoding without an outstanding fence?
>>>>>>> In that case, total_fences (fences from both instances) would be non-0.
>>>>>> I mean on inst1 the scheduled job is already complete, so 0
>>>>>> outstanding fences, and a newer job is yet to be scheduled
>>>>>> and committed to the ring (inst1). This doesn't mean decode has
>>>>>> stopped on inst1, right (I am saying the timing of inst0's idle work
>>>>>> coincides with this)?
>>>>>>
>>>>>> Or am I wrong in assuming this? Can't this ever happen? Please
>>>>>> correct my understanding here.
>>>>> The flow looks like:
>>>>>
>>>>> begin_use(inst)
>>>>> emit_fence(inst)
>>>>> end_use(inst)
>>>>>
>>>>> ...later
>>>>> fence signals
>>>>> ...later
>>>>> work handler
>>>>>
>>>>> In begin_use we increment the global and per instance submission counts.
>>>>> This protects the power gating and profile until end_use. In end_use we
>>>>> decrement the submissions because we don't need to protect anything any
>>>>> more as we have the fence that was submitted via the ring. That fence
>>>>> won't signal until the job is complete.
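[For reference, a simplified sketch of the begin_use/end_use lifecycle Alex describes; the _sketch suffix is illustrative, the counters and helpers are the ones added by the patch quoted below, and power-state/profile handling is elided.]

    void amdgpu_vcn_ring_begin_use_sketch(struct amdgpu_ring *ring)
    {
            struct amdgpu_vcn_inst *vcn_inst = &ring->adev->vcn.inst[ring->me];

            atomic_inc(&vcn_inst->total_submission_cnt);       /* per instance */
            atomic_inc(&ring->adev->vcn.total_submission_cnt); /* global */
            cancel_delayed_work_sync(&vcn_inst->idle_work);
            /* ... power the instance up and request the video profile ... */
    }

    void amdgpu_vcn_ring_end_use_sketch(struct amdgpu_ring *ring)
    {
            struct amdgpu_vcn_inst *vcn_inst = &ring->adev->vcn.inst[ring->me];

            /* From here on, the fence emitted for this job is what keeps
             * amdgpu_fence_count_emitted() non-zero until the hardware
             * finishes, so the idle worker always sees either a submission
             * count or an outstanding fence while work is in flight. */
            atomic_dec(&vcn_inst->total_submission_cnt);
            atomic_dec(&ring->adev->vcn.total_submission_cnt);
            schedule_delayed_work(&vcn_inst->idle_work, VCN_IDLE_TIMEOUT);
    }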
>>>> Is a next begin_use always guaranteed to run before the current job's
>>>> fence signals?
>>>>
>>>> If not, then both the total submission count and the total fence count are
>>>> zero, for example with delayed job/packet submissions from user space, or
>>>> when the next job is scheduled only after the current job's fence signals.
>>>>
>>>> If this is never possible then (v3) is perfect.
>>>>
>>>> Regards,
>>>>
>>>> Sathish
>>>>
>>>>> For power gating, we only care about the submission count and fences for
>>>>> that instance; for the profile, we care about the submission count and
>>>>> fences for all instances. Once the fences have signalled, the outstanding
>>>>> fences will be 0 and there won't be any active work.
>>>>>
>>>>> Alex
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Sathish
>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>>> Whereas if the instances are powered off, we are sure the idle time has
>>>>>>>> passed and the instance is powered off, with no possible way of active
>>>>>>>> video decode. When all instances are off we can safely assume no active
>>>>>>>> decode, and the global lock protects against a new begin_use on any
>>>>>>>> instance.
>>>>>>>> The only distant concern is global common locks w.r.t. perf, but we
>>>>>>>> already have a global workprofile mutex, so there shouldn't be any drop
>>>>>>>> in perf with just one single global lock for all instances.
>>>>>>>>
>>>>>>>> Just sending out a patch with this fix; I will leave it to you to
>>>>>>>> decide the right method. If you think the outstanding total fences
>>>>>>>> can never be 0 during decode, then your previous version (v3)
>>>>>>>> itself is good; there is no real benefit of splitting the handlers as
>>>>>>>> such.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sathish
>>>>>>>>> If it is possible, maybe it would be easier to just split the
>>>>>>>>> profile and powergating into separate handlers. The profile
>>>>>>>>> one would be global and the powergating one would be per instance.
>>>>>>>>> See the attached patches.
>>>>>>>>>
>>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>>> can be a sure-shot indication of a break in a video decode. The
>>>>>>>>>> mistake in my patch was using a per instance mutex; I should
>>>>>>>>>> have used a common global mutex, and then that covers the situation
>>>>>>>>>> David is trying to bring out.
>>>>>>>>>>
>>>>>>>>>> Using one global vcn.pg_lock for idle and begin_use and using
>>>>>>>>>> flags to track power state could help us totally avoid this
>>>>>>>>>> situation.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Sathish
>>>>>>>>>>
>>>>>>>>>> On 8/13/2025 11:46 PM, Wu, David wrote:
>>>>>>>>>>> On 8/13/2025 12:51 PM, Alex Deucher wrote:
>>>>>>>>>>>> On Wed, Aug 13, 2025 at 12:39 PM Wu, David <[email protected]> wrote:
>>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The addition of total_submission_cnt should work - in that
>>>>>>>>>>>>> it is unlikely to have a context switch right after the
>>>>>>>>>>>>> begin_use().
>>>>>>>>>>>>> The suggestion of moving it inside the lock (which I prefer,
>>>>>>>>>>>>> in case someone adds more before the lock and it is not reviewed
>>>>>>>>>>>>> thoroughly) - up to you to decide.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Reviewed-by: David (Ming Qiang) Wu <[email protected]>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 8/13/2025 9:45 AM, Alex Deucher wrote:
>>>>>>>>>>>>>> If there are multiple instances of the VCN running, we may
>>>>>>>>>>>>>> end up switching the video profile while another instance
>>>>>>>>>>>>>> is active because we only take into account the current
>>>>>>>>>>>>>> instance's submissions. Look at all outstanding fences
>>>>>>>>>>>>>> for the video profile.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v2: drop early exit in begin_use()
>>>>>>>>>>>>>> v3: handle possible race between begin_use() and the work handler
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fixes: 3b669df92c85 ("drm/amdgpu/vcn: adjust workload profile handling")
>>>>>>>>>>>>>> Reviewed-by: Sathishkumar S <[email protected]> (v1)
>>>>>>>>>>>>>> Signed-off-by: Alex Deucher <[email protected]>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 40 ++++++++++++-------------
>>>>>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  1 +
>>>>>>>>>>>>>>  2 files changed, 21 insertions(+), 20 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>>>>>>>> index 9a76e11d1c184..593c1ddf8819b 100644
>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>>>>>>>> @@ -415,19 +415,25 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
>>>>>>>>>>>>>>         struct amdgpu_vcn_inst *vcn_inst =
>>>>>>>>>>>>>>                 container_of(work, struct amdgpu_vcn_inst, idle_work.work);
>>>>>>>>>>>>>>         struct amdgpu_device *adev = vcn_inst->adev;
>>>>>>>>>>>>>> -       unsigned int fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
>>>>>>>>>>>>>> -       unsigned int i = vcn_inst->inst, j;
>>>>>>>>>>>>>> +       unsigned int total_fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
>>>>>>>>>>>>>> +       unsigned int i, j;
>>>>>>>>>>>>>>         int r = 0;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -       if (adev->vcn.harvest_config & (1 << i))
>>>>>>>>>>>>>> +       if (adev->vcn.harvest_config & (1 << vcn_inst->inst))
>>>>>>>>>>>>>>                 return;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -       for (j = 0; j < adev->vcn.inst[i].num_enc_rings; ++j)
>>>>>>>>>>>>>> -               fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_enc[j]);
>>>>>>>>>>>>>> +       for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>>>>>>>>> +               struct amdgpu_vcn_inst *v = &adev->vcn.inst[i];
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +               for (j = 0; j < v->num_enc_rings; ++j)
>>>>>>>>>>>>>> +                       fence[i] += amdgpu_fence_count_emitted(&v->ring_enc[j]);
>>>>>>>>>>>>>> +               fence[i] += amdgpu_fence_count_emitted(&v->ring_dec);
>>>>>>>>>>>>>> +               total_fences += fence[i];
>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         /* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */
>>>>>>>>>>>>>>         if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG &&
>>>>>>>>>>>>>> -           !adev->vcn.inst[i].using_unified_queue) {
>>>>>>>>>>>>>> +           !vcn_inst->using_unified_queue) {
>>>>>>>>>>>>>>                 struct dpg_pause_state new_state;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                 if (fence[i] ||
>>>>>>>>>>>>>> @@ -436,18 +442,18 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
>>>>>>>>>>>>>>                 else
>>>>>>>>>>>>>>                         new_state.fw_based = VCN_DPG_STATE__UNPAUSE;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -               adev->vcn.inst[i].pause_dpg_mode(vcn_inst, &new_state);
>>>>>>>>>>>>>> +               vcn_inst->pause_dpg_mode(vcn_inst, &new_state);
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -       fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_dec);
>>>>>>>>>>>>>> -       fences += fence[i];
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> -       if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
>>>>>>>>>>>>>> +       if (!fence[vcn_inst->inst] && !atomic_read(&vcn_inst->total_submission_cnt)) {
>>>>>>>>>>>>>> +               /* This is specific to this instance */
>>>>>>>>>>>>>>                 mutex_lock(&vcn_inst->vcn_pg_lock);
>>>>>>>>>>>>>>                 vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
>>>>>>>>>>>>>>                 mutex_unlock(&vcn_inst->vcn_pg_lock);
>>>>>>>>>>>>>>                 mutex_lock(&adev->vcn.workload_profile_mutex);
>>>>>>>>>>>>>> -               if (adev->vcn.workload_profile_active) {
>>>>>>>>>>>>>> +               /* This is global and depends on all VCN instances */
>>>>>>>>>>>>>> +               if (adev->vcn.workload_profile_active && !total_fences &&
>>>>>>>>>>>>>> +                   !atomic_read(&adev->vcn.total_submission_cnt)) {
>>>>>>>>>>>>>>                         r = amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO, false);
>>>>>>>>>>>>>>                         if (r)
>>>>>>>>>>>>>> @@ -467,16 +473,10 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>>>>>>>>>>>>>>         int r = 0;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         atomic_inc(&vcn_inst->total_submission_cnt);
>>>>>>>>>>>>>> +       atomic_inc(&adev->vcn.total_submission_cnt);
>>>>>>>>>>>>> move this addition down inside the mutex lock
>>>>>>>>>>>>>>         cancel_delayed_work_sync(&vcn_inst->idle_work);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -       /* We can safely return early here because we've cancelled the
>>>>>>>>>>>>>> -        * the delayed work so there is no one else to set it to false
>>>>>>>>>>>>>> -        * and we don't care if someone else sets it to true.
>>>>>>>>>>>>>> -        */
>>>>>>>>>>>>>> -       if (adev->vcn.workload_profile_active)
>>>>>>>>>>>>>> -               goto pg_lock;
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>         mutex_lock(&adev->vcn.workload_profile_mutex);
>>>>>>>>>>>>> move to here:
>>>>>>>>>>>>>     atomic_inc(&adev->vcn.total_submission_cnt);
>>>>>>>>>>>>> I think this should work for multiple instances.
>>>>>>>>>>>> Why does this need to be protected by the mutex?
>>>>>>>>>>> hmm.. OK - no need and it is actually better before the mutex.
>>>>>>>>>>> David
>>>>>>>>>>>> Alex
>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>>>         if (!adev->vcn.workload_profile_active) {
>>>>>>>>>>>>>>                 r = amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO,
>>>>>>>>>>>>>> @@ -487,7 +487,6 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>         mutex_unlock(&adev->vcn.workload_profile_mutex);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -pg_lock:
>>>>>>>>>>>>>>         mutex_lock(&vcn_inst->vcn_pg_lock);
>>>>>>>>>>>>>>         vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -528,6 +527,7 @@ void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring)
>>>>>>>>>>>>>>         atomic_dec(&ring->adev->vcn.inst[ring->me].dpg_enc_submission_cnt);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         atomic_dec(&ring->adev->vcn.inst[ring->me].total_submission_cnt);
>>>>>>>>>>>>>> +       atomic_dec(&ring->adev->vcn.total_submission_cnt);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         schedule_delayed_work(&ring->adev->vcn.inst[ring->me].idle_work, VCN_IDLE_TIMEOUT);
>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>>>>>>>> index b3fb1d0e43fc9..febc3ce8641ff 100644
>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>>>>>>>> @@ -352,6 +352,7 @@ struct amdgpu_vcn {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         uint16_t inst_mask;
>>>>>>>>>>>>>>         uint8_t num_inst_per_aid;
>>>>>>>>>>>>>> +       atomic_t total_submission_cnt;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         /* IP reg dump */
>>>>>>>>>>>>>>         uint32_t *ip_dump;
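[Putting the thread together, the idle-worker logic of the v3 patch amounts roughly to the following simplified sketch. Error handling, the DPG pause handling, and the work rescheduling are omitted, and clearing workload_profile_active after a successful switch is assumed from the discussion rather than shown in the truncated diff above.]

    static void vcn_idle_work_sketch(struct amdgpu_vcn_inst *vcn_inst)
    {
            struct amdgpu_device *adev = vcn_inst->adev;
            unsigned int total_fences = 0, inst_fences = 0;
            unsigned int i, j;

            /* Count outstanding fences on every instance, not just ours. */
            for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
                    struct amdgpu_vcn_inst *v = &adev->vcn.inst[i];
                    unsigned int f = amdgpu_fence_count_emitted(&v->ring_dec);

                    for (j = 0; j < v->num_enc_rings; ++j)
                            f += amdgpu_fence_count_emitted(&v->ring_enc[j]);
                    if (v == vcn_inst)
                            inst_fences = f;
                    total_fences += f;
            }

            /* Per instance: gate this instance when it alone is idle. */
            if (!inst_fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
                    mutex_lock(&vcn_inst->vcn_pg_lock);
                    vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
                    mutex_unlock(&vcn_inst->vcn_pg_lock);

                    /* Global: drop the video profile only when no instance has
                     * outstanding fences or submissions. */
                    mutex_lock(&adev->vcn.workload_profile_mutex);
                    if (adev->vcn.workload_profile_active && !total_fences &&
                        !atomic_read(&adev->vcn.total_submission_cnt)) {
                            amdgpu_dpm_switch_power_profile(adev,
                                            PP_SMC_POWER_PROFILE_VIDEO, false);
                            adev->vcn.workload_profile_active = false;
                    }
                    mutex_unlock(&adev->vcn.workload_profile_mutex);
            }
    }

[The per-instance gate decision uses only that instance's fences and submission count, while releasing the profile additionally requires total_fences and the global submission count to be zero, which is the multi-instance race the patch is addressing.]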
