
I didn't fully understand the question.

For the same instance, the begin_use thread will set the power state only after
cancelling any idle worker for that instance. If the idle worker is already in
progress, it has to complete first, since the cancel is a
cancel_delayed_work_sync() (that is the existing logic).

Basically, by the time the begin_use thread sets the PG state, no idle worker for
the same VCN instance can be active. If the question is about a context switch to
another VCN instance's begin_use thread, I don't think that will be a problem.
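
For reference, a stripped-down sketch of the begin_use ordering being described
(based on the begin_use() code in the patch quoted below; everything unrelated is
omitted):

    void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
    {
        struct amdgpu_vcn_inst *vcn_inst = &ring->adev->vcn.inst[ring->me];

        atomic_inc(&vcn_inst->total_submission_cnt);

        /* Synchronously cancel this instance's idle worker; if the handler
         * is already running, this waits for it to finish.
         */
        cancel_delayed_work_sync(&vcn_inst->idle_work);

        /* ... workload profile handling ... */

        /* Only after the cancel is the PG state touched, so no idle worker
         * for the same instance can be active at this point.
         */
        mutex_lock(&vcn_inst->vcn_pg_lock);
        vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
        /* ... DPG pause handling ... */
        mutex_unlock(&vcn_inst->vcn_pg_lock);
    }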

Thanks,
Lijo
________________________________
From: Wu, David <[email protected]>
Sent: Thursday, August 14, 2025 11:14:26 PM
To: Lazar, Lijo <[email protected]>; Sundararaju, Sathishkumar 
<[email protected]>; Alex Deucher <[email protected]>
Cc: Wu, David <[email protected]>; Deucher, Alexander 
<[email protected]>; [email protected] 
<[email protected]>
Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)

amdgpu_vcn_idle_work_handler():
     if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
----------- could a context switch to amdgpu_vcn_ring_begin_use() happen here?
  If it could, then AMD_PG_STATE_GATE would be set by mistake.

David
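
For context, the spot in question in the v3 idle handler, simplified from the
patch quoted below:

    if (!fence[vcn_inst->inst] && !atomic_read(&vcn_inst->total_submission_cnt)) {
        /* <-- the window asked about above: a begin_use() for this instance
         * that runs here must first call cancel_delayed_work_sync() on this
         * very handler, so it blocks until the handler returns and then
         * ungates the instance again (see the reply at the top).
         */
        mutex_lock(&vcn_inst->vcn_pg_lock);
        vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
        mutex_unlock(&vcn_inst->vcn_pg_lock);
        /* ... global profile handling ... */
    }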

On 2025-08-14 08:54, Lazar, Lijo wrote:
> [Public]
>
> The profile request can be moved outside the pg_lock in begin_use, as in the
> attached patch. It only needs the order: set power state -> set profile.
>
> This is the premise -
>
> Let's say there are two threads, the begin_use thread and the idle_work thread.
> Both begin_use and idle_work need the workload profile mutex to request a profile.
>
> Case 1) The idle thread gets the lock first.
>          a) The idle thread sees the vinst power state as PG_UNGATE; no harm done.
> It exits without requesting a power profile change. The begin_use thread gets the
> lock next, sees the profile as active and continues.
>          b) The idle thread sees the vinst power state as PG_GATE; it sets
> workload_profile_active to false and exits. When the begin_use thread gets the
> mutex next, it is guaranteed to see workload_profile_active as false, and hence
> it will request the profile.
>
> Case 2) The begin_use thread gets the lock first.
>          a) The workload profile is active, so it does nothing and exits. The
> change made by the begin_use thread to the vinst power state (state = on) will
> now be visible to the idle thread, which gets the lock next. It will do nothing
> and exit.
>          b) The workload profile is inactive, so it requests a profile change.
> Again, the change made by the begin_use thread to the vinst power state will now
> be visible to the idle thread, which gets the lock next. It will do nothing and
> exit.
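>
> A minimal sketch of the ordering this premise relies on (amdgpu_vcn_get_profile()
> appears in the patch excerpt quoted further down; the put-side helper name here
> is only assumed for illustration):
>
>     /* begin_use path */
>     mutex_lock(&vcn_inst->vcn_pg_lock);
>     vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);  /* 1. power state on */
>     mutex_unlock(&vcn_inst->vcn_pg_lock);
>     amdgpu_vcn_get_profile(adev);   /* 2. takes the global workload_profile_mutex
>                                      * and requests the VIDEO profile if it is not
>                                      * already active */
>
>     /* idle_work path */
>     mutex_lock(&vcn_inst->vcn_pg_lock);
>     vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);    /* 1. power state off */
>     mutex_unlock(&vcn_inst->vcn_pg_lock);
>     amdgpu_vcn_put_profile(adev);   /* 2. takes the same global mutex and drops
>                                      * the profile only if it sees every instance
>                                      * gated */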
>
> Thanks,
> Lijo
>
> -----Original Message-----
> From: Sundararaju, Sathishkumar <[email protected]>
> Sent: Thursday, August 14, 2025 6:18 PM
> To: Lazar, Lijo <[email protected]>; Alex Deucher <[email protected]>
> Cc: Wu, David <[email protected]>; Deucher, Alexander 
> <[email protected]>; [email protected]
> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
>
>
> On 8/14/2025 5:33 PM, Lazar, Lijo wrote:
>> [Public]
>>
>> There is no need for a nested lock. It only needs to follow the order:
>>           set instance power_state
>>           set profile (this takes a global lock, hence the instance power
>> state will be visible to whichever thread gets the work profile lock next).
>>
>> You are seeing a nested lock only because I added the code right after the
>> power state setting.
> Pasting your code from the file for reference:
>
> @@ -464,32 +509,14 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring
> *ring)
>
> -pg_lock:
>
>        mutex_lock(&vcn_inst->vcn_pg_lock);
>        vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
>
> +   amdgpu_vcn_get_profile(adev);
>
> vcn_pg_lock isn't released here yet, right? And in case you intend only to
> order the locks, there is still an unnecessary OFF followed by ON, but yes,
> that is acceptable.
>
> Maybe you want to move that vcn_pg_lock to after amdgpu_vcn_get_profile to
> protect concurrent dpg_state access in begin_use.
>
> The concern is that this patch accesses a power_state that is protected by a
> different mutex, hoping it also reflects the right values while holding the
> power profile lock.
>
> Or
>
> I have shared a patch with a global workload_profile_mutex that simplifies this
> handling, renamed pg_lock -> dpg_lock, and used that only for per-instance
> dpg_state changes.
>
> Regards,
>
> Sathish
>
>> Thanks,
>> Lijo
>>
>> -----Original Message-----
>> From: Sundararaju, Sathishkumar <[email protected]>
>> Sent: Thursday, August 14, 2025 5:23 PM
>> To: Lazar, Lijo <[email protected]>; Alex Deucher
>> <[email protected]>
>> Cc: Wu, David <[email protected]>; Deucher, Alexander
>> <[email protected]>; [email protected]
>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition
>> (v3)
>>
>>
>> On 8/14/2025 3:16 PM, Lazar, Lijo wrote:
>>> [Public]
>>>
>>> I see your point now. Attached should work, I guess. Is the concern more 
>>> about having to take the lock for every begin?
>> This is closer, but IMO we shouldn't have to use two locks and go into nested
>> locking; we can do with just one global lock.
>>
>> The power_state of each instance and the global workload_profile_active are
>> inter-related; they need to be guarded together.
>>
>> Nested could work, but why go nested if a single lock is enough? Nesting
>> complicates it.
>>
>> Regards,
>>
>> Sathish
>>
>>> Thanks,
>>> Lijo
>>>
>>> -----Original Message-----
>>> From: amd-gfx <[email protected]> On Behalf Of
>>> Lazar, Lijo
>>> Sent: Thursday, August 14, 2025 2:55 PM
>>> To: Sundararaju, Sathishkumar <[email protected]>;
>>> Alex Deucher <[email protected]>
>>> Cc: Wu, David <[email protected]>; Deucher, Alexander
>>> <[email protected]>; [email protected]
>>> Subject: RE: [PATCH] drm/amdgpu/vcn: fix video profile race condition
>>> (v3)
>>>
>>> [Public]
>>>
>>> That is not required, I think. The power profile is set by an instance *after*
>>> it sets itself to power on. Also, it is switched back only after the instance
>>> changes its power state to off. If the idle worker is run by another instance,
>>> it won't see inst0 as power gated and won't change the power profile.
>>>
>>> Thanks,
>>> Lijo
>>> -----Original Message-----
>>> From: Sundararaju, Sathishkumar <[email protected]>
>>> Sent: Thursday, August 14, 2025 2:41 PM
>>> To: Lazar, Lijo <[email protected]>; Alex Deucher
>>> <[email protected]>
>>> Cc: Wu, David <[email protected]>; Deucher, Alexander
>>> <[email protected]>; [email protected]
>>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition
>>> (v3)
>>>
>>> Hi Lijo,
>>>
>>> On 8/14/2025 2:11 PM, Lazar, Lijo wrote:
>>>> [Public]
>>>>
>>>> We already have a per-instance power state that can be tracked. What about
>>>> something like the attached?
>>> This also has concurrent access of the power state: vcn.inst[i].cur_state is
>>> not protected by the workload_profile_mutex everywhere; it can still change
>>> while you are holding the workload_profile_mutex and checking it.
>>>
>>> Regards,
>>>
>>> Sathish
>>>
>>>> Thanks,
>>>> Lijo
>>>> -----Original Message-----
>>>> From: amd-gfx <[email protected]> On Behalf Of
>>>> Sundararaju, Sathishkumar
>>>> Sent: Thursday, August 14, 2025 4:43 AM
>>>> To: Alex Deucher <[email protected]>
>>>> Cc: Wu, David <[email protected]>; Deucher, Alexander
>>>> <[email protected]>; [email protected]
>>>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race
>>>> condition
>>>> (v3)
>>>>
>>>>
>>>> On 8/14/2025 3:38 AM, Alex Deucher wrote:
>>>>> On Wed, Aug 13, 2025 at 5:1 PM Sundararaju, Sathishkumar
>>>>> <[email protected]> wrote:
>>>>>> On 8/14/2025 2:33 AM, Alex Deucher wrote:
>>>>>>> On Wed, Aug 13, 2025 at 4:58 PM Sundararaju, Sathishkumar
>>>>>>> <[email protected]> wrote:
>>>>>>>> On 8/14/2025 1:35 AM, Alex Deucher wrote:
>>>>>>>>> On Wed, Aug 13, 2025 at 2:23 PM Sundararaju, Sathishkumar
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> Hi Alex, Hi David,
>>>>>>>>>>
>>>>>>>>>> I see David's concern, but his suggestion still won't solve the
>>>>>>>>>> problem, nor does the current form. The reason:
>>>>>>>>>>
>>>>>>>>>> The emitted fence count and total submission count are fast
>>>>>>>>>> transients that frequently become 0 in between video decodes
>>>>>>>>>> (between jobs). Even with the atomics and locks there can be a
>>>>>>>>>> switch of the video power profile; in the current form of the patch
>>>>>>>>>> that window is minimized, but it can still happen under stress
>>>>>>>>>> testing. But the power state of any instance becoming zero
>>>>>>>>> Can you explain how this can happen?  I'm not seeing it.
>>>>>>>> Consider this situation: inst0 and inst1 are actively decoding,
>>>>>>>> inst0's decode completes, and its delayed idle work starts.
>>>>>>>> The inst0 idle handler can read 0 total fences and 0 total
>>>>>>>> submission count even while inst1 is actively decoding, if it lands
>>>>>>>> between jobs:
>>>>>>>>         - begin_use increments vcn.total_submission_cnt and
>>>>>>>> end_use decrements vcn.total_submission_cnt, so that can be 0.
>>>>>>>>         - if the outstanding fences are cleared and no new fence has
>>>>>>>> been emitted between jobs, that can be 0 too.
>>>>>>>>         - neither of the above conditions means video decode
>>>>>>>> is complete on inst1; it is actively decoding.
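>>>>>>>>
>>>>>>>> A hypothetical timeline for that window (purely illustrative):
>>>>>>>>
>>>>>>>>     inst1: job N fence signals         -> emitted fences for inst1 == 0
>>>>>>>>     inst1: end_use(job N) already ran  -> total_submission_cnt == 0
>>>>>>>>     inst0: idle handler runs           -> sees 0 fences and 0 submissions
>>>>>>>>                                           overall and drops the VIDEO profile
>>>>>>>>     inst1: begin_use(job N+1)          -> has to request the profile again,
>>>>>>>>                                           although the decode session never
>>>>>>>>                                           really stopped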
>>>>>>> How can there be active decoding without an outstanding fence?
>>>>>>> In that case, total_fences (fences from both instances) would be non-0.
>>>>>> I mean that on inst1 the scheduled job is already complete, so there are 0
>>>>>> outstanding fences, and the next job is yet to be scheduled and committed
>>>>>> to the ring (inst1). This doesn't mean decode has stopped on inst1, right?
>>>>>> (I am saying the timing of inst0's idle work could coincide with this.)
>>>>>>
>>>>>> Or am I wrong in assuming this? Can't this ever happen? Please
>>>>>> correct my understanding here.
>>>>> The flow looks like:
>>>>>
>>>>> begin_use(inst)
>>>>> emit_fence(inst)
>>>>> end_use(inst)
>>>>>
>>>>> ...later
>>>>> fence signals
>>>>> ...later
>>>>> work handler
>>>>>
>>>>> In begin_use we increment the global and per-instance submission counts.
>>>>> This protects the power gating and profile until end_use.  In end_use
>>>>> we decrement the submission counts because we don't need to protect
>>>>> anything any more: we have the fence that was submitted via the
>>>>> ring.  That fence won't signal until the job is complete.
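>>>>>
>>>>> As a rough sketch of that protection hand-off (counter names as in the v3
>>>>> patch below; simplified):
>>>>>
>>>>>     begin_use(inst):
>>>>>         atomic_inc(&vcn_inst->total_submission_cnt);   /* per instance */
>>>>>         atomic_inc(&adev->vcn.total_submission_cnt);   /* global       */
>>>>>         /* the counts now hold off gating and the profile switch */
>>>>>
>>>>>     emit_fence(inst);   /* amdgpu_fence_count_emitted() > 0 from here on */
>>>>>
>>>>>     end_use(inst):
>>>>>         atomic_dec(&vcn_inst->total_submission_cnt);
>>>>>         atomic_dec(&adev->vcn.total_submission_cnt);
>>>>>         /* from here the unsignalled fence is what keeps the idle handler
>>>>>          * from gating the instance or dropping the profile */
>>>>>
>>>>>     /* the fence signals when the job completes; only then can the idle
>>>>>      * handler see 0 fences and 0 submissions */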
>>>> Is the next begin_use always guaranteed to run before the current job's
>>>> fence signals?
>>>>
>>>> If not, then both the total submission count and the total fence count can
>>>> be zero, for example with delayed job/packet submissions from user space,
>>>> or when the next job is scheduled only after the current job's fence signals.
>>>>
>>>> If this is never possible, then (v3) is perfect.
>>>>
>>>> Regards,
>>>>
>>>> Sathish
>>>>
>>>>> For power gating, we
>>>>> only care about the submission count and fences for that instance;
>>>>> for the profile, we care about the submission count and fences across all
>>>>> instances.  Once the fences have signalled, the outstanding fence count
>>>>> will be 0 and there won't be any active work.
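>>>>>
>>>>> In the v3 idle handler (quoted below) those two criteria show up as nested
>>>>> checks, roughly (locks and error handling omitted):
>>>>>
>>>>>     /* per instance: gate this instance's PG */
>>>>>     if (!fence[vcn_inst->inst] &&
>>>>>         !atomic_read(&vcn_inst->total_submission_cnt)) {
>>>>>             vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
>>>>>             /* global: drop the VIDEO profile only if nothing is pending
>>>>>              * on any instance */
>>>>>             if (adev->vcn.workload_profile_active && !total_fences &&
>>>>>                 !atomic_read(&adev->vcn.total_submission_cnt))
>>>>>                     amdgpu_dpm_switch_power_profile(adev,
>>>>>                             PP_SMC_POWER_PROFILE_VIDEO, false);
>>>>>     }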
>>>>>
>>>>> Alex
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Sathish
>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>>> Whereas if an instance is powered off, we are sure the idle time has
>>>>>>>> passed and it is really powered off, with no possibility of active video
>>>>>>>> decode. When all instances are off we can safely assume there is no
>>>>>>>> active decode, and a global lock protects against a new begin_use on any
>>>>>>>> instance.
>>>>>>>> The only distant concern is a global common lock w.r.t. perf,
>>>>>>>> but we already have a global workload profile mutex, so there
>>>>>>>> shouldn't be any drop in perf with just one single global lock
>>>>>>>> for all instances.
>>>>>>>>
>>>>>>>> I am sending out a patch with this fix and will leave it to you to
>>>>>>>> decide the right method. If you think the outstanding total fences
>>>>>>>> can never be 0 during decode, then your previous version (v3)
>>>>>>>> itself is good; there is no real benefit in splitting the handlers as
>>>>>>>> such.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sathish
>>>>>>>>> If it is possible, maybe it would be easier to just split the
>>>>>>>>> profile and powergating into separate handlers.  The profile
>>>>>>>>> one would be global and the powergating one would be per instance.
>>>>>>>>> See the attached patches.
>>>>>>>>>
>>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>>> can be a sure-shot indication of a break in video decode. The
>>>>>>>>>> mistake in my patch was using a per-instance mutex; I should
>>>>>>>>>> have used a common global mutex, and that would cover the situation
>>>>>>>>>> David is trying to bring out.
>>>>>>>>>>
>>>>>>>>>> Using one global vcn.pg_lock for idle and begin_use, and using
>>>>>>>>>> flags to track the power state, could help us totally avoid this
>>>>>>>>>> situation.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Sathish
>>>>>>>>>>
>>>>>>>>>> On 8/13/2025 11:46 PM, Wu, David wrote:
>>>>>>>>>>> On 8/13/2025 12:51 PM, Alex Deucher wrote:
>>>>>>>>>>>> On Wed, Aug 13, 2025 at 12:39 PM Wu, David <[email protected]> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The addition of total_submission_cnt should work, in that
>>>>>>>>>>>>> a context switch right after begin_use() is unlikely.
>>>>>>>>>>>>> The suggestion of moving it inside the lock (which I prefer,
>>>>>>>>>>>>> in case someone later adds more code before the lock and it is
>>>>>>>>>>>>> not reviewed thoroughly)
>>>>>>>>>>>>>           - up to you to decide.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Reviewed-by: David (Ming Qiang) Wu <[email protected]>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>> On 8/13/2025 9:45 AM, Alex Deucher wrote:
>>>>>>>>>>>>>> If there are multiple instances of the VCN running, we may
>>>>>>>>>>>>>> end up switching the video profile while another instance
>>>>>>>>>>>>>> is active because we only take into account the current
>>>>>>>>>>>>>> instance's submissions.  Look at all outstanding fences
>>>>>>>>>>>>>> for the video profile.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v2: drop early exit in begin_use()
>>>>>>>>>>>>>> v3: handle possible race between begin_use() work handler
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fixes: 3b669df92c85 ("drm/amdgpu/vcn: adjust workload
>>>>>>>>>>>>>> profile
>>>>>>>>>>>>>> handling")
>>>>>>>>>>>>>> Reviewed-by: Sathishkumar S
>>>>>>>>>>>>>> <[email protected]> (v1)
>>>>>>>>>>>>>> Signed-off-by: Alex Deucher <[email protected]>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>           drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 40
>>>>>>>>>>>>>> ++++++++++++-------------
>>>>>>>>>>>>>>           drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  1 +
>>>>>>>>>>>>>>           2 files changed, 21 insertions(+), 20
>>>>>>>>>>>>>> deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>>>>>>>> index 9a76e11d1c184..593c1ddf8819b 100644
>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>>>>>>>> @@ -415,19 +415,25 @@ static void
>>>>>>>>>>>>>> amdgpu_vcn_idle_work_handler(struct work_struct *work)
>>>>>>>>>>>>>>               struct amdgpu_vcn_inst *vcn_inst =
>>>>>>>>>>>>>>                       container_of(work, struct
>>>>>>>>>>>>>> amdgpu_vcn_inst, idle_work.work);
>>>>>>>>>>>>>>               struct amdgpu_device *adev = vcn_inst->adev;
>>>>>>>>>>>>>> -     unsigned int fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = 
>>>>>>>>>>>>>> {0};
>>>>>>>>>>>>>> -     unsigned int i = vcn_inst->inst, j;
>>>>>>>>>>>>>> +     unsigned int total_fences = 0,
>>>>>>>>>>>>>> fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
>>>>>>>>>>>>>> +     unsigned int i, j;
>>>>>>>>>>>>>>               int r = 0;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -     if (adev->vcn.harvest_config & (1 << i))
>>>>>>>>>>>>>> +     if (adev->vcn.harvest_config & (1 <<
>>>>>>>>>>>>>> + vcn_inst->inst))
>>>>>>>>>>>>>>                       return;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -     for (j = 0; j < adev->vcn.inst[i].num_enc_rings; ++j)
>>>>>>>>>>>>>> -             fence[i] +=
>>>>>>>>>>>>>> amdgpu_fence_count_emitted(&vcn_inst->ring_enc[j]);
>>>>>>>>>>>>>> +     for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>>>>>>>>> +             struct amdgpu_vcn_inst *v =
>>>>>>>>>>>>>> + &adev->vcn.inst[i];
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +             for (j = 0; j < v->num_enc_rings; ++j)
>>>>>>>>>>>>>> +                     fence[i] +=
>>>>>>>>>>>>>> amdgpu_fence_count_emitted(&v->ring_enc[j]);
>>>>>>>>>>>>>> +             fence[i] += 
>>>>>>>>>>>>>> amdgpu_fence_count_emitted(&v->ring_dec);
>>>>>>>>>>>>>> +             total_fences += fence[i];
>>>>>>>>>>>>>> +     }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>               /* Only set DPG pause for VCN3 or below, VCN4
>>>>>>>>>>>>>> and above will be handled by FW */
>>>>>>>>>>>>>>               if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG &&
>>>>>>>>>>>>>> -         !adev->vcn.inst[i].using_unified_queue) {
>>>>>>>>>>>>>> +         !vcn_inst->using_unified_queue) {
>>>>>>>>>>>>>>                       struct dpg_pause_state new_state;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                       if (fence[i] || @@ -436,18 +442,18 @@
>>>>>>>>>>>>>> static void amdgpu_vcn_idle_work_handler(struct
>>>>>>>>>>>>>> work_struct
>>>>>>>>>>>>>> *work)
>>>>>>>>>>>>>>                       else
>>>>>>>>>>>>>>                               new_state.fw_based =
>>>>>>>>>>>>>> VCN_DPG_STATE__UNPAUSE;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -             adev->vcn.inst[i].pause_dpg_mode(vcn_inst, 
>>>>>>>>>>>>>> &new_state);
>>>>>>>>>>>>>> +             vcn_inst->pause_dpg_mode(vcn_inst,
>>>>>>>>>>>>>> + &new_state);
>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -     fence[i] += 
>>>>>>>>>>>>>> amdgpu_fence_count_emitted(&vcn_inst->ring_dec);
>>>>>>>>>>>>>> -     fences += fence[i];
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> -     if (!fences && 
>>>>>>>>>>>>>> !atomic_read(&vcn_inst->total_submission_cnt)) {
>>>>>>>>>>>>>> +     if (!fence[vcn_inst->inst] &&
>>>>>>>>>>>>>> !atomic_read(&vcn_inst->total_submission_cnt)) {
>>>>>>>>>>>>>> +             /* This is specific to this instance */
>>>>>>>>>>>>>>                       mutex_lock(&vcn_inst->vcn_pg_lock);
>>>>>>>>>>>>>>                       vcn_inst->set_pg_state(vcn_inst, 
>>>>>>>>>>>>>> AMD_PG_STATE_GATE);
>>>>>>>>>>>>>>                       mutex_unlock(&vcn_inst->vcn_pg_lock);
>>>>>>>>>>>>>> mutex_lock(&adev->vcn.workload_profile_mutex);
>>>>>>>>>>>>>> -             if (adev->vcn.workload_profile_active) {
>>>>>>>>>>>>>> +             /* This is global and depends on all VCN instances 
>>>>>>>>>>>>>> */
>>>>>>>>>>>>>> +             if (adev->vcn.workload_profile_active &&
>>>>>>>>>>>>>> !total_fences &&
>>>>>>>>>>>>>> + !atomic_read(&adev->vcn.total_submission_cnt)) {
>>>>>>>>>>>>>>                               r =
>>>>>>>>>>>>>> amdgpu_dpm_switch_power_profile(adev,
>>>>>>>>>>>>>> PP_SMC_POWER_PROFILE_VIDEO, false);
>>>>>>>>>>>>>>                               if (r) @@ -467,16 +473,10 @@
>>>>>>>>>>>>>> void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>>>>>>>>>>>>>>               int r = 0;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>               atomic_inc(&vcn_inst->total_submission_cnt);
>>>>>>>>>>>>>> +     atomic_inc(&adev->vcn.total_submission_cnt);
>>>>>>>>>>>>> move this addition down inside the mutex lock
>>>>>>>>>>>>>> cancel_delayed_work_sync(&vcn_inst->idle_work);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -     /* We can safely return early here because we've cancelled 
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> -      * the delayed work so there is no one else to set it to 
>>>>>>>>>>>>>> false
>>>>>>>>>>>>>> -      * and we don't care if someone else sets it to true.
>>>>>>>>>>>>>> -      */
>>>>>>>>>>>>>> -     if (adev->vcn.workload_profile_active)
>>>>>>>>>>>>>> -             goto pg_lock;
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> mutex_lock(&adev->vcn.workload_profile_mutex);
>>>>>>>>>>>>> move to here:
>>>>>>>>>>>>> atomic_inc(&adev->vcn.total_submission_cnt);
>>>>>>>>>>>>> I think this should work for multiple instances.
>>>>>>>>>>>> Why does this need to be protected by the mutex?
>>>>>>>>>>> Hmm, OK - no need, and it is actually better before the mutex.
>>>>>>>>>>> David
>>>>>>>>>>>> Alex
>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>>>               if (!adev->vcn.workload_profile_active) {
>>>>>>>>>>>>>>                       r =
>>>>>>>>>>>>>> amdgpu_dpm_switch_power_profile(adev,
>>>>>>>>>>>>>> PP_SMC_POWER_PROFILE_VIDEO, @@ -487,7 +487,6 @@ void
>>>>>>>>>>>>>> amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>> mutex_unlock(&adev->vcn.workload_profile_mutex);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -pg_lock:
>>>>>>>>>>>>>>               mutex_lock(&vcn_inst->vcn_pg_lock);
>>>>>>>>>>>>>>               vcn_inst->set_pg_state(vcn_inst,
>>>>>>>>>>>>>> AMD_PG_STATE_UNGATE);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -528,6 +527,7 @@ void amdgpu_vcn_ring_end_use(struct
>>>>>>>>>>>>>> amdgpu_ring
>>>>>>>>>>>>>> *ring)
>>>>>>>>>>>>>> atomic_dec(&ring->adev->vcn.inst[ring->me].dpg_enc_submission_cnt);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> atomic_dec(&ring->adev->vcn.inst[ring->me].total_submission_cnt);
>>>>>>>>>>>>>> + atomic_dec(&ring->adev->vcn.total_submission_cnt);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> schedule_delayed_work(&ring->adev->vcn.inst[ring->me].idle_work,
>>>>>>>>>>>>>>                                     VCN_IDLE_TIMEOUT); diff
>>>>>>>>>>>>>> --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>>>>>>>> index b3fb1d0e43fc9..febc3ce8641ff 100644
>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>>>>>>>> @@ -352,6 +352,7 @@ struct amdgpu_vcn {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>               uint16_t inst_mask;
>>>>>>>>>>>>>>               uint8_t num_inst_per_aid;
>>>>>>>>>>>>>> +     atomic_t                total_submission_cnt;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>               /* IP reg dump */
>>>>>>>>>>>>>>               uint32_t                *ip_dump;
