On 16.10.25 19:20, Philipp Stanner wrote:
> On Thu, 2025-10-16 at 15:11 +0200, Christian König wrote:
>> On 16.10.25 14:31, Philipp Stanner wrote:
>>> On Wed, 2025-10-15 at 16:01 +0200, Christian König wrote:
>>>> From: David Rosca <[email protected]>
>>>>
>>>> The DRM scheduler tracks who last uses an entity and when that process
>>>> is killed blocks all further submissions to that entity.
>>>>
>>>> The problem is that we didn't track who initially created an entity, so
>>>> when a process accidently leaked its file descriptor to a child and
>>>> that child got killed, we killed the parent's entities.
>>>>
>>>> Avoid that and instead initialize the entities last user on entity
>>>> creation. This also allows to drop the extra NULL check.
>>>>
>>>> v2: still use cmpxchg
>>>> v3: improve the commit message
>>>
>>> For the future, commit messages in the patche's comment body are to be
>>> preferred since it's common kernel style. Same applies to the patch
>>> version in the title, which should be in [PATCH v3].
>>
>> Ah, just forgotten about it!
>>
>>>
>>> But that's just a nit. More important:
>>>
>>>>
>>>> Signed-off-by: David Rosca <[email protected]>
>>>> Signed-off-by: Christian König <[email protected]>
>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4568
>>>
>>> Should this have a Fixes: ?
>>
>> No, I've actually removed that because the patch which made it obvious that 
>> something is wrong here is correct.
>>
>> It's just that this seems to be incorrect ever since we added the code.
> 
> Then we should just add the Fixes: tag for the big bang commit,
> shouldn't we?
> 
> At least maintainer-tools/dim doesn't like the missing tag at all. When
> trying to apply this patch it just added the following:
> 
> Signed-off-by: David Rosca <[email protected]>
> Signed-off-by: Christian König <[email protected]>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4568
> Reviewed-by: Alex Deucher <[email protected]>
> CC: [email protected]
> Fixes: 43bce41cf48e ("drm/scheduler: only kill entity if last user is  killed 
> v2")
> Signed-off-by: Philipp Stanner <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> 
> and then it complains about the tag it added itself:
> 
> Applying: drm/sched: avoid killing parent entity on child SIGKILL v3
> [drm-misc-fixes 2a3e82c80bd0] drm/sched: avoid killing parent entity on child 
> SIGKILL v3
>  Author: David Rosca <[email protected]>
>  Date: Wed Oct 15 16:01:28 2025 +0200
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 2a3e82c80bd0 (HEAD -> drm-misc-fixes) drm/sched: avoid killing parent entity 
> on child SIGKILL v3
> -:27: WARNING:BAD_FIXES_TAG: Please use correct Fixes: style 'Fixes: <12+ 
> chars of sha1> ("<title line>")' - ie: 'Fixes: 43bce41cf48e ("drm/scheduler: 
> only kill entity if last user is killed v2")'
> #27: 
> Fixes: 43bce41cf48e ("drm/scheduler: only kill entity if last user is  killed 
> v2")
> 
> -:48: CHECK:UNNECESSARY_PARENTHESES: Unnecessary parentheses around 
> 'current->exit_code == SIGKILL'
> #48: FILE: drivers/gpu/drm/scheduler/sched_entity.c:306:
> +     if (last_user == current->group_leader &&
>           (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
> 
> total: 0 errors, 1 warnings, 1 checks, 15 lines checked
> 
> 
> Which is weird..
> 
> in any case the big bang commit helps stable to apply this correctly,
> too.

Last time I talked with Greg about it he mentioned that the Fixes tag just 
makes sure that this fix is applied everywhere the original patch is applied as 
well.

Since the original patch is so old CC: stable should be sufficient as far as I 
can see.

That dim complains about missing fixes tags is news to me.

Regards,
Christian.

> 
> P.
> 
>>
>>>
>>>> Reviewed-by: Alex Deucher <[email protected]>
>>>> CC: [email protected]
>>>
>>> So we want it in drm-misc-fixes, don't we?
>>
>> Yes, the patch is based on drm-misc-fixes. I can push it when you give me an 
>> rb.
>>
>> Alternatively you can push it yourself, whatever you prefer.
>>
>> Regards,
>> Christian.
>>
>>>
>>>
>>> P.
>>>
>>>> ---
>>>>  drivers/gpu/drm/scheduler/sched_entity.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
>>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> index 5a4697f636f2..3e2f83dc3f24 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> @@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity 
>>>> *entity,
>>>>    entity->guilty = guilty;
>>>>    entity->num_sched_list = num_sched_list;
>>>>    entity->priority = priority;
>>>> +  entity->last_user = current->group_leader;
>>>>    /*
>>>>     * It's perfectly valid to initialize an entity without having a valid
>>>>     * scheduler attached. It's just not valid to use the scheduler before 
>>>> it
>>>> @@ -302,7 +303,7 @@ long drm_sched_entity_flush(struct drm_sched_entity 
>>>> *entity, long timeout)
>>>>  
>>>>    /* For a killed process disallow further enqueueing of jobs. */
>>>>    last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
>>>> -  if ((!last_user || last_user == current->group_leader) &&
>>>> +  if (last_user == current->group_leader &&
>>>>        (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
>>>>            drm_sched_entity_kill(entity);
>>>>  
>>>
>>
> 

Reply via email to