Re: [PATCH] drm/sched: Avoid double re-lock on the job free path

2025-07-18 Thread Philipp Stanner
On Fri, 2025-07-18 at 10:35 +0100, Tvrtko Ursulin wrote: > > On 18/07/2025 10:31, Philipp Stanner wrote: > > On Fri, 2025-07-18 at 08:13 +0100, Tvrtko Ursulin wrote: > > > > > > On 16/07/2025 21:44, Maíra Canal wrote: > > > > Hi Tvrtko, > > >

Re: [PATCH] drm/sched: Avoid double re-lock on the job free path

2025-07-18 Thread Philipp Stanner
> > > > in the > > > > > > > queue we can simply add the signaled check and have it return the > > > > > > > presence > > > > > > > of more jobs to be freed to the caller. That way the work item > > > > > > >

Re: [PATCH] drm/sched: Prevent stopped entities from being added to the run queue.

2025-07-22 Thread Philipp Stanner
On Mon, 2025-07-21 at 11:07 -0700, Matthew Brost wrote: > On Mon, Jul 21, 2025 at 12:14:31PM +0200, Danilo Krummrich wrote: > > On Mon Jul 21, 2025 at 10:16 AM CEST, Philipp Stanner wrote: > > > On Mon, 2025-07-21 at 09:52 +0200, Philipp Stanner wrote: > > > > On

Re: [PATCH] drm/scheduler: signal scheduled fence when kill job

2025-05-15 Thread Philipp Stanner
ssue simply that the fence might be dropped unsignaled, being a bug by definition? Needs to be written down. Grammar is also a bit too broken. And running the unit tests before pushing is probably also a good idea. > > > > Signed-off-by: Lin.Cao Acked-by: Philipp Stanner > > Revie

Re: [PATCH v4 04/40] drm/sched: Add enqueue credit limit

2025-05-15 Thread Philipp Stanner
Hello, On Wed, 2025-05-14 at 09:59 -0700, Rob Clark wrote: > From: Rob Clark > > Similar to the existing credit limit mechanism, but applying to jobs > enqueued to the scheduler but not yet run. > > The use case is to put an upper bound on preallocated, and > potentially > unneeded, pgtable pag

Re: [PATCH 1/3] drm/sched: add drm_sched_prealloc_dependency_slots v3

2025-05-16 Thread Philipp Stanner
On Thu, 2025-05-15 at 17:17 +0100, Tvrtko Ursulin wrote: > > On 15/05/2025 16:00, Christian König wrote: > > Sometimes drivers need to be able to submit multiple jobs which > > depend on > > each other to different schedulers at the same time, but using > > drm_sched_job_add_dependency() can't fai

Re: [PATCH] drm/scheduler: signal scheduled fence when kill job

2025-05-16 Thread Philipp Stanner
that will never be resolved. Fix this issue by ensuring > that   > scheduled fences are properly signaled when an entity is killed, > allowing   > dependent applications to continue execution. That sounds perfect, yes, Thx. Reviewed-by: Philipp Stanner P. > > Thanks, >

Re: [PATCH v3 1/5] drm/sched: Fix teardown leaks with waitqueue

2025-05-22 Thread Philipp Stanner
On Thu, 2025-05-22 at 14:37 +0100, Tvrtko Ursulin wrote: > > On 22/05/2025 09:27, Philipp Stanner wrote: > > From: Philipp Stanner > > > > The GPU scheduler currently does not ensure that its pending_list > > is > > empty before performing various other

Re: [PATCH v3 2/5] drm/sched/tests: Port tests to new cleanup method

2025-05-22 Thread Philipp Stanner
On Thu, 2025-05-22 at 15:06 +0100, Tvrtko Ursulin wrote: > > On 22/05/2025 09:27, Philipp Stanner wrote: > > The drm_gpu_scheduler now supports a callback to help > > drm_sched_fini() > > avoid memory leaks. This callback instructs the driver to signal > > a

Re: [PATCH 2/2] drm/nouveau: Don't signal when killing the fence context

2025-05-22 Thread Philipp Stanner
On Thu, 2025-05-22 at 15:09 +0200, Christian König wrote: > On 5/22/25 14:59, Danilo Krummrich wrote: > > On Thu, May 22, 2025 at 02:34:33PM +0200, Christian König wrote: > > > See all the functions inside include/linux/dma-fence.h can be > > > used by everybody. It's basically the public interface

Re: [PATCH 2/2] drm/nouveau: Don't signal when killing the fence context

2025-05-22 Thread Philipp Stanner
On Thu, 2025-05-22 at 15:24 +0200, Christian König wrote: > On 5/22/25 15:16, Philipp Stanner wrote: > > On Thu, 2025-05-22 at 15:09 +0200, Christian König wrote: > > > On 5/22/25 14:59, Danilo Krummrich wrote: > > > > On Thu, May 22, 2025 at 02:34:33PM +0200,

Re: [PATCH 2/2] drm/nouveau: Don't signal when killing the fence context

2025-05-22 Thread Philipp Stanner
On Thu, 2025-05-22 at 14:34 +0200, Christian König wrote: > On 5/22/25 14:20, Philipp Stanner wrote: > > On Thu, 2025-05-22 at 14:06 +0200, Christian König wrote: > > > On 5/22/25 13:25, Philipp Stanner wrote: > > > > dma_fence_is_signa

Re: [PATCH] drm/sched/tests: Use one lock for fence context

2025-05-22 Thread Philipp Stanner
On Wed, 2025-05-21 at 11:24 +0100, Tvrtko Ursulin wrote: > > On 21/05/2025 11:04, Philipp Stanner wrote: > > When the unit tests were implemented, each scheduler job got its > > own, > > distinct lock. This is not how dma_fence context locking rules are > > t

[PATCH v3 1/5] drm/sched: Fix teardown leaks with waitqueue

2025-05-22 Thread Philipp Stanner
From: Philipp Stanner The GPU scheduler currently does not ensure that its pending_list is empty before performing various other teardown tasks in drm_sched_fini(). If there are still jobs in the pending_list, this is problematic because after scheduler teardown, no one will call

[PATCH v3 2/5] drm/sched/tests: Port tests to new cleanup method

2025-05-22 Thread Philipp Stanner
a new error field for the fence error. Keep the job status as DRM_MOCK_SCHED_JOB_DONE for now, since there is no party for which checking for a CANCELED status would be useful currently. Signed-off-by: Philipp Stanner --- .../gpu/drm/scheduler/tests/mock_scheduler.c | 67

[PATCH v3 3/5] drm/sched: Warn if pending list is not empty

2025-05-22 Thread Philipp Stanner
drm_sched_fini() can leak jobs under certain circumstances. Warn if that happens. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c

[PATCH v3 5/5] drm/nouveau: Remove waitque for sched teardown

2025-05-22 Thread Philipp Stanner
nouveau_sched_fence_context_kill() the waitque is not necessary anymore. Remove the waitque. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +++-- drivers/gpu/drm/nouveau/nouveau_uvmm.c | 8 3 files

[PATCH v3 0/5] Fix memory leaks in drm_sched_fini()

2025-05-22 Thread Philipp Stanner
ovide users with a more reliable, clean scheduler API. Philipp Philipp Stanner (5): drm/sched: Fix teardown leaks with waitqueue drm/sched/tests: Port tests to new cleanup method drm/sched: Warn if pending list is not empty drm/nouveau: Add new callback for scheduler teardown drm/nouveau: Remove

[PATCH v3 4/5] drm/nouveau: Add new callback for scheduler teardown

2025-05-22 Thread Philipp Stanner
There is a new callback for always tearing the scheduler down in a leak-free, deadlock-free manner. Port Nouveau as its first user by providing the scheduler with a callback that ensures the fence context gets killed in drm_sched_fini(). Signed-off-by: Philipp Stanner --- drivers/gpu/drm

[PATCH 1/2] dma-buf: Add __dma_fence_is_signaled()

2025-05-22 Thread Philipp Stanner
ed. Use it internally. Suggested-by: Tvrtko Ursulin Signed-off-by: Philipp Stanner --- include/linux/dma-fence.h | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 48b5202c531d..ac951a54a007 10

[PATCH 2/2] drm/nouveau: Don't signal when killing the fence context

2025-05-22 Thread Philipp Stanner
which only checks, never signals. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index d5654e26d5bc..993b3dcb5db0

Re: [PATCH 2/2] drm/nouveau: Don't signal when killing the fence context

2025-05-22 Thread Philipp Stanner
On Thu, 2025-05-22 at 14:06 +0200, Christian König wrote: > On 5/22/25 13:25, Philipp Stanner wrote: > > dma_fence_is_signaled_locked(), which is used in > > nouveau_fence_context_kill(), can signal fences below the surface > > through a callback. > > > > The

Re: [PATCH v2] drm/sched/tests: Use one lock for fence context

2025-06-02 Thread Philipp Stanner
On Tue, 2025-05-27 at 12:10 +0200, Philipp Stanner wrote: > There is no need for separate locks for single jobs and the entire > scheduler. The dma_fence context can be protected by the scheduler > lock, > allowing for removing the jobs' locks. This simplifies things and > re

Re: [PATCH v2 3/8] drm/sched: Reduce scheduler's timeout for timeout tests

2025-06-02 Thread Philipp Stanner
I'd call that patch sth like "Make timeout unit tests faster". Makes more obvious what it's about. P. On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote: > As more KUnit tests are introduced to evaluate the basic capabilities > of > the `timedout_job()` hook, the test suite will continue to inc

Re: [PATCH v2 6/8] drm/etnaviv: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset

2025-06-02 Thread Philipp Stanner
On Mon, 2025-06-02 at 08:36 -0300, Maíra Canal wrote: > Hi Philipp, > > On 02/06/25 04:28, Philipp Stanner wrote: > > On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote: > > [...] > > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > >

Re: [PATCH] drm/etnaviv: Protect the scheduler's pending list with its lock

2025-06-02 Thread Philipp Stanner
es: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is > still active") Could also contain a "Closes: " with the link to the appropriate message from thread [1] from below. You might also include "Reported-by: Philipp" since I technically first describ

Re: [PATCH v2 2/8] drm/sched: Allow drivers to skip the reset and keep on running

2025-06-02 Thread Philipp Stanner
Hi, thx for the update. Seems to be developing nicely. Some comments below. On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote: > When the DRM scheduler times out, it's possible that the GPU isn't > hung; > instead, a job may still be running, and there may be no valid reason > to > reset the h

Re: [PATCH v2 5/8] drm/v3d: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset

2025-06-02 Thread Philipp Stanner
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote: > When a CL/CSD job times out, we check if the GPU has made any > progress > since the last timeout. If so, instead of resetting the hardware, we > skip > the reset and allow the timer to be rearmed. This gives long-running > jobs > a chance to

Re: [PATCH v2 6/8] drm/etnaviv: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset

2025-06-02 Thread Philipp Stanner
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote: > Etnaviv can skip a hardware reset in two situations: > >   1. TDR has fired before the free-job worker and the timeout is > spurious. >   2. The GPU is still making progress on the front-end and we can > give > the job a chance to comple

Re: [PATCH v2 7/8] drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset

2025-06-02 Thread Philipp Stanner
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote: > Xe can skip the reset if TDR has fired before the free job worker and > can > also re-arm the timeout timer in some scenarios. Instead of using the > scheduler internals to add the job to the pending list, use the > DRM_GPU_SCHED_STAT_NO_HANG

Re: [RFC PATCH 0/6] drm/sched: Avoid memory leaks by canceling job-by-job

2025-06-03 Thread Philipp Stanner
On Tue, 2025-06-03 at 13:27 +0100, Tvrtko Ursulin wrote: > > On 03/06/2025 10:31, Philipp Stanner wrote: > > An alternative version to [1], based on Tvrtko's suggestion from > > [2]. > > > > I tested this for Nouveau. Works. > > > > I'm having

[RFC PATCH 1/6] drm/sched: Avoid memory leaks with cancel_job() callback

2025-06-03 Thread Philipp Stanner
the hardware fence associated with the job. Afterwards, the scheduler can safely use the established free_job() callback for freeing the job. Implement the new backend_ops callback cancel_job(). Suggested-by: Tvrtko Ursulin Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler

[RFC PATCH 3/6] drm/sched: Warn if pending list is not empty

2025-06-03 Thread Philipp Stanner
drm_sched_fini() can leak jobs under certain circumstances. Warn if that happens. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c

[RFC PATCH 6/6] drm/nouveau: Remove waitque for sched teardown

2025-06-03 Thread Philipp Stanner
nouveau_sched_fence_context_kill() the waitque is not necessary anymore. Remove the waitque. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +++-- drivers/gpu/drm/nouveau/nouveau_uvmm.c | 8 3 files

Re: [PATCH] drm/sched: Discourage usage of separate workqueues

2025-06-05 Thread Philipp Stanner
On Wed, 2025-06-04 at 17:07 +0200, Simona Vetter wrote: > On Wed, Jun 04, 2025 at 11:41:25AM +0200, Christian König wrote: > > On 6/4/25 10:16, Philipp Stanner wrote: > > > struct drm_sched_init_args provides the possibility of letting > > > the > > > sche

Re: [PATCH] drm/sched/tests: Make timedout_job callback a better role model

2025-06-23 Thread Philipp Stanner
On Thu, 2025-06-05 at 15:41 +0200, Philipp Stanner wrote: > Since the drm_mock_scheduler does not have real users in userspace, > nor > does it have real hardware or firmware rings, it's not necessary to > signal timedout fences nor free jobs - from a functional standpoint. >

Re: [PATCH v3 7/8] drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset

2025-06-23 Thread Philipp Stanner
On Wed, 2025-06-18 at 11:47 -0300, Maíra Canal wrote: > Xe can skip the reset if TDR has fired before the free job worker and > can > also re-arm the timeout timer in some scenarios. Instead of > manipulating > scheduler's internals, inform the scheduler that the job did not > actually > timeout an

Re: [RFC PATCH 1/6] drm/sched: Avoid memory leaks with cancel_job() callback

2025-06-16 Thread Philipp Stanner
On Mon, 2025-06-16 at 10:27 +0100, Tvrtko Ursulin wrote: > > On 12/06/2025 15:20, Philipp Stanner wrote: > > On Thu, 2025-06-12 at 15:17 +0100, Tvrtko Ursulin wrote: > > > > > > On 03/06/2025 10:31, Philipp Stanner wrote: > > > > Since its inception

Re: [PATCH] drm/sched/tests: Make timedout_job callback a better role model

2025-06-16 Thread Philipp Stanner
On Mon, 2025-06-16 at 09:49 -0300, Maíra Canal wrote: > Hi Danilo, > > On 16/06/25 08:14, Danilo Krummrich wrote: > > On Mon, Jun 16, 2025 at 11:57:47AM +0100, Tvrtko Ursulin wrote: > > > Code looks fine, but currently nothing is broken and I disagree > > > with the > > > goal that the _mock_^1 co

[PATCH] drm/sched/tests: Make timedout_job callback a better role model

2025-06-05 Thread Philipp Stanner
r new scheduler users. Therefore, they should approximate the canonical usage as much as possible. Make sure timed out hardware fences get signaled with the appropriate error code. Signed-off-by: Philipp Stanner --- .../gpu/drm/scheduler/tests/mock_scheduler.c | 26 ++- 1

Re: [PATCH v1] drm/amdgpu: give each kernel job a unique id

2025-06-13 Thread Philipp Stanner
On Fri, 2025-06-13 at 10:23 +0200, Christian König wrote: > On 6/13/25 01:48, Danilo Krummrich wrote: > > On Thu, Jun 12, 2025 at 09:00:34AM +0200, Christian König wrote: > > > On 6/11/25 17:11, Danilo Krummrich wrote: > > > > > > > Mhm, reiterating our internal discussion on the mailing > > > > >

Re: [PATCH] drm/sched: Prevent stopped entities from being added to the run queue.

2025-07-23 Thread Philipp Stanner
Hello, On Tue, 2025-07-22 at 13:05 -0700, James wrote: > On Mon, Jul 21, 2025, at 1:16 AM, Philipp Stanner wrote: > > On Mon, 2025-07-21 at 09:52 +0200, Philipp Stanner wrote: > > > +Cc Tvrtko, who's currently reworking FIFO and RR. > > > > > > On Sun,

Re: [PATCH] drm/sched: Prevent stopped entities from being added to the run queue.

2025-07-21 Thread Philipp Stanner
+Cc Tvrtko, who's currently reworking FIFO and RR. On Sun, 2025-07-20 at 16:56 -0700, James Flowers wrote: > Fixes an issue where entities are added to the run queue in > drm_sched_rq_update_fifo_locked after being killed, causing a > slab-use-after-free error. > > Signed-off-by: James Flowers >

Re: [PATCH] drm/sched: Prevent stopped entities from being added to the run queue.

2025-07-21 Thread Philipp Stanner
On Mon, 2025-07-21 at 09:52 +0200, Philipp Stanner wrote: > +Cc Tvrtko, who's currently reworking FIFO and RR. > > On Sun, 2025-07-20 at 16:56 -0700, James Flowers wrote: > > Fixes an issue where entities are added to the run queue in > > drm_sched_rq_update_fifo

[PATCH] drm/sched: Extend and update documentation

2025-07-24 Thread Philipp Stanner
From: Philipp Stanner The various objects and their memory lifetime used by the GPU scheduler are currently not fully documented. Add documentation describing the scheduler's objects. Improve the general documentation at a few other places. Co-developed-by: Christian König Signed-o

Re: [PATCH] drm/sched: Extend and update documentation

2025-07-24 Thread Philipp Stanner
Two comments from myself to open up room for discussion: On Thu, 2025-07-24 at 16:01 +0200, Philipp Stanner wrote: > From: Philipp Stanner > > The various objects and their memory lifetime used by the GPU scheduler > are currently not fully documented. > > Add documentat

Re: [PATCH] drm/nouveau: Remove surplus struct member

2025-08-01 Thread Philipp Stanner
On Fri, 2025-08-01 at 15:42 +, Timur Tabi wrote: > On Fri, 2025-08-01 at 17:12 +0200, Danilo Krummrich wrote: > > On Fri Aug 1, 2025 at 4:50 PM CEST, Timur Tabi wrote: > > > Does mean that the TODO has been done, or that someone completely forgot > > > and now your patch > > > is > > > remove

Re: [PATCH] drm/sched: Extend and update documentation

2025-08-05 Thread Philipp Stanner
On Tue, 2025-08-05 at 11:05 +0200, Christian König wrote: > On 24.07.25 17:07, Philipp Stanner wrote: > > > +/** > > > + * DOC: Scheduler Fence Object > > > + * > > > + * The scheduler fence object (&struct drm_sched_fence) encapsulates the > > >

[PATCH] drm/nouveau: Remove surplus struct member

2025-08-01 Thread Philipp Stanner
struct nouveau_channel contains the member 'accel_done' and a forgotten TODO which hints at that mechanism being removed in the "near future". Since that variable is read nowhere anymore, this "near future" is now. Remove the variable and the TODO. Signed-off-by:

[PATCH] drm/sched: Document race condition in drm_sched_fini()

2025-07-31 Thread Philipp Stanner
associated with a scheduler must be torn down first. Then, however, the locking should be removed from drm_sched_fini() altogether with an appropriate comment. Reported-by: James Flowers Link: https://lore.kernel.org/dri-devel/20250720235748.2798-1-bold.zone2...@fastmail.com/ Signed-off-by: Philipp

Re: [PATCH] drm/sched: Prevent stopped entities from being added to the run queue.

2025-07-22 Thread Philipp Stanner
On Tue, 2025-07-22 at 01:45 -0700, Matthew Brost wrote: > On Tue, Jul 22, 2025 at 01:07:29AM -0700, Matthew Brost wrote: > > On Tue, Jul 22, 2025 at 09:37:11AM +0200, Philipp Stanner wrote: > > > On Mon, 2025-07-21 at 11:07 -0700, Matthew Brost wrote: > > > > On M

Re: [RFC v7 10/12] drm/sched: Break submission patterns with some randomness

2025-07-30 Thread Philipp Stanner
> > loosely called random. Under the assumption it will not always be the > > > same > > > entity which is re-joining the queue under these circumstances. > > > > > > Another way to look at this is that it is adding a little bit of limited > > > random

Re: [PATCH 1/8] drm/sched: Allow drivers to skip the reset and keep on running

2025-05-13 Thread Philipp Stanner
On Sat, 2025-05-03 at 17:59 -0300, Maíra Canal wrote: > When the DRM scheduler times out, it's possible that the GPU isn't > hung; > instead, a job may still be running, and there may be no valid reason > to > reset the hardware. This can occur in two situations: > >   1. The GPU exposes some mech

Re: [PATCH] drm/cirrus: Use non-hybrid PCI devres API

2025-05-09 Thread Philipp Stanner
On Thu, 2025-05-08 at 12:44 +0200, Javier Martinez Canillas wrote: > Philipp Stanner writes: > > Hello Philipp, > > > On Tue, 2025-04-22 at 23:51 +0200, Javier Martinez Canillas wrote: > > > Philipp Stanner writes: > > > > > > Hello Philipp, >

Re: [PATCH v2] drm/vmgfx: Use non-hybrid PCI devres API

2025-05-08 Thread Philipp Stanner
On Thu, 2025-05-08 at 11:39 -0400, Zack Rusin wrote: > On Thu, May 8, 2025 at 6:40 AM Philipp Stanner > wrote: > > > > On Wed, 2025-04-23 at 14:06 +0200, Philipp Stanner wrote: > > > vmgfx enables its PCI device with pcim_enable_device(). This, > > >

Re: [PATCH 1/8] drm/sched: Allow drivers to skip the reset and keep on running

2025-05-12 Thread Philipp Stanner
On Tue, 2025-05-06 at 07:32 -0700, Matthew Brost wrote: > On Mon, May 05, 2025 at 07:41:09PM -0700, Matthew Brost wrote: > > On Sat, May 03, 2025 at 05:59:52PM -0300, Maíra Canal wrote: > > > When the DRM scheduler times out, it's possible that the GPU > > > isn't hung; > > > instead, a job may sti

Re: [RFC v4 04/16] drm/sched: Avoid double re-lock on the job free path

2025-05-12 Thread Philipp Stanner
heduling policy, not general other improvements. P. > > Signed-off-by: Tvrtko Ursulin > Cc: Christian König > Cc: Danilo Krummrich > Cc: Matthew Brost > Cc: Philipp Stanner > --- >  drivers/gpu/drm/scheduler/sched_main.c | 39 +++- > -- >  1

Re: [RFC v4 05/16] drm/sched: Consolidate drm_sched_job_timedout

2025-05-12 Thread Philipp Stanner
he function. Same here, that's a good candidate for a separate patch / series. P. > > Signed-off-by: Tvrtko Ursulin > Cc: Christian König > Cc: Danilo Krummrich > Cc: Matthew Brost > Cc: Philipp Stanner > --- >  drivers/gpu/drm/scheduler/sched_main.c | 37 +++

Re: [RFC v4 10/16] drm/sched: Free all finished jobs at once

2025-05-12 Thread Philipp Stanner
> completed jobs as soon as possible so the metric is most up to date > when > view from the submission side of things. > > Signed-off-by: Tvrtko Ursulin > Cc: Christian König > Cc: Danilo Krummrich > Cc: Matthew Brost > Cc: Philipp Stanner > --- >

Re: [PATCH v2 2/6] drm/sched: Prevent teardown waitque from blocking too long

2025-05-16 Thread Philipp Stanner
On Fri, 2025-05-16 at 10:33 +0100, Tvrtko Ursulin wrote: > > On 24/04/2025 10:55, Philipp Stanner wrote: > > The waitqueue that ensures that drm_sched_fini() blocks until the > > pending_list has become empty could theoretically cause that > > function to > > bl

Re: [PATCH 1/3] drm/sched: add drm_sched_prealloc_dependency_slots v3

2025-05-16 Thread Philipp Stanner
On Fri, 2025-05-16 at 13:10 +0100, Tvrtko Ursulin wrote: > > On 16/05/2025 12:53, Tvrtko Ursulin wrote: > > > > On 16/05/2025 08:28, Philipp Stanner wrote: > > > On Thu, 2025-05-15 at 17:17 +0100, Tvrtko Ursulin wrote: > > > > > > >

Re: [PATCH 1/3] drm/sched: add drm_sched_prealloc_dependency_slots v3

2025-05-16 Thread Philipp Stanner
On Fri, 2025-05-16 at 15:30 +0100, Tvrtko Ursulin wrote: > > On 16/05/2025 14:38, Philipp Stanner wrote: > > On Fri, 2025-05-16 at 13:10 +0100, Tvrtko Ursulin wrote: > > > > > > On 16/05/2025 12:53, Tvrtko Ursulin wrote: > > > > > > > > On

Re: [PATCH v9 09/10] drm/doc: document some tracepoints as uAPI

2025-05-14 Thread Philipp Stanner
On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pelloux-Prayer wrote: > This commit adds a document section in drm-uapi.rst about > tracepoints, > and mark the events gpu_scheduler_trace.h as stable uAPI. > > The goal is to explicitly state that tools can rely on the fields, > formats and semantics

Re: [PATCH v2 6/6] drm/sched: Port unit tests to new cleanup design

2025-05-14 Thread Philipp Stanner
On Wed, 2025-05-14 at 09:30 +0100, Tvrtko Ursulin wrote: > > On 12/05/2025 09:00, Philipp Stanner wrote: > > On Thu, 2025-05-08 at 13:51 +0100, Tvrtko Ursulin wrote: > > > > > > Hi Philipp, > > > > > > On 08/05/2025 12:03, Philipp Stanner

Re: [PATCH v9 02/10] drm/sched: store the drm client_id in drm_sched_fence

2025-05-14 Thread Philipp Stanner
On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pelloux-Prayer wrote: > This will be used in a later commit to trace the drm client_id in > some of the gpu_scheduler trace events. > > This requires changing all the users of drm_sched_job_init to > add an extra parameter. > > The newly added drm_cl

Re: [PATCH v9 05/10] drm/sched: trace dependencies for gpu jobs

2025-05-14 Thread Philipp Stanner
nit: title: s/gpu/GPU We also mostly start with an upper case letter after the :, but JFYI, it's not a big deal. P. On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pelloux-Prayer wrote: > We can't trace dependencies from drm_sched_job_add_dependency > because when it's called the job's fence is

Re: [PATCH v9 08/10] drm: get rid of drm_sched_job::id

2025-05-14 Thread Philipp Stanner
On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pelloux-Prayer wrote: > Its only purpose was for trace events, but jobs can already be > uniquely identified using their fence. > > The downside of using the fence is that it's only available > after 'drm_sched_job_arm' was called which is true for al

[PATCH v3] drm/vmwgfx: Use non-hybrid PCI devres API

2025-05-14 Thread Philipp Stanner
-managed pcim_request_all_regions(). Signed-off-by: Philipp Stanner Reviewed-by: Zack Rusin --- Changes in v3: - Use the correct driver name in the commit message. (Zack) Changes in v2: - Fix unused variable error. --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 14 +++--- 1 file changed, 3

[PATCH] drm/sched/tests: Use one lock for fence context

2025-05-21 Thread Philipp Stanner
dma_fence rules, e.g., ensuring that only one fence gets signaled at a time. Use the fence context (scheduler) lock for the jobs. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 5 ++--- drivers/gpu/drm/scheduler/tests/sched_tests.h| 1 - 2 files changed

Re: [PATCH 1/3] drm/sched: add drm_sched_prealloc_dependency_slots v3

2025-05-21 Thread Philipp Stanner
On Tue, 2025-05-20 at 17:15 +0100, Tvrtko Ursulin wrote: > > On 19/05/2025 10:04, Philipp Stanner wrote: > > On Mon, 2025-05-19 at 09:51 +0100, Tvrtko Ursulin wrote: > > > > > > On 16/05/2025 18:16, Philipp Stanner wrote: > > > > On Fri, 2025-

Re: [PATCH 1/3] drm/sched: add drm_sched_prealloc_dependency_slots v3

2025-05-19 Thread Philipp Stanner
On Mon, 2025-05-19 at 09:51 +0100, Tvrtko Ursulin wrote: > > On 16/05/2025 18:16, Philipp Stanner wrote: > > On Fri, 2025-05-16 at 15:30 +0100, Tvrtko Ursulin wrote: > > > > > > On 16/05/2025 14:38, Philipp Stanner wrote: > > > > On Fri, 2025-

Re: [PATCH v9 02/10] drm/sched: store the drm client_id in drm_sched_fence

2025-05-19 Thread Philipp Stanner
On Mon, 2025-05-19 at 13:02 +0200, Pierre-Eric Pelloux-Prayer wrote: > > > Le 15/05/2025 à 08:53, Pierre-Eric Pelloux-Prayer a écrit : > > Hi, > > > > Le 14/05/2025 à 14:44, Philipp Stanner a écrit : > > > On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pell

Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency

2025-05-26 Thread Philipp Stanner
On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote: > On 5/23/25 16:16, Danilo Krummrich wrote: > > On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote: > > > On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote: > > > > It turned out that we can actually massively opt

[PATCH v2] drm/sched/tests: Use one lock for fence context

2025-05-27 Thread Philipp Stanner
scheduler lock. Signed-off-by: Philipp Stanner --- Changes in v2: - Make commit message more neutral by stating it's about simplifying the code. (Tvrtko) --- drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 5 ++--- drivers/gpu/drm/scheduler/tests/sched_tests.h| 1 - 2 files change

Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency

2025-05-26 Thread Philipp Stanner
On Fri, 2025-05-23 at 14:56 +0200, Christian König wrote: > It turned out that we can actually massively optimize here. > > The previous code was horrible inefficient since it constantly > released > and re-acquired the lock of the xarray and started each iteration > from the > base of the array t

Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency

2025-05-26 Thread Philipp Stanner
On Mon, 2025-05-26 at 13:16 +0200, Christian König wrote: > On 5/26/25 11:34, Philipp Stanner wrote: > > On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote: > > > On 5/23/25 16:16, Danilo Krummrich wrote: > > > > On Fri, May 23, 2025 at 04:11:39PM +0200,

Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency a bit

2025-05-26 Thread Philipp Stanner
+Cc Matthew, again :) On Thu, 2025-05-22 at 18:19 +0200, Christian König wrote: > On 5/22/25 16:27, Tvrtko Ursulin wrote: > > > > On 22/05/2025 14:41, Christian König wrote: > > > Since we already iterated over the xarray we know at which index > > > the new > > > entry should be stored. So inste

[PATCH] drm/sched: Discourage usage of separate workqueues

2025-06-04 Thread Philipp Stanner
n the documentation. Suggested-by: Danilo Krummrich Signed-off-by: Philipp Stanner --- include/drm/gpu_scheduler.h | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 81dcbfc8c223..11740d745223 100644 --- a/includ

[RFC PATCH 4/6] drm/nouveau: Make fence container helper usable driver-wide

2025-06-03 Thread Philipp Stanner
: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_fence.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_fence.h | 6 ++ 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index

[RFC PATCH 5/6] drm/nouveau: Add new callback for scheduler teardown

2025-06-03 Thread Philipp Stanner
There is a new callback for always tearing the scheduler down in a leak-free, deadlock-free manner. Port Nouveau as its first user by providing the scheduler with a callback that ensures the fence context gets killed in drm_sched_fini(). Signed-off-by: Philipp Stanner --- drivers/gpu/drm

[RFC PATCH 0/6] drm/sched: Avoid memory leaks by canceling job-by-job

2025-06-03 Thread Philipp Stanner
ps://lore.kernel.org/dri-devel/20250418113211.69956-1-tvrtko.ursu...@igalia.com/ Philipp Stanner (6): drm/sched: Avoid memory leaks with cancel_job() callback drm/sched/tests: Implement cancel_job() drm/sched: Warn if pending list is not empty drm/nouveau: Make fence container helper usable driver-wide

[RFC PATCH 2/6] drm/sched/tests: Implement cancel_job()

2025-06-03 Thread Philipp Stanner
hardware fence. That should be repaired and cleaned up, but it's probably better to do that in a separate series. Signed-off-by: Philipp Stanner --- .../gpu/drm/scheduler/tests/mock_scheduler.c | 71 +++ drivers/gpu/drm/scheduler/tests/sched_tests.h | 4 +- 2 files change

Re: [PATCH v11 00/10] Improve gpu_scheduler trace events + UAPI

2025-05-28 Thread Philipp Stanner
On Mon, 2025-05-26 at 14:54 +0200, Pierre-Eric Pelloux-Prayer wrote: > Hi, > > The initial goal of this series was to improve the drm and amdgpu > trace events to be able to expose more of the inner workings of > the scheduler and drivers to developers via tools. > > Then, the series evolved to b

Re: [RFC PATCH 1/6] drm/sched: Avoid memory leaks with cancel_job() callback

2025-06-12 Thread Philipp Stanner
On Thu, 2025-06-12 at 15:17 +0100, Tvrtko Ursulin wrote: > > On 03/06/2025 10:31, Philipp Stanner wrote: > > Since its inception, the GPU scheduler can leak memory if the > > driver > > calls drm_sched_fini() while there are still jobs in flight. > > > >

[PATCH v2] drm/sched: Clarify scenarios for separate workqueues

2025-06-12 Thread Philipp Stanner
about pitfalls. Co-authored-by: Danilo Krummrich Signed-off-by: Philipp Stanner --- Changes in v2: - Add new docu section for concurrency in the scheduler. (Sima) - Document what an ordered workqueue passed to the scheduler can be useful for. (Christian, Sima) - Warn more detailed about pote

Re: [PATCH] drm/amdgpu: fix task hang from failed job submission during process kill

2025-08-12 Thread Philipp Stanner
On Tue, 2025-08-12 at 08:58 +0200, Christian König wrote: > On 12.08.25 08:37, Liu01, Tong (Esther) wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > > > Hi Christian, > > > > If a job is submitted into a stopped entity, in addition to an error log, > > it will also cause t

[PATCH v2] drm/sched: Document race condition in drm_sched_fini()

2025-08-13 Thread Philipp Stanner
associated with a scheduler must be torn down first. Then, however, the locking should be removed from drm_sched_fini() altogether with an appropriate comment. Reported-by: James Flowers Link: https://lore.kernel.org/dri-devel/20250720235748.2798-1-bold.zone2...@fastmail.com/ Signed-off-by: Philipp

Re: [PATCH] drm/sched: Extend and update documentation

2025-08-11 Thread Philipp Stanner
On Thu, 2025-08-07 at 16:15 +0200, Christian König wrote: > On 05.08.25 12:22, Philipp Stanner wrote: > > On Tue, 2025-08-05 at 11:05 +0200, Christian König wrote: > > > On 24.07.25 17:07, Philipp Stanner wrote: > > > > > +/** > >

[PATCH] MAINTAINERS: Add website of Nova GPU driver

2025-08-07 Thread Philipp Stanner
The Nova GPU driver has a sub-website on the Rust-for-Linux website which so far was missing from the respective section in MAINTAINERS. Add the Nova website. Signed-off-by: Philipp Stanner --- MAINTAINERS | 2 ++ 1 file changed, 2 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index

Re: [PATCH] drm/sched: Prevent stopped entities from being added to the run queue.

2025-08-14 Thread Philipp Stanner
On Thu, 2025-08-14 at 12:45 +0100, Tvrtko Ursulin wrote: > > On 14/08/2025 11:42, Tvrtko Ursulin wrote: > > > > On 21/07/2025 08:52, Philipp Stanner wrote: > > > +Cc Tvrtko, who's currently reworking FIFO and RR. > > > > > > On Sun, 2025-07-20

Re: [PATCH] drm/amdgpu: fix task hang from failed job submission during process kill

2025-08-11 Thread Philipp Stanner
Hi, title: this patch changes nothing in amdgpu. Thus, the prefix must be drm/sched: Fix […] Furthermore, please use scripts/get_maintainer. A few relevant folks are missing. +Cc Danilo, Matthew On Mon, 2025-08-11 at 15:20 +0800, Liu01 Tong wrote: > During process kill, drm_sched_entity_flush

Re: [PATCH] drm/amdgpu: fix task hang from failed job submission during process kill

2025-08-11 Thread Philipp Stanner
On Mon, 2025-08-11 at 10:18 +0200, Philipp Stanner wrote: > Hi, > > title: this patch changes nothing in amdgpu. > > Thus, the prefix must be drm/sched: Fix […] > > > Furthermore, please use scripts/get_maintainer. A few relevant folks > are missing. +Cc Danilo, Ma

Re: [PATCH 2/2] dma-buf: add warning when dma_fence is signaled from IOCTL

2025-08-13 Thread Philipp Stanner
On Tue, 2025-08-12 at 16:34 +0200, Christian König wrote: > From: Christian König Is this the correct mail addr? :) > > We have the re-occurring problem that people try to invent a > DMA-fences implementation which signals fences based on a userspace > IOCTL. > > This is well known as source

[PATCH v2] Revert "drm/nouveau: Remove waitque for sched teardown"

2025-09-01 Thread Philipp Stanner
ove waitque for sched teardown") Suggested-by: Danilo Krummrich Signed-off-by: Philipp Stanner --- Changes in v2: - Don't revert commit 89b2675198ab ("drm/nouveau: Make fence container helper usable driver-wide") - Add Fixes-tag --- drivers/gpu/drm/nouveau/nouveau_fence.c | 15 --

Re: [PATCH v1 2/2] drm/sched: limit sched score update to jobs change

2025-09-02 Thread Philipp Stanner
On Mon, 2025-09-01 at 15:14 +0200, Pierre-Eric Pelloux-Prayer wrote: > > > Le 25/08/2025 à 15:13, Philipp Stanner a écrit : > > On Fri, 2025-08-22 at 15:43 +0200, Pierre-Eric Pelloux-Prayer wrote: > > > Currently, the scheduler score is incremented when a job is pushe

Re: [PATCH 2/2] dma-buf: add warning when dma_fence is signaled from IOCTL

2025-09-04 Thread Philipp Stanner
On Tue, 2025-08-12 at 16:34 +0200, Christian König wrote: > From: Christian König > > We have the re-occurring problem that people try to invent a > DMA-fences implementation which signals fences based on a userspace > IOCTL. > > This is well known as source of hard to track down crashes and is

Re: [PATCH v2] Revert "drm/nouveau: Remove waitque for sched teardown"

2025-09-04 Thread Philipp Stanner
On Thu, 2025-09-04 at 13:56 +0200, Christian König wrote: > On 04.09.25 13:12, Philipp Stanner wrote: > > On Thu, 2025-09-04 at 12:27 +0200, Christian König wrote: > > > On 01.09.25 10:31, Philipp Stanner wrote: > > > > This reverts: > > > > > >

Re: [PATCH v2] Revert "drm/nouveau: Remove waitque for sched teardown"

2025-09-04 Thread Philipp Stanner
On Thu, 2025-09-04 at 12:27 +0200, Christian König wrote: > On 01.09.25 10:31, Philipp Stanner wrote: > > This reverts: > > > > commit bead88002227 ("drm/nouveau: Remove waitque for sched teardown") > > commit 5f46f5c7af8c ("drm/nouveau: Add new callback

Re: [RFC v8 04/12] drm/sched: Consolidate entity run queue management

2025-09-11 Thread Philipp Stanner
patches or could it be branched out? P. > > Signed-off-by: Tvrtko Ursulin > Cc: Christian König > Cc: Danilo Krummrich > Cc: Matthew Brost > Cc: Philipp Stanner > --- >  drivers/gpu/drm/scheduler/sched_entity.c   | 64 ++- >  drivers/gpu/drm/scheduler/s

Re: [RFC v8 08/12] drm/sched: Remove idle entity from tree

2025-09-11 Thread Philipp Stanner
work. Or could it be made generic for the current in-tree scheduler? > > Apart from that, the upcoming fair scheduling algorithm will rely on the > tree only containing runnable entities. > > Signed-off-by: Tvrtko Ursulin > Cc: Christian König > Cc: Danilo Krummrich >
