On Fri, 2025-07-18 at 10:35 +0100, Tvrtko Ursulin wrote:
>
> On 18/07/2025 10:31, Philipp Stanner wrote:
> > On Fri, 2025-07-18 at 08:13 +0100, Tvrtko Ursulin wrote:
> > >
> > > On 16/07/2025 21:44, Maíra Canal wrote:
> > > > Hi Tvrtko,
> > >
> > > > in the
> > > > > > > queue we can simply add the signaled check and have it return the
> > > > > > > presence
> > > > > > > of more jobs to be freed to the caller. That way the work item
> > > > > > > >
On Mon, 2025-07-21 at 11:07 -0700, Matthew Brost wrote:
> On Mon, Jul 21, 2025 at 12:14:31PM +0200, Danilo Krummrich wrote:
> > On Mon Jul 21, 2025 at 10:16 AM CEST, Philipp Stanner wrote:
> > > On Mon, 2025-07-21 at 09:52 +0200, Philipp Stanner wrote:
> > > > On
ssue simply that the
fence might be dropped unsignaled, being a bug by definition? Needs to
be written down.
The grammar is also a bit broken.
And running the unit tests before pushing is probably also a good idea.
> >
> > Signed-off-by: Lin.Cao
Acked-by: Philipp Stanner
>
> Revie
Hello,
On Wed, 2025-05-14 at 09:59 -0700, Rob Clark wrote:
> From: Rob Clark
>
> Similar to the existing credit limit mechanism, but applying to jobs
> enqueued to the scheduler but not yet run.
>
> The use case is to put an upper bound on preallocated, and
> potentially
> unneeded, pgtable pag
On Thu, 2025-05-15 at 17:17 +0100, Tvrtko Ursulin wrote:
>
> On 15/05/2025 16:00, Christian König wrote:
> > Sometimes drivers need to be able to submit multiple jobs which
> > depend on
> > each other to different schedulers at the same time, but using
> > drm_sched_job_add_dependency() can't fai
that will never be resolved. Fix this issue by ensuring
> that
> scheduled fences are properly signaled when an entity is killed,
> allowing
> dependent applications to continue execution.
That sounds perfect, yes, Thx.
Reviewed-by: Philipp Stanner
P.
>
> Thanks,
>
On Thu, 2025-05-22 at 14:37 +0100, Tvrtko Ursulin wrote:
>
> On 22/05/2025 09:27, Philipp Stanner wrote:
> > From: Philipp Stanner
> >
> > The GPU scheduler currently does not ensure that its pending_list
> > is
> > empty before performing various other
On Thu, 2025-05-22 at 15:06 +0100, Tvrtko Ursulin wrote:
>
> On 22/05/2025 09:27, Philipp Stanner wrote:
> > The drm_gpu_scheduler now supports a callback to help
> > drm_sched_fini()
> > avoid memory leaks. This callback instructs the driver to signal
> > a
On Thu, 2025-05-22 at 15:09 +0200, Christian König wrote:
> On 5/22/25 14:59, Danilo Krummrich wrote:
> > On Thu, May 22, 2025 at 02:34:33PM +0200, Christian König wrote:
> > > See all the functions inside include/linux/dma-fence.h can be
> > > used by everybody. It's basically the public interface
On Thu, 2025-05-22 at 15:24 +0200, Christian König wrote:
> On 5/22/25 15:16, Philipp Stanner wrote:
> > On Thu, 2025-05-22 at 15:09 +0200, Christian König wrote:
> > > On 5/22/25 14:59, Danilo Krummrich wrote:
> > > > On Thu, May 22, 2025 at 02:34:33PM +0200,
On Thu, 2025-05-22 at 14:34 +0200, Christian König wrote:
> On 5/22/25 14:20, Philipp Stanner wrote:
> > On Thu, 2025-05-22 at 14:06 +0200, Christian König wrote:
> > > On 5/22/25 13:25, Philipp Stanner wrote:
> > > > dma_fence_is_signa
On Wed, 2025-05-21 at 11:24 +0100, Tvrtko Ursulin wrote:
>
> On 21/05/2025 11:04, Philipp Stanner wrote:
> > When the unit tests were implemented, each scheduler job got its
> > own,
> > distinct lock. This is not how dma_fence context locking rules are
> > t
From: Philipp Stanner
The GPU scheduler currently does not ensure that its pending_list is
empty before performing various other teardown tasks in
drm_sched_fini().
If there are still jobs in the pending_list, this is problematic because
after scheduler teardown, no one will call
a new error
field for the fence error.
Keep the job status as DRM_MOCK_SCHED_JOB_DONE for now, since there is
currently no party for which checking for a CANCELED status would be
useful.
Signed-off-by: Philipp Stanner
---
.../gpu/drm/scheduler/tests/mock_scheduler.c | 67
drm_sched_fini() can leak jobs under certain circumstances.
Warn if that happens.
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/scheduler/sched_main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
nouveau_sched_fence_context_kill() the waitque is not necessary anymore.
Remove the waitque.
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++-
drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +++--
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 8
3 files
provide users with a more
reliable, clean scheduler API.
Philipp
Philipp Stanner (5):
drm/sched: Fix teardown leaks with waitqueue
drm/sched/tests: Port tests to new cleanup method
drm/sched: Warn if pending list is not empty
drm/nouveau: Add new callback for scheduler teardown
drm/nouveau: Remove
There is a new callback for always tearing the scheduler down in a
leak-free, deadlock-free manner.
Port Nouveau as its first user by providing the scheduler with a
callback that ensures the fence context gets killed in drm_sched_fini().
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm
ed. Use it internally.
Suggested-by: Tvrtko Ursulin
Signed-off-by: Philipp Stanner
---
include/linux/dma-fence.h | 24 ++--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 48b5202c531d..ac951a54a007 10
which only checks, never signals.
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c
b/drivers/gpu/drm/nouveau/nouveau_fence.c
index d5654e26d5bc..993b3dcb5db0
On Thu, 2025-05-22 at 14:06 +0200, Christian König wrote:
> On 5/22/25 13:25, Philipp Stanner wrote:
> > dma_fence_is_signaled_locked(), which is used in
> > nouveau_fence_context_kill(), can signal fences below the surface
> > through a callback.
> >
> > The
On Tue, 2025-05-27 at 12:10 +0200, Philipp Stanner wrote:
> There is no need for separate locks for single jobs and the entire
> scheduler. The dma_fence context can be protected by the scheduler
> lock,
> allowing for removing the jobs' locks. This simplifies things and
> re
I'd call that patch sth like "Make timeout unit tests faster". That makes
it more obvious what it's about.
P.
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
> As more KUnit tests are introduced to evaluate the basic capabilities
> of
> the `timedout_job()` hook, the test suite will continue to inc
On Mon, 2025-06-02 at 08:36 -0300, Maíra Canal wrote:
> Hi Philipp,
>
> On 02/06/25 04:28, Philipp Stanner wrote:
> > On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
>
> [...]
>
> > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > >
Fixes: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is
> still active")
Could also contain a "Closes: " with the link to the appropriate
message from thread [1] from below.
You might also include "Reported-by: Philipp" since I technically first
describ
Hi,
thx for the update. Seems to be developing nicely. Some comments below.
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
> When the DRM scheduler times out, it's possible that the GPU isn't
> hung;
> instead, a job may still be running, and there may be no valid reason
> to
> reset the h
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
> When a CL/CSD job times out, we check if the GPU has made any
> progress
> since the last timeout. If so, instead of resetting the hardware, we
> skip
> the reset and allow the timer to be rearmed. This gives long-running
> jobs
> a chance to
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
> Etnaviv can skip a hardware reset in two situations:
>
> 1. TDR has fired before the free-job worker and the timeout is
> spurious.
> 2. The GPU is still making progress on the front-end and we can
> give
> the job a chance to comple
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
> Xe can skip the reset if TDR has fired before the free job worker and
> can
> also re-arm the timeout timer in some scenarios. Instead of using the
> scheduler internals to add the job to the pending list, use the
> DRM_GPU_SCHED_STAT_NO_HANG
On Tue, 2025-06-03 at 13:27 +0100, Tvrtko Ursulin wrote:
>
> On 03/06/2025 10:31, Philipp Stanner wrote:
> > An alternative version to [1], based on Tvrtko's suggestion from
> > [2].
> >
> > I tested this for Nouveau. Works.
> >
> > I'm having
the hardware fence associated with the
job. Afterwards, the scheduler can safely use the established free_job()
callback for freeing the job.
Implement the new backend_ops callback cancel_job().
Suggested-by: Tvrtko Ursulin
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/scheduler
drm_sched_fini() can leak jobs under certain circumstances.
Warn if that happens.
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/scheduler/sched_main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
nouveau_sched_fence_context_kill() the waitque is not necessary anymore.
Remove the waitque.
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++-
drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +++--
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 8
3 files
On Wed, 2025-06-04 at 17:07 +0200, Simona Vetter wrote:
> On Wed, Jun 04, 2025 at 11:41:25AM +0200, Christian König wrote:
> > On 6/4/25 10:16, Philipp Stanner wrote:
> > > struct drm_sched_init_args provides the possibility of letting
> > > the
> > > sche
On Thu, 2025-06-05 at 15:41 +0200, Philipp Stanner wrote:
> Since the drm_mock_scheduler does not have real users in userspace,
> nor
> does it have real hardware or firmware rings, it's not necessary to
> signal timedout fences nor free jobs - from a functional standpoint.
>
On Wed, 2025-06-18 at 11:47 -0300, Maíra Canal wrote:
> Xe can skip the reset if TDR has fired before the free job worker and
> can
> also re-arm the timeout timer in some scenarios. Instead of
> manipulating
> scheduler's internals, inform the scheduler that the job did not
> actually
> timeout an
On Mon, 2025-06-16 at 10:27 +0100, Tvrtko Ursulin wrote:
>
> On 12/06/2025 15:20, Philipp Stanner wrote:
> > On Thu, 2025-06-12 at 15:17 +0100, Tvrtko Ursulin wrote:
> > >
> > > On 03/06/2025 10:31, Philipp Stanner wrote:
> > > > Since its inception
On Mon, 2025-06-16 at 09:49 -0300, Maíra Canal wrote:
> Hi Danilo,
>
> On 16/06/25 08:14, Danilo Krummrich wrote:
> > On Mon, Jun 16, 2025 at 11:57:47AM +0100, Tvrtko Ursulin wrote:
> > > Code looks fine, but currently nothing is broken and I disagree
> > > with the
> > > goal that the _mock_^1 co
r new scheduler users. Therefore, they should approximate the
canonical usage as much as possible.
Make sure timed out hardware fences get signaled with the appropriate
error code.
Signed-off-by: Philipp Stanner
---
.../gpu/drm/scheduler/tests/mock_scheduler.c | 26 ++-
1
On Fri, 2025-06-13 at 10:23 +0200, Christian König wrote:
> On 6/13/25 01:48, Danilo Krummrich wrote:
> > On Thu, Jun 12, 2025 at 09:00:34AM +0200, Christian König wrote:
> > > On 6/11/25 17:11, Danilo Krummrich wrote:
> > > > > > > Mhm, reiterating our internal discussion on the mailing
> > > > >
Hello,
On Tue, 2025-07-22 at 13:05 -0700, James wrote:
> On Mon, Jul 21, 2025, at 1:16 AM, Philipp Stanner wrote:
> > On Mon, 2025-07-21 at 09:52 +0200, Philipp Stanner wrote:
> > > +Cc Tvrtko, who's currently reworking FIFO and RR.
> > >
> > > On Sun,
+Cc Tvrtko, who's currently reworking FIFO and RR.
On Sun, 2025-07-20 at 16:56 -0700, James Flowers wrote:
> Fixes an issue where entities are added to the run queue in
> drm_sched_rq_update_fifo_locked after being killed, causing a
> slab-use-after-free error.
>
> Signed-off-by: James Flowers
>
On Mon, 2025-07-21 at 09:52 +0200, Philipp Stanner wrote:
> +Cc Tvrtko, who's currently reworking FIFO and RR.
>
> On Sun, 2025-07-20 at 16:56 -0700, James Flowers wrote:
> > Fixes an issue where entities are added to the run queue in
> > drm_sched_rq_update_fifo
From: Philipp Stanner
The various objects and their memory lifetime used by the GPU scheduler
are currently not fully documented.
Add documentation describing the scheduler's objects. Improve the
general documentation at a few other places.
Co-developed-by: Christian König
Signed-o
Two comments from myself to open up room for discussion:
On Thu, 2025-07-24 at 16:01 +0200, Philipp Stanner wrote:
> From: Philipp Stanner
>
> The various objects and their memory lifetime used by the GPU scheduler
> are currently not fully documented.
>
> Add documentat
On Fri, 2025-08-01 at 15:42 +, Timur Tabi wrote:
> On Fri, 2025-08-01 at 17:12 +0200, Danilo Krummrich wrote:
> > On Fri Aug 1, 2025 at 4:50 PM CEST, Timur Tabi wrote:
> > > Does mean that the TODO has been done, or that someone completely forgot
> > > and now your patch
> > > is
> > > remove
On Tue, 2025-08-05 at 11:05 +0200, Christian König wrote:
> On 24.07.25 17:07, Philipp Stanner wrote:
> > > +/**
> > > + * DOC: Scheduler Fence Object
> > > + *
> > > + * The scheduler fence object (&struct drm_sched_fence) encapsulates the
> > >
struct nouveau_channel contains the member 'accel_done' and a forgotten
TODO hinting that the mechanism would be removed in the "near future".
Since that variable is no longer read anywhere, this "near future" is now.
Remove the variable and the TODO.
Signed-off-by:
associated with a scheduler must be torn down first. Then,
however, the locking should be removed from drm_sched_fini() altogether
with an appropriate comment.
Reported-by: James Flowers
Link:
https://lore.kernel.org/dri-devel/20250720235748.2798-1-bold.zone2...@fastmail.com/
Signed-off-by: Philipp
On Tue, 2025-07-22 at 01:45 -0700, Matthew Brost wrote:
> On Tue, Jul 22, 2025 at 01:07:29AM -0700, Matthew Brost wrote:
> > On Tue, Jul 22, 2025 at 09:37:11AM +0200, Philipp Stanner wrote:
> > > On Mon, 2025-07-21 at 11:07 -0700, Matthew Brost wrote:
> > > > On M
> > loosely called random. Under the assumption it will not always be the
> > > same
> > > entity which is re-joining the queue under these circumstances.
> > >
> > > Another way to look at this is that it is adding a little bit of limited
> > > random
On Sat, 2025-05-03 at 17:59 -0300, Maíra Canal wrote:
> When the DRM scheduler times out, it's possible that the GPU isn't
> hung;
> instead, a job may still be running, and there may be no valid reason
> to
> reset the hardware. This can occur in two situations:
>
> 1. The GPU exposes some mech
On Thu, 2025-05-08 at 12:44 +0200, Javier Martinez Canillas wrote:
> Philipp Stanner writes:
>
> Hello Philipp,
>
> > On Tue, 2025-04-22 at 23:51 +0200, Javier Martinez Canillas wrote:
> > > Philipp Stanner writes:
> > >
> > > Hello Philipp,
>
On Thu, 2025-05-08 at 11:39 -0400, Zack Rusin wrote:
> On Thu, May 8, 2025 at 6:40 AM Philipp Stanner
> wrote:
> >
> > On Wed, 2025-04-23 at 14:06 +0200, Philipp Stanner wrote:
> > > vmwgfx enables its PCI device with pcim_enable_device(). This,
> >
On Tue, 2025-05-06 at 07:32 -0700, Matthew Brost wrote:
> On Mon, May 05, 2025 at 07:41:09PM -0700, Matthew Brost wrote:
> > On Sat, May 03, 2025 at 05:59:52PM -0300, Maíra Canal wrote:
> > > When the DRM scheduler times out, it's possible that the GPU
> > > isn't hung;
> > > instead, a job may sti
scheduling policy, not other general improvements.
P.
>
> Signed-off-by: Tvrtko Ursulin
> Cc: Christian König
> Cc: Danilo Krummrich
> Cc: Matthew Brost
> Cc: Philipp Stanner
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 39 +++-
> --
> 1
the function.
Same here, that's a good candidate for a separate patch / series.
P.
>
> Signed-off-by: Tvrtko Ursulin
> Cc: Christian König
> Cc: Danilo Krummrich
> Cc: Matthew Brost
> Cc: Philipp Stanner
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 37 +++
> completed jobs as soon as possible so the metric is most up to date
> when
> viewed from the submission side of things.
>
> Signed-off-by: Tvrtko Ursulin
> Cc: Christian König
> Cc: Danilo Krummrich
> Cc: Matthew Brost
> Cc: Philipp Stanner
> ---
>
On Fri, 2025-05-16 at 10:33 +0100, Tvrtko Ursulin wrote:
>
> On 24/04/2025 10:55, Philipp Stanner wrote:
> > The waitqueue that ensures that drm_sched_fini() blocks until the
> > pending_list has become empty could theoretically cause that
> > function to
> > bl
On Fri, 2025-05-16 at 13:10 +0100, Tvrtko Ursulin wrote:
>
> On 16/05/2025 12:53, Tvrtko Ursulin wrote:
> >
> > On 16/05/2025 08:28, Philipp Stanner wrote:
> > > On Thu, 2025-05-15 at 17:17 +0100, Tvrtko Ursulin wrote:
> > > >
> >
On Fri, 2025-05-16 at 15:30 +0100, Tvrtko Ursulin wrote:
>
> On 16/05/2025 14:38, Philipp Stanner wrote:
> > On Fri, 2025-05-16 at 13:10 +0100, Tvrtko Ursulin wrote:
> > >
> > > On 16/05/2025 12:53, Tvrtko Ursulin wrote:
> > > >
> > > > On
On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pelloux-Prayer wrote:
> This commit adds a document section in drm-uapi.rst about
> tracepoints,
> and mark the events gpu_scheduler_trace.h as stable uAPI.
>
> The goal is to explicitly state that tools can rely on the fields,
> formats and semantics
On Wed, 2025-05-14 at 09:30 +0100, Tvrtko Ursulin wrote:
>
> On 12/05/2025 09:00, Philipp Stanner wrote:
> > On Thu, 2025-05-08 at 13:51 +0100, Tvrtko Ursulin wrote:
> > >
> > > Hi Philipp,
> > >
> > > On 08/05/2025 12:03, Philipp Stanner
On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pelloux-Prayer wrote:
> This will be used in a later commit to trace the drm client_id in
> some of the gpu_scheduler trace events.
>
> This requires changing all the users of drm_sched_job_init to
> add an extra parameter.
>
> The newly added drm_cl
nit: title: s/gpu/GPU
We also mostly start with an upper case letter after the :, but JFYI,
it's not a big deal.
P.
On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pelloux-Prayer wrote:
> We can't trace dependencies from drm_sched_job_add_dependency
> because when it's called the job's fence is
On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pelloux-Prayer wrote:
> Its only purpose was for trace events, but jobs can already be
> uniquely identified using their fence.
>
> The downside of using the fence is that it's only available
> after 'drm_sched_job_arm' was called which is true for al
-managed pcim_request_all_regions().
Signed-off-by: Philipp Stanner
Reviewed-by: Zack Rusin
---
Changes in v3:
- Use the correct driver name in the commit message. (Zack)
Changes in v2:
- Fix unused variable error.
---
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 14 +++---
1 file changed, 3
dma_fence rules, e.g., ensuring that only one fence gets
signaled at a time.
Use the fence context (scheduler) lock for the jobs.
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 5 ++---
drivers/gpu/drm/scheduler/tests/sched_tests.h| 1 -
2 files changed
On Tue, 2025-05-20 at 17:15 +0100, Tvrtko Ursulin wrote:
>
> On 19/05/2025 10:04, Philipp Stanner wrote:
> > On Mon, 2025-05-19 at 09:51 +0100, Tvrtko Ursulin wrote:
> > >
> > > On 16/05/2025 18:16, Philipp Stanner wrote:
> > > > On Fri, 2025-
On Mon, 2025-05-19 at 09:51 +0100, Tvrtko Ursulin wrote:
>
> On 16/05/2025 18:16, Philipp Stanner wrote:
> > On Fri, 2025-05-16 at 15:30 +0100, Tvrtko Ursulin wrote:
> > >
> > > On 16/05/2025 14:38, Philipp Stanner wrote:
> > > > On Fri, 2025-
On Mon, 2025-05-19 at 13:02 +0200, Pierre-Eric Pelloux-Prayer wrote:
>
>
> Le 15/05/2025 à 08:53, Pierre-Eric Pelloux-Prayer a écrit :
> > Hi,
> >
> > Le 14/05/2025 à 14:44, Philipp Stanner a écrit :
> > > On Thu, 2025-04-24 at 10:38 +0200, Pierre-Eric Pell
On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote:
> On 5/23/25 16:16, Danilo Krummrich wrote:
> > On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
> > > On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote:
> > > > It turned out that we can actually massively opt
scheduler lock.
Signed-off-by: Philipp Stanner
---
Changes in v2:
- Make commit message more neutral by stating it's about simplifying
the code. (Tvrtko)
---
drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 5 ++---
drivers/gpu/drm/scheduler/tests/sched_tests.h| 1 -
2 files change
On Fri, 2025-05-23 at 14:56 +0200, Christian König wrote:
> It turned out that we can actually massively optimize here.
>
> The previous code was horribly inefficient since it constantly
> released
> and re-acquired the lock of the xarray and started each iteration
> from the
> base of the array t
On Mon, 2025-05-26 at 13:16 +0200, Christian König wrote:
> On 5/26/25 11:34, Philipp Stanner wrote:
> > On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote:
> > > On 5/23/25 16:16, Danilo Krummrich wrote:
> > > > On Fri, May 23, 2025 at 04:11:39PM +0200,
+Cc Matthew, again :)
On Thu, 2025-05-22 at 18:19 +0200, Christian König wrote:
> On 5/22/25 16:27, Tvrtko Ursulin wrote:
> >
> > On 22/05/2025 14:41, Christian König wrote:
> > > Since we already iterated over the xarray we know at which index
> > > the new
> > > entry should be stored. So inste
in the documentation.
Suggested-by: Danilo Krummrich
Signed-off-by: Philipp Stanner
---
include/drm/gpu_scheduler.h | 7 +--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 81dcbfc8c223..11740d745223 100644
--- a/includ
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm/nouveau/nouveau_fence.c | 20 +++-
drivers/gpu/drm/nouveau/nouveau_fence.h | 6 ++
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c
b/drivers/gpu/drm/nouveau/nouveau_fence.c
index
There is a new callback for always tearing the scheduler down in a
leak-free, deadlock-free manner.
Port Nouveau as its first user by providing the scheduler with a
callback that ensures the fence context gets killed in drm_sched_fini().
Signed-off-by: Philipp Stanner
---
drivers/gpu/drm
https://lore.kernel.org/dri-devel/20250418113211.69956-1-tvrtko.ursu...@igalia.com/
Philipp Stanner (6):
drm/sched: Avoid memory leaks with cancel_job() callback
drm/sched/tests: Implement cancel_job()
drm/sched: Warn if pending list is not empty
drm/nouveau: Make fence container helper usable driver-wide
hardware fence.
That should be repaired and cleaned up, but it's probably better to do
that in a separate series.
Signed-off-by: Philipp Stanner
---
.../gpu/drm/scheduler/tests/mock_scheduler.c | 71 +++
drivers/gpu/drm/scheduler/tests/sched_tests.h | 4 +-
2 files change
On Mon, 2025-05-26 at 14:54 +0200, Pierre-Eric Pelloux-Prayer wrote:
> Hi,
>
> The initial goal of this series was to improve the drm and amdgpu
> trace events to be able to expose more of the inner workings of
> the scheduler and drivers to developers via tools.
>
> Then, the series evolved to b
On Thu, 2025-06-12 at 15:17 +0100, Tvrtko Ursulin wrote:
>
> On 03/06/2025 10:31, Philipp Stanner wrote:
> > Since its inception, the GPU scheduler can leak memory if the
> > driver
> > calls drm_sched_fini() while there are still jobs in flight.
> >
> >
about pitfalls.
Co-authored-by: Danilo Krummrich
Signed-off-by: Philipp Stanner
---
Changes in v2:
- Add new docu section for concurrency in the scheduler. (Sima)
- Document what an ordered workqueue passed to the scheduler can be
useful for. (Christian, Sima)
- Warn more detailed about pote
On Tue, 2025-08-12 at 08:58 +0200, Christian König wrote:
> On 12.08.25 08:37, Liu01, Tong (Esther) wrote:
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> > Hi Christian,
> >
> > If a job is submitted into a stopped entity, in addition to an error log,
> > it will also cause t
associated with a scheduler must be torn down first. Then,
however, the locking should be removed from drm_sched_fini() altogether
with an appropriate comment.
Reported-by: James Flowers
Link:
https://lore.kernel.org/dri-devel/20250720235748.2798-1-bold.zone2...@fastmail.com/
Signed-off-by: Philipp
On Thu, 2025-08-07 at 16:15 +0200, Christian König wrote:
> On 05.08.25 12:22, Philipp Stanner wrote:
> > On Tue, 2025-08-05 at 11:05 +0200, Christian König wrote:
> > > On 24.07.25 17:07, Philipp Stanner wrote:
> > > > > +/**
> >
The Nova GPU driver has a sub-website on the Rust-for-Linux website
which so far was missing from the respective section in MAINTAINERS.
Add the Nova website.
Signed-off-by: Philipp Stanner
---
MAINTAINERS | 2 ++
1 file changed, 2 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index
On Thu, 2025-08-14 at 12:45 +0100, Tvrtko Ursulin wrote:
>
> On 14/08/2025 11:42, Tvrtko Ursulin wrote:
> >
> > On 21/07/2025 08:52, Philipp Stanner wrote:
> > > +Cc Tvrtko, who's currently reworking FIFO and RR.
> > >
> > > On Sun, 2025-07-20
Hi,
title: this patch changes nothing in amdgpu.
Thus, the prefix must be drm/sched: Fix […]
Furthermore, please use scripts/get_maintainer. A few relevant folks
are missing. +Cc Danilo, Matthew
On Mon, 2025-08-11 at 15:20 +0800, Liu01 Tong wrote:
> During process kill, drm_sched_entity_flush
On Mon, 2025-08-11 at 10:18 +0200, Philipp Stanner wrote:
> Hi,
>
> title: this patch changes nothing in amdgpu.
>
> Thus, the prefix must be drm/sched: Fix […]
>
>
> Furthermore, please use scripts/get_maintainer. A few relevant folks
> are missing. +Cc Danilo, Ma
On Tue, 2025-08-12 at 16:34 +0200, Christian König wrote:
> From: Christian König
Is this the correct mail addr? :)
>
> We have the re-occurring problem that people try to invent a
> DMA-fences implementation which signals fences based on an userspace
> IOCTL.
>
> This is well known as source
ove waitque for sched teardown")
Suggested-by: Danilo Krummrich
Signed-off-by: Philipp Stanner
---
Changes in v2:
- Don't revert commit 89b2675198ab ("drm/nouveau: Make fence container helper
usable driver-wide")
- Add Fixes-tag
---
drivers/gpu/drm/nouveau/nouveau_fence.c | 15 --
On Mon, 2025-09-01 at 15:14 +0200, Pierre-Eric Pelloux-Prayer wrote:
>
>
> Le 25/08/2025 à 15:13, Philipp Stanner a écrit :
> > On Fri, 2025-08-22 at 15:43 +0200, Pierre-Eric Pelloux-Prayer wrote:
> > > Currently, the scheduler score is incremented when a job is pushe
On Tue, 2025-08-12 at 16:34 +0200, Christian König wrote:
> From: Christian König
>
> We have the re-occurring problem that people try to invent a
> DMA-fences implementation which signals fences based on an userspace
> IOCTL.
>
> This is well known as source of hard to track down crashes and is
On Thu, 2025-09-04 at 13:56 +0200, Christian König wrote:
> On 04.09.25 13:12, Philipp Stanner wrote:
> > On Thu, 2025-09-04 at 12:27 +0200, Christian König wrote:
> > > On 01.09.25 10:31, Philipp Stanner wrote:
> > > > This reverts:
> > > >
> >
On Thu, 2025-09-04 at 12:27 +0200, Christian König wrote:
> On 01.09.25 10:31, Philipp Stanner wrote:
> > This reverts:
> >
> > commit bead88002227 ("drm/nouveau: Remove waitque for sched teardown")
> > commit 5f46f5c7af8c ("drm/nouveau: Add new callback
patches or could it be branched out?
P.
>
> Signed-off-by: Tvrtko Ursulin
> Cc: Christian König
> Cc: Danilo Krummrich
> Cc: Matthew Brost
> Cc: Philipp Stanner
> ---
> drivers/gpu/drm/scheduler/sched_entity.c | 64 ++-
> drivers/gpu/drm/scheduler/s
work. Or
could it be made generic for the current in-tree scheduler?
>
> Apart from that, the upcoming fair scheduling algorithm will rely on the
> tree only containing runnable entities.
>
> Signed-off-by: Tvrtko Ursulin
> Cc: Christian König
> Cc: Danilo Krummrich
>