On 06/02/2026 10:36, Marco Pagani wrote:
On 05/02/2026 10:53, Tvrtko Ursulin wrote:
On 04/02/2026 16:33, Marco Pagani wrote:
8><
+ {
+ .description = "Concurrently submit multiple jobs in a single entity",
+ .job_base_us = 1000,
+ .num_jobs = 10,
+ .num_subs = 64,
+ },
+};
+
+static void
+drm_sched_concurrent_desc(const struct drm_sched_concurrent_params *params,
+			  char *desc)
+{
+ strscpy(desc, params->description, KUNIT_PARAM_DESC_SIZE);
+}
+
+KUNIT_ARRAY_PARAM(drm_sched_concurrent, drm_sched_concurrent_cases,
+		  drm_sched_concurrent_desc);
+
+struct submitter_data {
+ struct work_struct work;
+ struct sched_concurrent_test_context *ctx;
+ struct drm_mock_sched_entity *entity;
+ struct drm_mock_sched_job **jobs;
+ struct kunit *test;
+ unsigned int id;
+ bool timedout;
+};
+
+static void drm_sched_submitter_worker(struct work_struct *work)
+{
+ const struct drm_sched_concurrent_params *params;
+ struct sched_concurrent_test_context *ctx;
+ struct submitter_data *sub_data;
+ unsigned int i, duration_us;
+ unsigned long timeout_jiffies;
+ bool done;
+
+ sub_data = container_of(work, struct submitter_data, work);
+ ctx = sub_data->ctx;
+ params = sub_data->test->param_value;
+
+ wait_for_completion(&ctx->wait_go);
+
+ for (i = 0; i < params->num_jobs; i++) {
+ duration_us = params->job_base_us + (sub_data->id * 10);
Why is job duration dependent on the submitter id?
Just a simple way to have a deterministic distribution of durations.
I can change it if it doesn't fit.
Would it be feasible to not use auto-completing jobs and instead
advance the timeline manually? Given how the premise of the test seems
to be about concurrent submission, it sounds plausible that what happens
after submission may not be very relevant.
Good idea! I'll run some experiments and see if it works.
Cool, I will await your findings in v2. :)
After fiddling a bit with the code, I came to the conclusion that
changing the design to use manual timeline advancement is doable, but
not beneficial, since it would require serializing job submission into
"batches" using a two-step process, i.e., (i) workers submit jobs, and
(ii) the main thread waits for all submissions, advances the timeline,
and then releases the workers for the next iteration.
What do you mean by next iteration?
In the patch you have each worker submit all jobs in one go.
I mean, if I change the code to use manual timeline advancement, then I
must introduce some synchronization logic that makes the main thread
advance the timeline only after the workers have submitted their jobs.
Oh that, I was thinking advancing after flushing the workqueue would be
enough for this use case. Since that one does not care about when
completions happen, they can just be cleaned up at exit.
Since workers submit multiple jobs, I was thinking it would be better to
have the workers submit jobs in batches instead of all in one go.
No strong opinion from me, as long as it is clear what is being tested.
This approach would partially defeat the purpose of a concurrency test,
as it would not allow job submission (KUnit process context) to overlap
with job completion (hrtimer callback interrupt context), which models
asynchronous hardware in the mock scheduler, limiting contention on the
DRM scheduler's internal locking. Moreover, while manual advancement
might appear to make the test deterministic, it does not, since the
order in which the workers are scheduled will still be non-deterministic.
Ah, so it depends on what the test actually wants to test. In my view,
exercising parallel submit is one thing, while interleaving submission
with completion is something else.
In the test as written I think there is very little of the latter. Each
worker submits all their jobs in one tight loop. You set the jobs to
1ms, so the first job completion is 1ms away from when the workers are
released. A lot of jobs can be submitted in 1ms, and it would be
interesting to see exactly how many, and from how many workers.
If the desire is to interleave completion and submission, I think that
should be made more explicit (less dependent on how fast the underlying
machine is), for example by adding delays into the submit loop to
actually guarantee it.
Fair point.
But I would also recommend that parallel submit and parallel submit vs.
completions be tested in separate test cases. It should be easy to do
with some flags and almost no new code. I was suggesting flags for some
other thing in the initial review as well. Right, for auto-complete. So
the flag could be something like:
+static const struct drm_sched_concurrent_params drm_sched_concurrent_cases[] = {
+ {
+ .description = "Concurrently submit a single job in a single entity",
+ .job_base_us = 1000,
+ .num_jobs = 1,
+ .num_subs = 32,
.flags = INTERLEAVE_SUBMIT_AND_COMPLETE,
+ },
In the submit loop:
+ for (i = 0; i < params->num_jobs; i++) {
+ duration_us = params->job_base_us + (sub_data->id * 10);
	if (flags & INTERLEAVE_SUBMIT_AND_COMPLETE) {
		drm_mock_sched_job_set_duration_us(sub_data->jobs[i],
						   duration_us);
		/*
		 * Add a delay based on time elapsed and job duration
		 * to guarantee job completions start arriving.
		 */
	}
+ drm_mock_sched_job_submit(sub_data->jobs[i]);
+ }
And of course handle the job waiting stage appropriately depending on
auto-complete or not.
Anyway, testing as little as possible at a time is a best practice I
recommend, but if you insist, there is also nothing fundamentally wrong
with the test as you have it, so I won't block it.
Agreed. Unit tests should test one functionality at a time and be clear
about which one. I'll follow your suggestions and have two separate test
cases: a basic one for concurrent submissions with manual timeline
advancement (no batches, workers submit all jobs at once), and a second
one with automatic timeline advancement for testing the interleaving of
submissions with completions.
At this point, though, I think it would be better to move the tests to a
separate source file, as the partial similarity of the first concurrent
test case to drm_sched_basic_submit could create some confusion.
Works for me, thank you!
Regards,
Tvrtko