Quoting Tvrtko Ursulin (2020-07-17 09:34:07)
> 
> On 16/07/2020 21:44, Chris Wilson wrote:
> I am not sure the batch duration isn't too short in practice; the add 
> loop will end them all very rapidly, needing only ~64 iterations on 
> average to end all 32, I think. So 64 WC writes from the CPU compared to 
> CSB processing and breadcrumb signalling latencies might be too short. 
> Maybe some small random udelays in the loop would be more realistic. 
> Maybe as a 2nd flavour of the test, just in case.. the more coverage the better.

GPU                     kernel                  IGT
semaphore wait
  -> raise interrupt
                        handle interrupt
                          -> kick tasklet
                        begin preempt-to-busy   semaphore signal
semaphore completes
request completes
                        submit new ELSP[]
                          -> stale unwound request
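
The interleaving above can be sketched as a toy event-ordering model (hypothetical names and structure, not i915 code): the bug only triggers when the request completes inside the narrow window between the kernel beginning preempt-to-busy and it submitting the new ELSP[].

```python
# Toy model of the preempt-to-busy race window (illustrative only,
# not i915 code). Events are listed in the order that reproduces the
# bug: the request completes *after* the kernel begins preempt-to-busy
# but *before* the new ELSP[] submission, leaving a stale unwound
# request in the new submission.

RACY_ORDER = [
    ("gpu",    "semaphore wait"),
    ("gpu",    "raise interrupt"),
    ("kernel", "handle interrupt"),
    ("kernel", "kick tasklet"),
    ("kernel", "begin preempt-to-busy"),
    ("igt",    "semaphore signal"),
    ("gpu",    "semaphore completes"),
    ("gpu",    "request completes"),
    ("kernel", "submit new ELSP[]"),   # sees the stale unwound request
]

def hits_race(events):
    """True if 'request completes' lands inside the window between
    'begin preempt-to-busy' and 'submit new ELSP[]'."""
    names = [name for _, name in events]
    begin = names.index("begin preempt-to-busy")
    submit = names.index("submit new ELSP[]")
    done = names.index("request completes")
    return begin < done < submit

print(hits_race(RACY_ORDER))  # → True
```

The model makes the difficulty plain: only a couple of event positions out of the whole sequence satisfy the predicate, which matches the observation below that the window is too small to hit reliably from userspace.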

The duration of the batch/semaphore itself doesn't really factor into it;
what matters is that the batch completes after we have begun the process
of scheduling it out for an expired timeslice. It's such a small window
that I don't see a good way of hitting it reliably from userspace.

With some printk, I was able to confirm that we were timeslicing virtual
requests and moving them between engines with active breadcrumbs. But I
never once saw any of the bugs with the stale requests, using this test.

Somehow we want to lengthen the preempt-to-busy window and make the
request completion coincide with it. So far all I have is yucky (too
single purpose; we would be better off writing unit tests for each of
the steps involved).
-Chris
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx