On 02-06-16 at 15:25, Tvrtko Ursulin wrote:
>
> On 01/06/16 18:07, [email protected] wrote:
>> From: John Harrison <[email protected]>
>>
>> The intended usage model for struct fence is that the signalled status
>> should be set on demand rather than polled. That is, there should not
>> be a need for a 'signaled' function to be called every time the status
>> is queried. Instead, 'something' should be done to enable a signal
>> callback from the hardware which will update the state directly. In
>> the case of requests, this is the seqno update interrupt. The idea is
>> that this callback will only be enabled on demand when something
>> actually tries to wait on the fence.
>>
>> This change removes the polling test and replaces it with the callback
>> scheme. Each fence is added to a 'please poke me' list at the start of
>> i915_add_request(). The interrupt handler then scans through the 'poke
>> me' list when a new seqno pops out and signals any matching
>> fence/request. The fence is then removed from the list so the entire
>> request stack does not need to be scanned every time. Note that the
>> fence is added to the list before the commands to generate the seqno
>> interrupt are added to the ring. Thus the sequence is guaranteed to be
>> race free if the interrupt is already enabled.
>>
>> Note that the interrupt is only enabled on demand (i.e. when
>> __wait_request() is called). Thus there is still a potential race when
>> enabling the interrupt as the request may already have completed.
>> However, this is simply solved by calling the interrupt processing
>> code immediately after enabling the interrupt and thereby checking for
>> already completed requests.
>>
>> Lastly, the ring clean up code has the possibility to cancel
>> outstanding requests (e.g. because TDR has reset the ring). These
>> requests will never get signalled and so must be removed from the
>> signal list manually. This is done by setting a 'cancelled' flag and
>> then calling the regular notify/retire code path rather than
>> attempting to duplicate the list manipulation and clean up code in
>> multiple places. This also avoids any race condition where the
>> cancellation request might occur after/during the completion interrupt
>> actually arriving.
>>
>> v2: Updated to take advantage of the request unreference no longer
>> requiring the mutex lock.
>>
>> v3: Move the signal list processing around to prevent unsubmitted
>> requests being added to the list. This was occurring on Android
>> because the native sync implementation calls the
>> fence->enable_signaling API immediately on fence creation.
>>
>> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
>> 'link' instead of 'list'. Added support for returning an error code on
>> a cancelled fence. Updated list processing to be more efficient/safer
>> with respect to spinlocks.
>>
>> v5: Made i915_gem_request_submit a static as it is only ever called
>> from one place.
>>
>> Fixed up the low latency wait optimisation. The time delay between the
>> seqno value being written to memory and the driver's ISR running can be
>> significant, at least for the wait request micro-benchmark. This can
>> be greatly improved by explicitly checking for seqno updates in the
>> pre-wait busy poll loop. Also added some documentation comments to the
>> busy poll code.
>>
>> Fixed up support for the faking of lost interrupts
>> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
>> tells the driver to lose interrupts deliberately and then checks that
>> everything still works as expected (albeit much slower).
>>
>> Updates from review comments: use non IRQ-save spinlocking, early exit
>> on WARN and improved comments (Tvrtko Ursulin).
>>
>> v6: Updated to newer nightly and resolved conflicts around the
>> wait_request busy spin optimisation. Also fixed a race condition
>> between this early exit path and the regular completion path.
>>
>> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
>> interface change on get_seqno(). Also added a list_empty() check
>> before acquiring spinlocks and doing list processing.
>>
>> v8: Updated to newer nightly - changes to request clean up code mean
>> none of the deferred free mess is needed any more.
>>
>> v9: Moved the request completion processing out of the interrupt
>> handler and into a worker thread (Chris Wilson).
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <[email protected]>
>> Cc: Tvrtko Ursulin <[email protected]>
>> Cc: Maarten Lankhorst <[email protected]>
>> ---
>> drivers/gpu/drm/i915/i915_dma.c | 9 +-
>> drivers/gpu/drm/i915/i915_drv.h | 11 ++
>> drivers/gpu/drm/i915/i915_gem.c | 248
>> +++++++++++++++++++++++++++++---
>> drivers/gpu/drm/i915/i915_irq.c | 2 +
>> drivers/gpu/drm/i915/intel_lrc.c | 5 +
>> drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +
>> drivers/gpu/drm/i915/intel_ringbuffer.h | 3 +
>> 7 files changed, 260 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_dma.c
>> b/drivers/gpu/drm/i915/i915_dma.c
>> index 07edaed..f8f60bb 100644
>> --- a/drivers/gpu/drm/i915/i915_dma.c
>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>> @@ -1019,9 +1019,13 @@ static int i915_workqueues_init(struct
>> drm_i915_private *dev_priv)
>> if (dev_priv->wq == NULL)
>> goto out_err;
>>
>> + dev_priv->req_wq = alloc_ordered_workqueue("i915-rq", 0);
>> + if (dev_priv->req_wq == NULL)
>> + goto out_free_wq;
>> +
>
> A single (per-device) ordered workqueue will serialize interrupt processing
> across all engines to one thread. Together with the fact that the request
> worker does not seem to need a sleeping context, I am thinking that a
> tasklet per engine would be much better (see engine->irq_tasklet for an
> example).
>
>> dev_priv->hotplug.dp_wq = alloc_ordered_workqueue("i915-dp", 0);
>> if (dev_priv->hotplug.dp_wq == NULL)
>> - goto out_free_wq;
>> + goto out_free_req_wq;
>>
>> dev_priv->gpu_error.hangcheck_wq =
>> alloc_ordered_workqueue("i915-hangcheck", 0);
>> @@ -1032,6 +1036,8 @@ static int i915_workqueues_init(struct
>> drm_i915_private *dev_priv)
>>
>> out_free_dp_wq:
>> destroy_workqueue(dev_priv->hotplug.dp_wq);
>> +out_free_req_wq:
>> + destroy_workqueue(dev_priv->req_wq);
>> out_free_wq:
>> destroy_workqueue(dev_priv->wq);
>> out_err:
>> @@ -1044,6 +1050,7 @@ static void i915_workqueues_cleanup(struct
>> drm_i915_private *dev_priv)
>> {
>> destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
>> destroy_workqueue(dev_priv->hotplug.dp_wq);
>> + destroy_workqueue(dev_priv->req_wq);
>> destroy_workqueue(dev_priv->wq);
>> }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index 69c3412..5a7f256 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1851,6 +1851,9 @@ struct drm_i915_private {
>> */
>> struct workqueue_struct *wq;
>>
>> + /* Work queue for request completion processing */
>> + struct workqueue_struct *req_wq;
>> +
>> /* Display functions */
>> struct drm_i915_display_funcs display;
>>
>> @@ -2359,6 +2362,10 @@ struct drm_i915_gem_request {
>> */
>> struct fence fence;
>> struct rcu_head rcu_head;
>> + struct list_head signal_link;
>> + bool cancelled;
>> + bool irq_enabled;
>> + bool signal_requested;
>>
>> /** On Which ring this request was generated */
>> struct drm_i915_private *i915;
>> @@ -2460,6 +2467,10 @@ struct drm_i915_gem_request {
>> struct drm_i915_gem_request * __must_check
>> i915_gem_request_alloc(struct intel_engine_cs *engine,
>> struct i915_gem_context *ctx);
>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
>> + bool fence_locked);
>> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool
>> fence_locked);
>> +void i915_gem_request_worker(struct work_struct *work);
>>
>> static inline bool i915_gem_request_completed(struct drm_i915_gem_request
>> *req)
>> {
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 97e3138..83cf9b0 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -39,6 +39,8 @@
>> #include <linux/pci.h>
>> #include <linux/dma-buf.h>
>>
>> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
>> +
>> static void i915_gem_object_flush_gtt_write_domain(struct
>> drm_i915_gem_object *obj);
>> static void i915_gem_object_flush_cpu_write_domain(struct
>> drm_i915_gem_object *obj);
>> static void
>> @@ -1237,9 +1239,8 @@ int __i915_wait_request(struct drm_i915_gem_request
>> *req,
>> {
>> struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
>> struct drm_i915_private *dev_priv = req->i915;
>> - const bool irq_test_in_progress =
>> - ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
>> intel_engine_flag(engine);
>> int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>> + uint32_t seqno;
>> DEFINE_WAIT(wait);
>> unsigned long timeout_expire;
>> s64 before = 0; /* Only to silence a compiler warning. */
>> @@ -1247,9 +1248,6 @@ int __i915_wait_request(struct drm_i915_gem_request
>> *req,
>>
>> WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>>
>> - if (list_empty(&req->list))
>> - return 0;
>> -
>> if (i915_gem_request_completed(req))
>> return 0;
>>
>> @@ -1275,15 +1273,17 @@ int __i915_wait_request(struct drm_i915_gem_request
>> *req,
>> trace_i915_gem_request_wait_begin(req);
>>
>> /* Optimistic spin for the next jiffie before touching IRQs */
>> - ret = __i915_spin_request(req, state);
>> - if (ret == 0)
>> - goto out;
>> -
>> - if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
>> - ret = -ENODEV;
>> - goto out;
>> + if (req->seqno) {
>
> This needs a comment, I think, because it is so unusual and new that
> req->seqno == 0 is a special path; the comment should explain why and how
> it can happen here.
>
>> + ret = __i915_spin_request(req, state);
>> + if (ret == 0)
>> + goto out;
>> }
>>
>> + /*
>> + * Enable interrupt completion of the request.
>> + */
>> + fence_enable_sw_signaling(&req->fence);
>> +
>> for (;;) {
>> struct timer_list timer;
>>
>> @@ -1306,6 +1306,21 @@ int __i915_wait_request(struct drm_i915_gem_request
>> *req,
>> break;
>> }
>>
>> + if (req->seqno) {
>> + /*
>> + * There is quite a lot of latency in the user interrupt
>> + * path. So do an explicit seqno check and potentially
>> + * remove all that delay.
>> + */
>> + if (req->engine->irq_seqno_barrier)
>> + req->engine->irq_seqno_barrier(req->engine);
>> + seqno = engine->get_seqno(engine);
>> + if (i915_seqno_passed(seqno, req->seqno)) {
>> + ret = 0;
>> + break;
>> + }
>> + }
>> +
>> if (signal_pending_state(state, current)) {
>> ret = -ERESTARTSYS;
>> break;
>> @@ -1332,14 +1347,32 @@ int __i915_wait_request(struct drm_i915_gem_request
>> *req,
>> destroy_timer_on_stack(&timer);
>> }
>> }
>> - if (!irq_test_in_progress)
>> - engine->irq_put(engine);
>>
>> finish_wait(&engine->irq_queue, &wait);
>
> Hm I don't understand why our custom waiting remains? Shouldn't fence_wait
> just be called after the optimistic spin, more or less?
>
>>
>> out:
>> trace_i915_gem_request_wait_end(req);
>>
>> + if ((ret == 0) && (req->seqno)) {
>> + if (req->engine->irq_seqno_barrier)
>> + req->engine->irq_seqno_barrier(req->engine);
>> + seqno = engine->get_seqno(engine);
>> + if (i915_seqno_passed(seqno, req->seqno) &&
>> + !i915_gem_request_completed(req)) {
>> + /*
>> + * Make sure the request is marked as completed before
>> + * returning. NB: Need to acquire the spinlock around
>> + * the whole call to avoid a race condition where the
>> + * interrupt handler is running concurrently and could
>> + * cause this invocation to exit early even though the
>> + * request has not actually been fully processed yet.
>> + */
>> + spin_lock_irq(&req->engine->fence_lock);
>> + i915_gem_request_notify(req->engine, true);
>> + spin_unlock_irq(&req->engine->fence_lock);
>> + }
>> + }
>> +
>> if (timeout) {
>> s64 tres = *timeout - (ktime_get_raw_ns() - before);
>>
>> @@ -1405,6 +1438,11 @@ static void i915_gem_request_retire(struct
>> drm_i915_gem_request *request)
>> {
>> trace_i915_gem_request_retire(request);
>>
>> + if (request->irq_enabled) {
>> + request->engine->irq_put(request->engine);
>> + request->irq_enabled = false;
>
> What protects request->irq_enabled, here versus the enable_signalling bit?
> It can be called from external fence users, which would take the
> fence_lock, but here it does not.
>
>> + }
>
>> +
>> /* We know the GPU must have read the request to have
>> * sent us the seqno + interrupt, so use the position
>> * of tail of the request to update the last known position
>> @@ -1418,6 +1456,22 @@ static void i915_gem_request_retire(struct
>> drm_i915_gem_request *request)
>> list_del_init(&request->list);
>> i915_gem_request_remove_from_client(request);
>>
>> + /*
>> + * In case the request is still in the signal pending list,
>> + * e.g. due to being cancelled by TDR, preemption, etc.
>> + */
>> + if (!list_empty(&request->signal_link)) {
>
> No locking required here?
Considering the locked function is used, I'm assuming this function holds the
fence_lock.
If not, something's seriously wrong.
>> + /*
>> + * The request must be marked as cancelled and the underlying
>> + * fence as failed. NB: There is no explicit fence fail API,
>> + * there is only a manual poke and signal.
>> + */
>> + request->cancelled = true;
>> + /* How to propagate to any associated sync_fence??? */
^This way works, so the comment can be removed.
There's deliberately no way to cancel a fence; it's the same path but with
the status member set.
If you have a fence for another driver, there's really no good way to
handle failure, so you have to treat it as if it succeeded.
>> + request->fence.status = -EIO;
>> + fence_signal_locked(&request->fence);
>
> And here?
>
>> + }
>> +
>> if (request->previous_context) {
>> if (i915.enable_execlists)
>> intel_lr_context_unpin(request->previous_context,
>> @@ -2670,6 +2724,12 @@ void __i915_add_request(struct drm_i915_gem_request
>> *request,
>> */
>> request->postfix = intel_ring_get_tail(ringbuf);
>>
>> + /*
>> + * Add the fence to the pending list before emitting the commands to
>> + * generate a seqno notification interrupt.
>> + */
>> + i915_gem_request_submit(request);
>> +
>> if (i915.enable_execlists)
>> ret = engine->emit_request(request);
>> else {
>> @@ -2755,25 +2815,154 @@ static void i915_gem_request_free(struct fence
>> *req_fence)
>> struct drm_i915_gem_request *req;
>>
>> req = container_of(req_fence, typeof(*req), fence);
>> +
>> + WARN_ON(req->irq_enabled);
>
> How useful is this? If it went wrong, engine irq reference counting would
> be bad. Okay, no one would notice, but we could then stick some other
> WARNs here, like !list_empty(req->list) and who knows what, which we don't
> have, so I am just wondering if this one brings any value.
>
>> +
>> call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
>> }
>>
>> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> +/*
>> + * The request is about to be submitted to the hardware so add the fence to
>> + * the list of signalable fences.
>> + *
>> + * NB: This does not necessarily enable interrupts yet. That only occurs on
>> + * demand when the request is actually waited on. However, adding it to the
>> + * list early ensures that there is no race condition where the interrupt
>> + * could pop out prematurely and thus be completely lost. The race is merely
>> + * that the interrupt must be manually checked for after being enabled.
>> + */
>> +static void i915_gem_request_submit(struct drm_i915_gem_request *req)
>> {
>> - /* Interrupt driven fences are not implemented yet.*/
>> - WARN(true, "This should not be called!");
>> - return true;
>> + /*
>> + * Always enable signal processing for the request's fence object
>> + * before that request is submitted to the hardware. Thus there is no
>> + * race condition whereby the interrupt could pop out before the
>> + * request has been added to the signal list. Hence no need to check
>> + * for completion, undo the list add and return false.
>> + */
>> + i915_gem_request_reference(req);
>> + spin_lock_irq(&req->engine->fence_lock);
>> + WARN_ON(!list_empty(&req->signal_link));
>> + list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
>> + spin_unlock_irq(&req->engine->fence_lock);
>> +
>> + /*
>> + * NB: Interrupts are only enabled on demand. Thus there is still a
>> + * race where the request could complete before the interrupt has
>> + * been enabled. Thus care must be taken at that point.
>> + */
>> +
>> + /* Have interrupts already been requested? */
>> + if (req->signal_requested)
>> + i915_gem_request_enable_interrupt(req, false);
>
> I am thinking that the fence lock could be held here until the end of the
> function, and that way i915_gem_request_enable_interrupt would not need
> the fence_locked parameter any more.
>
> It would probably also be safer with regards to accessing
> req->signal_requested. I am not sure that enable signalling and this
> otherwise can't race and miss signal_requested getting set?
>
>> +}
>> +
>> +/*
>> + * The request is being actively waited on, so enable interrupt based
>> + * completion signalling.
>> + */
>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
>> + bool fence_locked)
>> +{
>> + struct drm_i915_private *dev_priv = req->engine->i915;
>> + const bool irq_test_in_progress =
>> + ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
>> + intel_engine_flag(req->engine);
>> +
>> + if (req->irq_enabled)
>> + return;
>> +
>> + if (irq_test_in_progress)
>> + return;
>> +
>> + if (!WARN_ON(!req->engine->irq_get(req->engine)))
>> + req->irq_enabled = true;
>
> The double negation confused me a bit. It is probably not ideal since
> WARN_ONs go to the out-of-line section, so in a way it deliberately
> penalises the fast and expected path. I think it would be better to put
> the WARN on the else path.
>
>> +
>> + /*
>> + * Because the interrupt is only enabled on demand, there is a race
>> + * where the interrupt can fire before anyone is looking for it. So
>> + * do an explicit check for missed interrupts.
>> + */
>> + i915_gem_request_notify(req->engine, fence_locked);
>> }
>>
>> -static bool i915_gem_request_is_completed(struct fence *req_fence)
>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> {
>> struct drm_i915_gem_request *req = container_of(req_fence,
>> typeof(*req), fence);
>> +
>> + /*
>> + * No need to actually enable interrupt based processing until the
>> + * request has been submitted to the hardware. At which point
>> + * 'i915_gem_request_submit()' is called. So only really enable
>> + * signalling in there. Just set a flag to say that interrupts are
>> + * wanted when the request is eventually submitted. On the other hand
>> + * if the request has already been submitted then interrupts do need
>> + * to be enabled now.
>> + */
>> +
>> + req->signal_requested = true;
>> +
>> + if (!list_empty(&req->signal_link))
>
> In what scenarios is the list_empty check needed? Can someone somehow
> enable signalling on a fence that has not yet been submitted?
>> + i915_gem_request_enable_interrupt(req, true);
>> +
>> + return true;
>> +}
>> +
>> +/**
>> + * i915_gem_request_worker - request work handler callback.
>> + * @work: Work structure
>> + * Called in response to a seqno interrupt to process the completed
>> requests.
>> + */
>> +void i915_gem_request_worker(struct work_struct *work)
>> +{
>> + struct intel_engine_cs *engine;
>> +
>> + engine = container_of(work, struct intel_engine_cs, request_work);
>> + i915_gem_request_notify(engine, false);
>> +}
>> +
>> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool
>> fence_locked)
>> +{
>> + struct drm_i915_gem_request *req, *req_next;
>> + unsigned long flags;
>> u32 seqno;
>>
>> - seqno = req->engine->get_seqno(req->engine);
>> + if (list_empty(&engine->fence_signal_list))
>
> Okay, this check without the lock still makes me nervous. I'd rather not
> have to think about why it is safe and can't miss a wakeup.
>
> Also I would be leaning toward having i915_gem_request_notify and
> i915_gem_request_notify__unlocked. With the enable_interrupts simplification
> I suggested, I think it would look better and be more consistent with the
> rest of the driver.
>
>> + return;
>> +
>> + if (!fence_locked)
>> + spin_lock_irqsave(&engine->fence_lock, flags);
>
> Not called from hard irq context so can be just spin_lock_irq.
>
> But if you agree to go with the tasklet it would then be spin_lock_bh.
The fence code always uses spin_lock_irq; if this requires _bh then it
can't go into the tasklet.
>>
>> - return i915_seqno_passed(seqno, req->seqno);
>> + if (engine->irq_seqno_barrier)
>> + engine->irq_seqno_barrier(engine);
>> + seqno = engine->get_seqno(engine);
>> +
>> + list_for_each_entry_safe(req, req_next, &engine->fence_signal_list,
>> signal_link) {
>> + if (!req->cancelled) {
>> + if (!i915_seqno_passed(seqno, req->seqno))
>> + break;
>
> Merge to one if statement?
>
>> + }
>> +
>> + /*
>> + * Start by removing the fence from the signal list otherwise
>> + * the retire code can run concurrently and get confused.
>> + */
>> + list_del_init(&req->signal_link);
>> +
>> + if (!req->cancelled)
>> + fence_signal_locked(&req->fence);
>
> I forgot how signalling errors to userspace works. Does that still work for
> cancelled fences in this series?
>
>> +
>> + if (req->irq_enabled) {
>> + req->engine->irq_put(req->engine);
>> + req->irq_enabled = false;
>> + }
>> +
>> + i915_gem_request_unreference(req);
>> + }
>> +
>> + if (!fence_locked)
>> + spin_unlock_irqrestore(&engine->fence_lock, flags);
>> }
>>
>> static const char *i915_gem_request_get_driver_name(struct fence
>> *req_fence)
>> @@ -2816,7 +3005,6 @@ static void i915_gem_request_fence_value_str(struct
>> fence *req_fence,
>>
>> static const struct fence_ops i915_gem_request_fops = {
>> .enable_signaling = i915_gem_request_enable_signaling,
>> - .signaled = i915_gem_request_is_completed,
>> .wait = fence_default_wait,
>> .release = i915_gem_request_free,
>> .get_driver_name = i915_gem_request_get_driver_name,
>> @@ -2902,6 +3090,7 @@ __i915_gem_request_alloc(struct intel_engine_cs
>> *engine,
>> req->ctx = ctx;
>> i915_gem_context_reference(req->ctx);
>>
>> + INIT_LIST_HEAD(&req->signal_link);
>> fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
>> ctx->engine[engine->id].fence_timeline.fence_context,
>>
>> i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
>> @@ -3036,6 +3225,13 @@ static void i915_gem_reset_engine_cleanup(struct
>> drm_i915_private *dev_priv,
>> i915_gem_request_retire(request);
>> }
>>
>> + /*
>> + * Tidy up anything left over. This includes a call to
>> + * i915_gem_request_notify() which will make sure that any requests
>> + * that were on the signal pending list get also cleaned up.
>> + */
>> + i915_gem_retire_requests_ring(engine);
>
> Hmm.. but this function has just walked the same lists this will, and done
> the same processing. Why call this from here? It looks bad to me; the two
> are different special cases of a similar thing, so I can't see that calling
> this from here makes sense.
>
>> +
>> /* Having flushed all requests from all queues, we know that all
>> * ringbuffers must now be empty. However, since we do not reclaim
>> * all space when retiring the request (to prevent HEADs colliding
>> @@ -3082,6 +3278,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs
>> *engine)
>> {
>> WARN_ON(i915_verify_lists(engine->dev));
>>
>> + /*
>> + * If no-one has waited on a request recently then interrupts will
>> + * not have been enabled and thus no requests will ever be marked as
>> + * completed. So do an interrupt check now.
>> + */
>> + i915_gem_request_notify(engine, false);
>
> Would it work to signal the fence from the existing loop a bit above in
> this function, which already walks the request list in search of completed
> ones? Or maybe even in i915_gem_request_retire?
>
> I am thinking about doing less list walking and better integration with the
> core GEM. Downside would be more traffic on the fence_lock, hmm.. not sure
> then. It just looks a bit bolted on like this.
>
> I don't see it being a noticeable cost so perhaps it can stay like this for
> now.
>
>> +
>> /* Retire requests first as we use it above for the early return.
>> * If we retire requests last, we may use a later seqno and so clear
>> * the requests lists without clearing the active list, leading to
>> @@ -5102,6 +5305,7 @@ init_engine_lists(struct intel_engine_cs *engine)
>> {
>> INIT_LIST_HEAD(&engine->active_list);
>> INIT_LIST_HEAD(&engine->request_list);
>> + INIT_LIST_HEAD(&engine->fence_signal_list);
>> }
>>
>> void
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c
>> b/drivers/gpu/drm/i915/i915_irq.c
>> index f780421..a87a3c5 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -994,6 +994,8 @@ static void notify_ring(struct intel_engine_cs *engine)
>> trace_i915_gem_request_notify(engine);
>> engine->user_interrupts++;
>>
>> + queue_work(engine->i915->req_wq, &engine->request_work);
>> +
>> wake_up_all(&engine->irq_queue);
>
> Yes, that is the weird part: why does the engine->irq_queue have to remain
> with this patch?
>
>> }
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index f126bcb..134759d 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1879,6 +1879,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs
>> *engine)
>>
>> dev_priv = engine->i915;
>>
>> + cancel_work_sync(&engine->request_work);
>> +
>> if (engine->buffer) {
>> intel_logical_ring_stop(engine);
>> WARN_ON((I915_READ_MODE(engine) & MODE_IDLE) == 0);
>> @@ -2027,6 +2029,7 @@ logical_ring_setup(struct drm_device *dev, enum
>> intel_engine_id id)
>>
>> INIT_LIST_HEAD(&engine->active_list);
>> INIT_LIST_HEAD(&engine->request_list);
>> + INIT_LIST_HEAD(&engine->fence_signal_list);
>> INIT_LIST_HEAD(&engine->buffers);
>> INIT_LIST_HEAD(&engine->execlist_queue);
>> spin_lock_init(&engine->execlist_lock);
>> @@ -2035,6 +2038,8 @@ logical_ring_setup(struct drm_device *dev, enum
>> intel_engine_id id)
>> tasklet_init(&engine->irq_tasklet,
>> intel_lrc_irq_handler, (unsigned long)engine);
>>
>> + INIT_WORK(&engine->request_work, i915_gem_request_worker);
>> +
>> logical_ring_init_platform_invariants(engine);
>> logical_ring_default_vfuncs(engine);
>> logical_ring_default_irqs(engine, info->irq_shift);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index fbd3f12..1641096 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device
>> *dev,
>> INIT_LIST_HEAD(&engine->request_list);
>> INIT_LIST_HEAD(&engine->execlist_queue);
>> INIT_LIST_HEAD(&engine->buffers);
>> + INIT_LIST_HEAD(&engine->fence_signal_list);
>> spin_lock_init(&engine->fence_lock);
>> i915_gem_batch_pool_init(dev, &engine->batch_pool);
>> memset(engine->semaphore.sync_seqno, 0,
>> @@ -2261,6 +2262,8 @@ static int intel_init_ring_buffer(struct drm_device
>> *dev,
>>
>> init_waitqueue_head(&engine->irq_queue);
>>
>> + INIT_WORK(&engine->request_work, i915_gem_request_worker);
>> +
>> ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE);
>> if (IS_ERR(ringbuf)) {
>> ret = PTR_ERR(ringbuf);
>> @@ -2307,6 +2310,8 @@ void intel_cleanup_engine(struct intel_engine_cs
>> *engine)
>>
>> dev_priv = engine->i915;
>>
>> + cancel_work_sync(&engine->request_work);
>> +
>> if (engine->buffer) {
>> intel_stop_engine(engine);
>> WARN_ON(!IS_GEN2(dev_priv) && (I915_READ_MODE(engine) & MODE_IDLE)
>> == 0);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 3f39daf..51779b4 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -347,6 +347,9 @@ struct intel_engine_cs {
>> u32 (*get_cmd_length_mask)(u32 cmd_header);
>>
>> spinlock_t fence_lock;
>> + struct list_head fence_signal_list;
>> +
>> + struct work_struct request_work;
>> };
>>
>> static inline bool
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx