On Fri, Sep 02, 2016 at 07:20:52PM +0100, Dave Gordon wrote:
> On 30/08/16 09:18, Chris Wilson wrote:
> >Currently the presumption is that the request construction and its
> >submission to the GuC are all under the same holding of struct_mutex. We
> >wish to relax this to separate the request construction and the later
> >submission to the GuC. This requires us to reserve some space in the
> >GuC command queue for the future submission. For flexibility to handle
> >out-of-order request submission we do not preallocate the next slot in
> >the GuC command queue during request construction, just ensuring that
> >there is enough space later.
> >
> >Signed-off-by: Chris Wilson <[email protected]>
> >---
> > drivers/gpu/drm/i915/i915_guc_submission.c | 55 ++++++++++++++----------------
> > drivers/gpu/drm/i915/intel_guc.h           |  3 ++
> > 2 files changed, 29 insertions(+), 29 deletions(-)
> 
> Hmm .. the functional changes look mostly OK, apart from some
> locking, but there seems to be a great deal of unnecessary churn,
> such as combining statements which had been kept separate for
> clarity :(
> 
> >diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> >index 2332f9c98bdd..a047e61adc2a 100644
> >--- a/drivers/gpu/drm/i915/i915_guc_submission.c
> >+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> >@@ -434,20 +434,23 @@ int i915_guc_wq_check_space(struct drm_i915_gem_request *request)
> > {
> >     const size_t wqi_size = sizeof(struct guc_wq_item);
> >     struct i915_guc_client *gc = request->i915->guc.execbuf_client;
> >-    struct guc_process_desc *desc;
> >+    struct guc_process_desc *desc = gc->client_base + gc->proc_desc_offset;
> >     u32 freespace;
> >+    int ret;
> >
> >-    GEM_BUG_ON(gc == NULL);
> >-
> >-    desc = gc->client_base + gc->proc_desc_offset;
> >-
> >+    spin_lock(&gc->lock);
> >     freespace = CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size);
> >-    if (likely(freespace >= wqi_size))
> >-            return 0;
> >-
> >-    gc->no_wq_space += 1;
> >+    freespace -= gc->wq_rsvd;
> >+    if (likely(freespace >= wqi_size)) {
> >+            gc->wq_rsvd += wqi_size;
> >+            ret = 0;
> >+    } else {
> >+            gc->no_wq_space++;
> >+            ret = -EAGAIN;
> >+    }
> >+    spin_unlock(&gc->lock);
> >
> >-    return -EAGAIN;
> >+    return ret;
> > }
> >
> > static void guc_add_workqueue_item(struct i915_guc_client *gc,
> >@@ -457,22 +460,9 @@ static void guc_add_workqueue_item(struct i915_guc_client *gc,
> >     const size_t wqi_size = sizeof(struct guc_wq_item);
> >     const u32 wqi_len = wqi_size/sizeof(u32) - 1;
> >     struct intel_engine_cs *engine = rq->engine;
> >-    struct guc_process_desc *desc;
> >     struct guc_wq_item *wqi;
> >     void *base;
> >-    u32 freespace, tail, wq_off, wq_page;
> >-
> >-    desc = gc->client_base + gc->proc_desc_offset;
> >-
> >-    /* Free space is guaranteed, see i915_guc_wq_check_space() above */
> 
> This comment seems to have been lost. It still applies (mutatis
> mutandis), so it should be relocated to some part of the new version
> ...
> 
> >-    freespace = CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size);
> >-    GEM_BUG_ON(freespace < wqi_size);
> >-
> >-    /* The GuC firmware wants the tail index in QWords, not bytes */
> >-    tail = rq->tail;
> >-    GEM_BUG_ON(tail & 7);
> >-    tail >>= 3;
> >-    GEM_BUG_ON(tail > WQ_RING_TAIL_MAX);
> 
> This *commented* sequence of statements seems clearer than the
> replacement below ...
> 
> >+    u32 wq_off, wq_page;
> >
> >     /* For now workqueue item is 4 DWs; workqueue buffer is 2 pages. So we
> >      * should not have the case where structure wqi is across page, neither
> >@@ -482,18 +472,19 @@ static void guc_add_workqueue_item(struct i915_guc_client *gc,
> >      * workqueue buffer dw by dw.
> >      */
> >     BUILD_BUG_ON(wqi_size != 16);
> 
> This would be a good place to note that:
> 
> /* Reserved space is guaranteed, see i915_guc_wq_check_space() above */
> 
> >+    GEM_BUG_ON(gc->wq_rsvd < wqi_size);
> >
> >     /* postincrement WQ tail for next time */
> >     wq_off = gc->wq_tail;
> >+    GEM_BUG_ON(wq_off & (wqi_size - 1));
> >     gc->wq_tail += wqi_size;
> >     gc->wq_tail &= gc->wq_size - 1;
> >-    GEM_BUG_ON(wq_off & (wqi_size - 1));
> >+    gc->wq_rsvd -= wqi_size;
> >
> >     /* WQ starts from the page after doorbell / process_desc */
> >     wq_page = (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT;
> >-    wq_off &= PAGE_SIZE - 1;
> >     base = kmap_atomic(i915_gem_object_get_page(gc->vma->obj, wq_page));
> >-    wqi = (struct guc_wq_item *)((char *)base + wq_off);
> >+    wqi = (struct guc_wq_item *)((char *)base + offset_in_page(wq_off));
> >
> >     /* Now fill in the 4-word work queue item */
> >     wqi->header = WQ_TYPE_INORDER |
> >@@ -504,7 +495,7 @@ static void guc_add_workqueue_item(struct i915_guc_client *gc,
> >     /* The GuC wants only the low-order word of the context descriptor */
> >     wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx, engine);
> >
> >-    wqi->ring_tail = tail << WQ_RING_TAIL_SHIFT;
> >+    wqi->ring_tail = rq->tail >> 3 << WQ_RING_TAIL_SHIFT;
> 
> This line is particularly ugly. I think it's the >> chevron << effect.
> Parenthesisation would help, but it would be nicer as separate lines.
> Also, there's no explanation of the "3" here, unlike the original
> version above.

Correct. The code was and still is ugly.

> >     wqi->fence_id = rq->fence.seqno;
> >
> >     kunmap_atomic(base);
> >@@ -591,8 +582,10 @@ static void i915_guc_submit(struct drm_i915_gem_request *rq)
> >     struct i915_guc_client *client = guc->execbuf_client;
> >     int b_ret;
> >
> >+    spin_lock(&client->lock);
> >     guc_add_workqueue_item(client, rq);
> >     b_ret = guc_ring_doorbell(client);
> >+    spin_unlock(&client->lock);
> >
> >     client->submissions[engine_id] += 1;
> 
> Outside the spinlock? Do we still have the BKL during submit(), just
> not during i915_guc_wq_check_space() ? If so, then
> guc_ring_doorbell() doesn't strictly need to be inside the spinlock
> (or the lock could be inside guc_add_workqueue_item()); but if not
> then the update of submissions[] should be inside.

Nope, this code is just garbage, so I didn't bother polishing it.

> >     client->retcode = b_ret;
> >@@ -770,6 +763,8 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
> >     if (!client)
> >             return NULL;
> >
> >+    spin_lock_init(&client->lock);
> >+
> >     client->owner = ctx;
> >     client->guc = guc;
> >     client->engines = engines;
> >@@ -1019,9 +1014,11 @@ int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
> >             engine->submit_request = i915_guc_submit;
> >
> >             /* Replay the current set of previously submitted requests */
> >-            list_for_each_entry(request, &engine->request_list, link)
> >+            list_for_each_entry(request, &engine->request_list, link) {
> >+                    client->wq_rsvd += sizeof(struct guc_wq_item);
> 
> Presumably this is being called in some context that ensures that we
> don't need to hold the spinlock while updating wq_rsvd ? Maybe that
> should be mentioned ...

Obvious.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
