On 27/07/16 11:53, Chris Wilson wrote:
Rather than passing a complete set of GPU cache domains for either
invalidation or for flushing, or even both, just pass a single parameter
to the engine->emit_flush to determine the required operations.

engine->emit_flush(GPU, 0) -> engine->emit_flush(EMIT_INVALIDATE)
engine->emit_flush(0, GPU) -> engine->emit_flush(EMIT_FLUSH)
engine->emit_flush(GPU, GPU) -> engine->emit_flush(EMIT_FLUSH | EMIT_INVALIDATE)

This allows us to extend the behaviour easily in future, for example if
we want just a command barrier without the overhead of flushing.

Signed-off-by: Chris Wilson <[email protected]>
Cc: Dave Gordon <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
---
 drivers/gpu/drm/i915/i915_gem_context.c    |  2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 10 ++----
 drivers/gpu/drm/i915/i915_gem_request.c    |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c           | 23 +++++-------
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 57 +++++++++++-------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  6 ++--
 7 files changed, 38 insertions(+), 64 deletions(-)

[snip]

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 39fa9eb10514..671b1cab5e54 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1666,8 +1666,7 @@ static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
        int ret;

        /* NB: TLBs must be flushed and invalidated before a switch */
-       ret = engine->emit_flush(req,
-                                I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
+       ret = engine->emit_flush(req, EMIT_INVALIDATE | EMIT_FLUSH);

The example in the commit message says "EMIT_FLUSH | EMIT_INVALIDATE"
which I think looks much nicer. Flush *after* invalidate would be pretty meaningless!

For even more syntactic sugar, we could choose "FLUSH + INVALIDATE" as + and | are equivalent for disjoint bitfields.

 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request 
*request)
@@ -998,9 +998,7 @@ static int intel_logical_ring_workarounds_emit(struct 
drm_i915_gem_request *req)
        if (w->count == 0)
                return 0;

-       ret = req->engine->emit_flush(req,
-                                     I915_GEM_GPU_DOMAINS,
-                                     I915_GEM_GPU_DOMAINS);
+       ret = req->engine->emit_flush(req, EMIT_BARRIER);
        if (ret)
                return ret;

Distinguishing flush-and-invalidate from BARRIER seems like a good idea, because here we really don't want to flush or invalidate the GPU's caches; we really only want some sort of synchronisation of memory activity before changing the register. Therefore ...

[snip]

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 00723401f98c..76d0495943c3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -292,8 +292,10 @@ struct intel_engine_cs {
        u32 ctx_desc_template;
        int             (*emit_request)(struct drm_i915_gem_request *request);
        int             (*emit_flush)(struct drm_i915_gem_request *request,
-                                     u32 invalidate_domains,
-                                     u32 flush_domains);
+                                     u32 mode);
+#define EMIT_INVALIDATE        BIT(0)
+#define EMIT_FLUSH     BIT(1)
+#define EMIT_BARRIER   (EMIT_INVALIDATE | EMIT_FLUSH)

... I think these should be three (or maybe four) distinct operations, and collapsing BARRIER to FLUSH+INVALIDATE if the h/w can't do a simple barrier should be left to the gen-specific code. Also what about allowing for a FLUSH or INVALIDATE without BARRIER semantics? Do we care about whether the command streamer stalls before/after these operations?

So maybe start with

#define EMIT_CS_STALL   BIT(0)  /* Stall CS before operation    */
#define EMIT_WRITEBACK  BIT(1)  /* Ensure caches written back   */
#define EMIT_DROP       BIT(2)  /* Drop contents of cache(s)    */
#define EMIT_POSTSYNC   BIT(3)  /* Wait until complete          */

and then

#define EMIT_BARRIER    (EMIT_CS_STALL+EMIT_POSTSYNC)
#define EMIT_FLUSH      (EMIT_CS_STALL+EMIT_WRITEBACK)
#define EMIT_INVALIDATE (EMIT_DROP+EMIT_POSTSYNC)
#define EMIT_SWITCH     (EMIT_FLUSH+EMIT_INVALIDATE)

or whatever combinations of basic operations are actually useful.

.Dave.
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to