On Tue, Jan 16, 2018 at 10:21:44PM -0800, Jason Ekstrand wrote:
> On Tue, Jan 16, 2018 at 4:30 PM, Nanley Chery <[email protected]> wrote:
> 
> > On Tue, Jan 16, 2018 at 12:02:43PM -0800, Nanley Chery wrote:
> > > On Sat, Jan 13, 2018 at 11:11:35AM -0800, Jason Ekstrand wrote:
> > > > Sorry for all the list spam, but I'm sort of thinking out-loud and
> > writing
> > > > it on the list for all to read.
> > > >
> > > > I'm thinking that what we want this list to return is not a bool but
> > an enum
> > > >
> > > > /* The ordering of this enum is important */
> > > > enum anv_fast_clear_support {
> > > >    ANV_FAST_CLEAR_NONE = 0,
> > > >    ANV_FAST_CLEAR_ZERO_ONLY = 1,
> > > >    ANV_FAST_CLEAR_ANY = 2,
> > > > };
> > > >
> > > > And then the predicate for whether or not to do the resolve becomes
> > > > (has_compression && !supports_compression) || (image_fast_clear >
> > > > supported_fast_clear).  I still haven't quite figured out what to do
> > with
> > > > MI_PREDICATE to make this work out.  I'm sure it's possible with MI
> > math
> > > > but maybe I can come up with something clever that doesn't require
> > that.
> > > > In the worst case, we only have to deal with this complexity on gen9+
> > where
> > > > we have MI_MATH so it'll be ok.
> > > >
> > > > Does this seem like a reasonable plan?
> > > >
> > >
> > > Thanks for asking about benchmark numbers. I checked the Sascha Willems
> > > demos and an old build of Dota 2 and neither app used this optimization.
> > > I haven't checked any other Vulkan application, but given that it's the
> > > GENERAL layout I think we'd do well to drop the optimization and avoid
> > > the additional complexity. Is that cool with you?
> > >
> >
> > I just realized that this doesn't solve all the issues (see below).
> >
> > > I checked the git logs to learn more about why the driver does this. It
> > > looks like fast-clearing with any color was supported in the GENERAL
> > > layout back when we resolved at the end of the render pass. When porting
> > > the code to use layout transitions, I tried to mitigate the possible
> > > regressions with dcff5ab9f164afbc29c051b18990a377bb46e4bc.
> > >
> > > -Nanley
> > >
> > > > On Fri, Jan 12, 2018 at 5:23 PM, Jason Ekstrand <[email protected]>
> > > > wrote:
> > > >
> > > > > I made a table to help visualize all the different cases:
> > > > >
> > > > > Layout                      | compression | zero clear | non-zero
> > clear
> > > > > ============================+=============+============+====
> > ============
> > > > > GENERAL                     |      N      |      N     |       N
> > > > > ----------------------------+-------------+------------+----
> > ------------
> > > > > COLOR_ATTACHMENT            |      Y      |      Y     |       Y
> > > > > ----------------------------+-------------+------------+----
> > ------------
> > > > > SHADER_READ_ONLY            |      Y      |      Y     |       N
> > > > > ----------------------------+-------------+------------+----
> > ------------
> > > > > GENERAL (RT/tex/blit only)  |      Y      |      Y     |       N
> > > > > ----------------------------+-------------+------------+----
> > ------------
> > > > > PRESENT_SRC (Y_TILED)       |      N      |      N     |       N
> > > > > ----------------------------+-------------+------------+----
> > ------------
> > > > > PRESENT_SRC (Y_TILED_CCS)   |      Y      |      N     |       N
> > > > > ----------------------------+-------------+------------+----
> > ------------
> > > > >
> >
> > This looks good to me. Could the last entry support zero clear?
> >
> 
> I think we could probably define it to as I think that's what scanout does
> when it sees a clear block.  However, there's no guarantee that mesa
> texturing from it will give you a zero clear color.  I think it will but
> that's more an artifact of history than intent.  The GL driver assumes that
> if something is in the COMPRESSED_NO_CLEAR state that there are exactly
> zero clear blocks.  We could adjust Y_TILED_CCS to mean COMPRESSED_CLEAR
> with a clear color of 0.
> 
> 

Thanks for testing this. It's too bad the scanout hardware doesn't
return the zero clear color.

> > > > > We will need to think about what to do with this.  Having non-zero
> > clears
> > > > > be different from zero clears is tricky.  We may need to have even
> > more
> > > > > bits in our aux tracking. :-/
> > > > >
> > > > > On January 12, 2018 16:05:12 Jason Ekstrand <[email protected]>
> > wrote:
> > > > >
> > > > >> On Wed, Dec 13, 2017 at 11:26 AM, Nanley Chery <
> > [email protected]>
> > > > >> wrote:
> > > > >>
> > > > >>> On Mon, Nov 27, 2017 at 07:05:56PM -0800, Jason Ekstrand wrote:
> > > > >>> > ---
> > > > >>> >  src/intel/vulkan/anv_image.c   | 58
> > ++++++++++++++++++++++++++++++
> > > > >>> ++++++++++++
> > > > >>> >  src/intel/vulkan/anv_private.h |  5 ++++
> > > > >>> >  2 files changed, 63 insertions(+)
> > > > >>> >
> > > > >>> > diff --git a/src/intel/vulkan/anv_image.c
> > > > >>> b/src/intel/vulkan/anv_image..c
> > > > >>>
> > > > >>> > index a872149..561da28 100644
> > > > >>> > --- a/src/intel/vulkan/anv_image.c
> > > > >>> > +++ b/src/intel/vulkan/anv_image.c
> > > > >>> > @@ -837,6 +837,64 @@ anv_layout_to_aux_usage(const struct
> > > > >>> gen_device_info * const devinfo,
> > > > >>> >     unreachable("layout is not a VkImageLayout enumeration
> > member.");
> > > > >>> >  }
> > > > >>> >
> > > > >>> > +/**
> > > > >>> > + * This function returns true if the given image in the given
> > > > >>> VkImageLayout
> > > > >>> > + * supports unresolved fast-clears.
> > > > >>> > + *
> > > > >>> > + * @param devinfo The device information of the Intel GPU.
> > > > >>> > + * @param image The image that may contain a collection of
> > buffers.
> > > > >>> > + * @param aspect The aspect of the image to be accessed.
> > > > >>> > + * @param layout The current layout of the image aspect(s).
> > > > >>> > + */
> > > > >>> > +bool
> > > > >>> > +anv_layout_supports_fast_clear(const struct gen_device_info *
> > const
> > > > >>> devinfo,
> > > > >>> > +                               const struct anv_image * const
> > image,
> > > > >>> > +                               const VkImageAspectFlagBits
> > aspect,
> > > > >>> > +                               const VkImageLayout layout)
> > > > >>> > +{
> > > > >>> > +   /* The aspect must be exactly one of the image aspects. */
> > > > >>> > +   assert(_mesa_bitcount(aspect) == 1 && (aspect &
> > image->aspects));
> > > > >>> > +
> > > > >>> > +   uint32_t plane = anv_image_aspect_to_plane(image->aspects,
> > > > >>> aspect);
> > > > >>> > +
> > > > >>> > +   /* If there is no auxiliary surface allocated, there are no
> > > > >>> fast-clears */
> > > > >>> > +   if (image->planes[plane].aux_surface.isl.size == 0)
> > > > >>> > +      return false;
> > > > >>> > +
> > > > >>> > +   /* All images that use an auxiliary surface are required to
> > be
> > > > >>> tiled. */
> > > > >>> > +   assert(image->tiling == VK_IMAGE_TILING_OPTIMAL);
> > > > >>> > +
> > > > >>> > +   /* Stencil has no aux */
> > > > >>> > +   assert(aspect != VK_IMAGE_ASPECT_STENCIL_BIT);
> > > > >>> > +
> > > > >>> > +   if (aspect == VK_IMAGE_ASPECT_DEPTH_BIT) {
> > > > >>> > +      /* For depth images (with HiZ), the layout supports
> > fast-clears
> > > > >>> if and
> > > > >>> > +       * only if it supports HiZ.
> > > > >>> > +       */
> > > > >>> > +      enum isl_aux_usage aux_usage =
> > > > >>> > +         anv_layout_to_aux_usage(devinfo, image, aspect,
> > layout);
> > > > >>> > +      return aux_usage == ISL_AUX_USAGE_HIZ;
> > > > >>> > +   }
> > > > >>> > +
> > > > >>> > +   assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV);
> > > > >>> > +
> > > > >>> > +   /* Multisample fast-clear is not yet supported. */
> > > > >>> > +   if (image->samples > 1)
> > > > >>> > +      return false;
> > > > >>> > +
> > > > >>> > +   /* The only layout which actually supports fast-clears today
> > is
> > > > >>> > +    * VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.  Some day in
> > the
> > > > >>> future
> > > > >>> > +    * this may change if our ability to track clear colors
> > improves.
> > > > >>> > +    */
> > > > >>> > +   switch (layout) {
> > > > >>> > +   case VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL:
> > > > >>> > +      return true;
> > > > >>> > +
> > > > >>>
> > > > >>> This is kind of tricky. We actually allow fast-clears for CCS_E
> > textures
> > > > >>> in the GENERAL layout if the clear color is all zeros. When this
> > > > >>> happens, we set the needs_resolve predicate to false. This means
> > that
> > > > >>> unresolved fast-clears is potentially in use for CCS_E images in
> > any
> > > > >>> layout.
> > > > >>>
> > > > >>
> > > > >> Hrm...  This is going to get interesting.  This works but only if we
> > > > >> assume that the thing using it in the general layout is rendering or
> > > > >> texturing.  If it's storage, this doesn't work.  That said, storage
> > images
> > > > >> are CCS_D-only so that's not a problem now.  The bigger problem is
> > that
> > > > >> this means the resolve won't happen for window-system images and
> > that's
> > > > >> bad.  I'm not sure what the right path forward is; I'll have to
> > think on it.
> > > > >>
> > > > >> Do we have any benchmark numbers to justify this clever trick?
> > > > >>
> > > > >>
> >
> > I just realized that we have the same problem with CCS_E-enabled images
> > in the COLOR_ATTACHMENT_OPTIMAL layout. We set the needs_resolve bit
> > to false when such images are fast-cleared transparent black. Thus any
> > layout of a CCS_E-enabled image may have an unresolved fast-clear.
> >
> 
> I think those are effectively the same issue but maybe I'm misunderstanding
> something.
> 
> 

Yes, they're the same issue. I may have been overly verbose. Sorry about
that.

> > Quick testing shows that this gives an ~13% FPS increase in the Sascha
> > Willems deferred demo. The following other demos use this but provide
> > no visible change in performance: debugmarker, deferredshadows, hdr,
> > offscreen, radialblur, subpasses. One image in Dota 2 makes use of this,
> > but I didn't measure the performance impact (could be nil).
> >
> 
> That's unfortunate...  I'm not seeing a lot of other options off-hand than
> to just track zero vs. non-zero clear colors.
> 

Yeah, neither am I.

> On another note, does this image in the deferred demo happen to ever use
> STORE_OP_DONT_CARE?  If so, we may be able to whack the needs_resolve bit
> to off.
> 
> 

I looked at the source and the image doesn't seem to use STORE_OP_DONT_CARE.
_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to