On Tue, Jan 16, 2018 at 12:02:43PM -0800, Nanley Chery wrote:
> On Sat, Jan 13, 2018 at 11:11:35AM -0800, Jason Ekstrand wrote:
> > Sorry for all the list spam, but I'm sort of thinking out-loud and writing
> > it on the list for all to read.
> >
> > I'm thinking that what we want this list to return is not a bool but an enum
> >
> > /* The ordering of this enum is important */
> > enum anv_fast_clear_support {
> > ANV_FAST_CLEAR_NONE = 0,
> > ANV_FAST_CLEAR_ZERO_ONLY = 1,
> > ANV_FAST_CLEAR_ANY = 2,
> > };
> >
> > And then the predicate for whether or not to do the resolve becomes
> > (has_compression && !supports_compression) || (image_fast_clear >
> > supported_fast_clear). I still haven't quite figured out what to do with
> > MI_PREDICATE to make this work out. I'm sure it's possible with MI math
> > but maybe I can come up with something clever that doesn't require that.
> > In the worst case, we only have to deal with this complexity on gen9+ where
> > we have MI_MATH so it'll be ok.
> >
> > Does this seem like a reasonable plan?
> >
>
> Thanks for asking about benchmark numbers. I checked the Sascha Willems
> demos and an old build of Dota 2 and neither app used this optimization.
> I haven't checked any other Vulkan application, but given that it's the
> GENERAL layout I think we'd do well to drop the optimization and avoid
> the additional complexity. Is that cool with you?
>
I just realized that this doesn't solve all the issues (see below).
> I checked the git logs to learn more about why the driver does this. It
> looks like fast-clearing with any color was supported in the GENERAL
> layout back when we resolved at the end of the render pass. When porting
> the code to use layout transitions, I tried to mitigate the possible
> regressions with dcff5ab9f164afbc29c051b18990a377bb46e4bc.
>
> -Nanley
>
> > On Fri, Jan 12, 2018 at 5:23 PM, Jason Ekstrand <[email protected]>
> > wrote:
> >
> > > I made a table to help visualize all the different cases:
> > >
> > > Layout | compression | zero clear | non-zero clear
> > > ============================+=============+============+================
> > > GENERAL | N | N | N
> > > ----------------------------+-------------+------------+----------------
> > > COLOR_ATTACHMENT | Y | Y | Y
> > > ----------------------------+-------------+------------+----------------
> > > SHADER_READ_ONLY | Y | Y | N
> > > ----------------------------+-------------+------------+----------------
> > > GENERAL (RT/tex/blit only) | Y | Y | N
> > > ----------------------------+-------------+------------+----------------
> > > PRESENT_SRC (Y_TILED) | N | N | N
> > > ----------------------------+-------------+------------+----------------
> > > PRESENT_SRC (Y_TILED_CCS) | Y | N | N
> > > ----------------------------+-------------+------------+----------------
> > >
This looks good to me. Could the last entry support zero clear?
> > > We will need to think about what to do with this. Having non-zero clears
> > > be different from zero clears is tricky. We may need to have even more
> > > bits in our aux tracking. :-/
> > >
> > > On January 12, 2018 16:05:12 Jason Ekstrand <[email protected]> wrote:
> > >
> > >> On Wed, Dec 13, 2017 at 11:26 AM, Nanley Chery <[email protected]>
> > >> wrote:
> > >>
> > >>> On Mon, Nov 27, 2017 at 07:05:56PM -0800, Jason Ekstrand wrote:
> > >>> > ---
> > >>> > src/intel/vulkan/anv_image.c | 58 ++++++++++++++++++++++++++++++
> > >>> ++++++++++++
> > >>> > src/intel/vulkan/anv_private.h | 5 ++++
> > >>> > 2 files changed, 63 insertions(+)
> > >>> >
> > >>> > diff --git a/src/intel/vulkan/anv_image.c
> > >>> b/src/intel/vulkan/anv_image..c
> > >>>
> > >>> > index a872149..561da28 100644
> > >>> > --- a/src/intel/vulkan/anv_image.c
> > >>> > +++ b/src/intel/vulkan/anv_image.c
> > >>> > @@ -837,6 +837,64 @@ anv_layout_to_aux_usage(const struct
> > >>> gen_device_info * const devinfo,
> > >>> > unreachable("layout is not a VkImageLayout enumeration member.");
> > >>> > }
> > >>> >
> > >>> > +/**
> > >>> > + * This function returns true if the given image in the given
> > >>> VkImageLayout
> > >>> > + * supports unresolved fast-clears.
> > >>> > + *
> > >>> > + * @param devinfo The device information of the Intel GPU.
> > >>> > + * @param image The image that may contain a collection of buffers.
> > >>> > + * @param aspect The aspect of the image to be accessed.
> > >>> > + * @param layout The current layout of the image aspect(s).
> > >>> > + */
> > >>> > +bool
> > >>> > +anv_layout_supports_fast_clear(const struct gen_device_info * const
> > >>> devinfo,
> > >>> > + const struct anv_image * const image,
> > >>> > + const VkImageAspectFlagBits aspect,
> > >>> > + const VkImageLayout layout)
> > >>> > +{
> > >>> > + /* The aspect must be exactly one of the image aspects. */
> > >>> > + assert(_mesa_bitcount(aspect) == 1 && (aspect & image->aspects));
> > >>> > +
> > >>> > + uint32_t plane = anv_image_aspect_to_plane(image->aspects,
> > >>> aspect);
> > >>> > +
> > >>> > + /* If there is no auxiliary surface allocated, there are no
> > >>> fast-clears */
> > >>> > + if (image->planes[plane].aux_surface.isl.size == 0)
> > >>> > + return false;
> > >>> > +
> > >>> > + /* All images that use an auxiliary surface are required to be
> > >>> tiled. */
> > >>> > + assert(image->tiling == VK_IMAGE_TILING_OPTIMAL);
> > >>> > +
> > >>> > + /* Stencil has no aux */
> > >>> > + assert(aspect != VK_IMAGE_ASPECT_STENCIL_BIT);
> > >>> > +
> > >>> > + if (aspect == VK_IMAGE_ASPECT_DEPTH_BIT) {
> > >>> > + /* For depth images (with HiZ), the layout supports fast-clears
> > >>> if and
> > >>> > + * only if it supports HiZ.
> > >>> > + */
> > >>> > + enum isl_aux_usage aux_usage =
> > >>> > + anv_layout_to_aux_usage(devinfo, image, aspect, layout);
> > >>> > + return aux_usage == ISL_AUX_USAGE_HIZ;
> > >>> > + }
> > >>> > +
> > >>> > + assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV);
> > >>> > +
> > >>> > + /* Multisample fast-clear is not yet supported. */
> > >>> > + if (image->samples > 1)
> > >>> > + return false;
> > >>> > +
> > >>> > + /* The only layout which actually supports fast-clears today is
> > >>> > + * VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL. Some day in the
> > >>> future
> > >>> > + * this may change if our ability to track clear colors improves.
> > >>> > + */
> > >>> > + switch (layout) {
> > >>> > + case VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL:
> > >>> > + return true;
> > >>> > +
> > >>>
> > >>> This is kind of tricky. We actually allow fast-clears for CCS_E textures
> > >>> in the GENERAL layout if the clear color is all zeros. When this
> > >>> happens, we set the needs_resolve predicate to false. This means that
> > >>> unresolved fast-clears is potentially in use for CCS_E images in any
> > >>> layout.
> > >>>
> > >>
> > >> Hrm... This is going to get interesting. This works but only if we
> > >> assume that the thing using it in the general layout is rendering or
> > >> texturing. If it's storage, this doesn't work. That said, storage
> > >> images
> > >> are CCS_D-only so that's not a problem now. The bigger problem is that
> > >> this means the resolve won't happen for window-system images and that's
> > >> bad. I'm not sure what the right path forward is; I'll have to think on
> > >> it.
> > >>
> > >> Do we have any benchmark numbers to justify this clever trick?
> > >>
> > >>
I just realized that we have the same problem with CCS_E-enabled images
in the COLOR_ATTACHMENT_OPTIMAL layout. We set the needs_resolve bit
to false when such images are fast-cleared transparent black. Thus any
layout of a CCS_E-enabled image may have an unresolved fast-clear.
Quick testing shows that this gives an ~13% FPS increase in the Sascha
Willems deferred demo. The following other demos use this but provide
no visible change in performance: debugmarker, deferredshadows, hdr,
offscreen, radialblur, subpasses. One image in Dota 2 makes use of this,
but I didn't measure the performance impact (could be nil).
-Nanley
> > >>> -Nanley
> > >>>
> > >>> > + default:
> > >>> > + return false;
> > >>> > + }
> > >>> > +}
> > >>> > +
> > >>> >
> > >>> > static struct anv_state
> > >>> > alloc_surface_state(struct anv_device *device)
> > >>> > diff --git a/src/intel/vulkan/anv_private.h
> > >>> b/src/intel/vulkan/anv_private.h
> > >>> > index 5dd95a3..461bfed 100644
> > >>> > --- a/src/intel/vulkan/anv_private.h
> > >>> > +++ b/src/intel/vulkan/anv_private.h
> > >>> > @@ -2559,6 +2559,11 @@ anv_layout_to_aux_usage(const struct
> > >>> gen_device_info * const devinfo,
> > >>> > const struct anv_image *image,
> > >>> > const VkImageAspectFlagBits aspect,
> > >>> > const VkImageLayout layout);
> > >>> > +bool
> > >>> > +anv_layout_supports_fast_clear(const struct gen_device_info * const
> > >>> devinfo,
> > >>> > + const struct anv_image * const image,
> > >>> > + const VkImageAspectFlagBits aspect,
> > >>> > + const VkImageLayout layout);
> > >>> >
> > >>> > /* This is defined as a macro so that it works for both
> > >>> > * VkImageSubresourceRange and VkImageSubresourceLayers
> > >>> > --
> > >>> > 2.5.0.400.gff86faf
> > >>> >
> > >>> > _______________________________________________
> > >>> > mesa-dev mailing list
> > >>> > [email protected]
> > >>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > >>>
> > >>
> > >>
_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev