Samuel Iglesias Gonsálvez <[email protected]> writes: > From: Iago Toral Quiroga <[email protected]> > > There will be a few places where we need to shuffle the result of a 32-bit > load into valid 64-bit data, so extract this logic into a separate helper > that we can reuse. > > The shuffling needs to operate with WE_all set because we are changing the > layout of the data across the various channels. Otherwise we will run into > problems in non-uniform control-flow scenarios. > I guess you could remove this paragraph because it no longer applies to the last version of the patch.
> v2 (Curro):
> - Use subscript() instead of stride()
> - Assert on the input types rather than retyping.
> - Use offset() instead of horiz_offset(), drop the multiplier definition.
> - Do not use a temporary for the writes and drop force_writemask_all.
Don't pretend you took my "don't use a temporary" suggestion into
account. :P
> - Mark component_i as const.
Did you forget to git-add something?
> - Make the function name lower case.
> ---
> src/mesa/drivers/dri/i965/brw_fs.cpp | 53
> ++++++++++++++++++++++++++++++++++++
> src/mesa/drivers/dri/i965/brw_fs.h | 5 ++++
> 2 files changed, 58 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 15d5759..4827dea 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -212,6 +212,59 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder
> &bld,
> }
>
> /**
> + * This helper takes the result of a load operation that reads 32-bit
> elements
> + * in this format:
> + *
> + * x x x x x x x x
> + * y y y y y y y y
> + * z z z z z z z z
> + * w w w w w w w w
> + *
> + * and shuffles the data to get this:
> + *
> + * x y x y x y x y
> + * x y x y x y x y
> + * z w z w z w z w
> + * z w z w z w z w
> + *
> + * Which is exactly what we want if the load is reading 64-bit components
> + * like doubles, where x represents the low 32-bit of the x double component
> + * and y represents the high 32-bit of the x double component (likewise with
> + * z and w for double component y). The parameter @components represents
> + * the number of 64-bit components present in @src. This would typically be
> + * 2 at most, since we can only fit 2 double elements in the result of a
> + * vec4 load.
> + *
> + * Notice that @dst and @src can be the same register.
> + */
> +void
> +fs_visitor::shuffle_32bit_load_result_to_64bit_data(const fs_builder &bld,
There was a second reason I had in mind when I suggested it would
improve encapsulation to take this out of fs_visitor: The function has
absolutely nothing to do with visiting, it uses no internal or external
fs_visitor data structures or interfaces, it doesn't even use the "this"
pointer. Defining a function that could perfectly be stand-alone inside
an object it has no need to be in (in this case it doesn't even have any
logical relation with) actually *decreases* encapsulation because it
exposes the object's internals to the function unnecessarily.
Either way the back-end code is already plagued by this anti-pattern so
I'm not going to complain if you keep the code as-is -- You could argue
you're just being consistent with the existing practice. ;)
> + const fs_reg dst,
> + const fs_reg src,
Pass by reference.
> + uint32_t components)
> +{
> + assert(type_sz(src.type) == 4);
> + assert(type_sz(dst.type) == 8);
> +
> + /* A temporary that we will use to shuffle the 32-bit data of each
> + * component in the vector into valid 64-bit data. We can't write directly
> + * to dst because dst can be (and would usually be) the same as src
> + * and in that case the first MOV in the loop below would overwrite the
> + * data read in the second MOV.
> + */
> + fs_reg tmp = bld.vgrf(dst.type);
> +
> + for (unsigned i = 0; i < components; i++) {
> + fs_reg component_i = offset(src, bld, 2 * i);
> +
> + bld.MOV(subscript(tmp, src.type, 0), component_i);
> + bld.MOV(subscript(tmp, src.type, 1), offset(component_i, bld, 1));
> +
> + bld.MOV(offset(dst, bld, i), tmp);
Ah, yes, this looks much better,
Reviewed-by: Francisco Jerez <[email protected]>
> + }
> +}
> +
> +/**
> * A helper for MOV generation for fixing up broken hardware SEND dependency
> * handling.
> */
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h
> b/src/mesa/drivers/dri/i965/brw_fs.h
> index b906f3d..1970ad0 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -105,6 +105,11 @@ public:
> uint32_t const_offset);
> void DEP_RESOLVE_MOV(const brw::fs_builder &bld, int grf);
>
> + void shuffle_32bit_load_result_to_64bit_data(const brw::fs_builder &bld,
> + const fs_reg dst,
> + const fs_reg src,
> + uint32_t components);
> +
> bool run_fs(bool do_rep_send);
> bool run_vs(gl_clip_plane *clip_planes);
> bool run_tcs_single_patch();
> --
> 2.5.0
>
> _______________________________________________
> mesa-dev mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
