Richard Biener <rguent...@suse.de> writes:
> Hi,
>
> I'm trying to enforce SLP_TREE_VECTYPE being set (correctly) on
> external and invariant SLP nodes to avoid (re-)computing that
> in other places.

Nice.

> The responsible entity for specifying the
> desired vector type is vectorizable_* - vectorization analysis
> of the user (when we start to share those nodes between multiple
> users a user can then also unshare such a node if an existing
> vector type conflicts with its own requirements).
>
> The previous patch 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545938.html
> set up things so that vect_slp_analyze_node_operations would
> use such type information during costing and also pick up
> unshared nodes.
>
> Now as you can see below where I started to enforce SLP_TREE_VECTYPE
> for the special boolean case in vect_get_constant_vectors I seemingly
> only had to fixup two places, vectorizable_operation (saw bool & bool)
> and vectorizable_comparison doing testing on x86_64 and aarch64
> (cross cc1 on aarch64-sve.exp and vect.exp).  There obviously will
> be more places and I was thinking of less ugly (and more correct?)
> ways than doing
>
> +         FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (slp_node), i, child)
> +           if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
> +             SLP_TREE_VECTYPE (child) = vectype;
>
> in each vectorizable_*.  In principle it is the vect_is_simple_use
> that would need adjustment to associate the correct SLP child with
> a vectype so I can't think of sth better than refactoring things
> to sth like
>
>   vect_simple_uses (vinfo, stmt_info, slp_node, &dts, &vectypes)
>
> and process all operans at once transparently for stmt_info/slp_node
> (where the actual vect_is_simple_use check is redundant for SLP
> nodes anyway).  The index into the array would then match up
> with the SLP child (if SLP) or the stmt operand (if not SLP - with
> SLP the order can be different on the stmt).
>
> Any better ideas or comments on the general direction?  Richard,
> which vectorizable_* do you remember being the "worst" with respect
> to vector type determining for invariants?

I think most of the struggles I had were from different code ending
up with different results when recomputing the vectype.  IIRC the
vectorizable_* routines usually got it right and it was the other
code that got it wrong.  So yeah, can imagine that we don't need
many changes if the vectoriable_* routines get the final say.

Not related to invariants specifically, but the routines I remember
being most awkward for vector type choices were vectorizable_condition
(because of the embedded comparison) and vectorizable_conversion (because
there could be multiple stages, with the intermediate vector types not
being recorded).

OTOH, I think calculating the ideal vector types requires
propagating the vector type in both directions, rather than
just upwards from the roots.  E.g. for:

  double d1, d2, d3, d4;
  float f1, f2, f3;

  f1 = d1 == d2 & d3 == d4 ? f2 : f3;

when operating on full vectors, there needs to be a conversion
somewhere between "d-booleans" and "f-booleans".  For this case
it probably should happen as early as possible, e.g.:

  b1_lo = d1_lo == d2_lo
  b1_hi = d1_hi == d2_hi

  b2_lo = d3_lo == d4_lo
  b2_hi = d3_hi == d4_hi

  b1 = pack (b1_lo, b1_hi)

  b2 = pack (b2_lo, b2_hi)

  b3 = b1 & b2

  f1 = b3 ? f2 : f3;

But if the roles are reversed:

  float f1, f2, f3, f4;
  double d1, d2, d3;

  d1 = f1 == f2 & f3 == f4 ? d2 : d3;

it should happen after the "&":

  b1 = f1 == f2

  b2 = f3 == f4

  b3 = b1 & b2

  b3_lo = unpack_lo (b3)
  b3_hi = unpack_hi (b3)

  d1_lo = b3_lo ? d2_lo : d3_lo;
  d1_hi = b3_hi ? d2_hi : d3_hi;

At the moment we rely on pattern matching for that.  But it basically
means that the pattern matcher has to "predict" what vector types the
vectorizable_* routines would calculate, so that it knows whether to
create pattern statements that enforce a different choice.

IMO it'd be really good to get rid of that and unify the vector
type calculation.  But that probably means not doing it during
vectorizable_*, or at least changing the way that the vectorizable_*
functions are called during analysis, so that information can flow in
both directions.

Thanks,
Richard

Reply via email to