Hi Richard,

Currently for pr96912, we end up with:

W foo (W x, W y, V m)
{
  W t;
  vector(16) <signed-boolean:8> _1;
  vector(16) signed char _2;
  W _7;
  vector(2) long long int _9;
  vector(2) long long int _10;

  <bb 2> [local count: 1073741824]:
  _1 = m_3(D) < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _2 = VIEW_CONVERT_EXPR<vector(16) signed char>(_1);
  t_4 = VIEW_CONVERT_EXPR<W>(_2);
  _9 = x_5(D) ^ y_6(D);
  _10 = t_4 & _9;
  _7 = x_5(D) ^ _10;
  return _7;
}

The mask mode is V16QI and the data mode is V2DI (please forgive my RTL
terminology).  The assignment t_4 view-converts the mask into the "data type"
for the bitwise operations.  To use x86's pblendvb, the "vcond_expr" operation
needs to be in the mask's mode (V16QI) rather than the data's mode (V2DI).
Hence the unsigned_type_for of the truth_type_for [if they have different
NUNITS].  Obviously, converting the mask to V2DI and performing a V2DI blend
won't produce the same result.
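As a sanity check, the xor/and/xor sequence in the dump is the familiar
branchless select; here is a minimal C model of one 8-bit lane (my own
illustration, assuming each mask element is all-zeros or all-ones):

```c
#include <stdint.h>

/* Model of the GIMPLE above on one byte lane: with m either 0x00 or
   0xff, x ^ ((x ^ y) & m) selects y where the mask is set and x
   elsewhere -- i.e. a per-byte blend, which is why the selector has
   to stay in the mask's (byte) mode rather than V2DI.  */
static uint8_t blend_lane(uint8_t x, uint8_t y, uint8_t m)
{
    return x ^ ((x ^ y) & m);
}
```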

The most useful clause of vector_mask_p is actually the VECTOR_BOOLEAN_TYPE_P
test, which catches all "mask types", such as those that result from vector
comparisons.  Note that vector_mask_p is tested against the operand of a
VIEW_CONVERT_EXPR.  The remaining cases handle truth_*_expr-like operations on
those comparisons.  One follow-up patch is to additionally allow
VIEW_CONVERT_EXPR if both source and destination are VECTOR_TYPE_P and have
known_eq TYPE_VECTOR_SUBPARTS.  Likewise, a cst_vector_mask_p could check each
element rather than the catch-all "integer_zerop || integer_all_onesp".
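The per-element idea could look something like the following; this is a plain
C sketch of the intended predicate, not actual GCC code (the name and the
byte-lane representation are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-element version of cst_vector_mask_p: accept a
   constant vector as a mask if every element is 0 or all-ones,
   rather than requiring the whole vector to be integer_zerop or
   integer_all_onesp.  Elements are modeled here as 8-bit lanes.  */
static bool cst_vector_mask_p(const uint8_t *elts, int nelts)
{
    for (int i = 0; i < nelts; i++)
        if (elts[i] != 0x00 && elts[i] != 0xff)
            return false;
    return true;
}
```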

I agree with gimple-isel replacing VEC_COND_EXPR when it's supported in
hardware, just as .FMA is inserted to replace the universal
MULT_EXPR/PLUS_EXPR tree codes.  The question is whether vec_cond_expr (and
vec_duplicate) can always be expanded moderately efficiently by the
middle-end.  For one thing, we have vector cost models that should be able to
distinguish between efficient and inefficient implementations.  And a
vec_duplicate expander for SPARC should be able to generate an optimal
sequence of instructions, even if there isn't native hardware (instruction)
support.  For example, scalar multiplication by 0x0101010101010101 may be a
good way to vec_duplicate QImode to V8QImode (via DImode), at least with -Os.
As you've mentioned, the VEC_PERM infrastructure should be useful here.
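The multiplication trick itself is easy to check in plain C (a scalar model of
the proposed expansion, not expander code):

```c
#include <stdint.h>

/* vec_duplicate of a QImode value into V8QImode, performed in DImode:
   multiplying by 0x0101010101010101 broadcasts the byte into every
   lane, since each partial product lands in a distinct byte position
   and no carries occur between lanes.  */
static uint64_t dup_qi_to_v8qi(uint8_t x)
{
    return (uint64_t)x * 0x0101010101010101ULL;
}
```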

[P.S. it's unfortunate that some of my patches appear controversial.  By
randomly selecting Bugzilla (regression) PRs that have been open for a long
time, I seem to be selecting/enriching for bugs for which there's no simple
solution, and that maintainers have already thought about for a long time
without coming up with a satisfactory solution.]

Thanks again for your assistance/direction.
Cheers,
Roger
--

> -----Original Message-----
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: 23 May 2022 14:36
> To: Roger Sayle <ro...@nextmovesoftware.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH/RFC] PR tree-optimization/96912: Recognize
> VEC_COND_EXPR in match.pd
> 
> On Mon, May 23, 2022 at 3:06 PM Roger Sayle
> <ro...@nextmovesoftware.com> wrote:
> >
> >
> > Hi Richard,
> > I was wondering what you think of the following patch as a solution to
> > PR tree-optimization/96912, i.e. the ability to recognize pblendvb
> > from regular code rather than as a target specific builtin?
> >
> > The obvious point of contention is that the current middle-end
> > philosophy around vector expressions is that the middle-end should
> > continually check for backend support, whereas I prefer the "old
> > school" view that trees are an abstraction and that RTL expansion is
> > the point where these abstract operations get converted/lowered into
> instructions supported by the target.
> 
> That's still true for trees aka GENERIC but GIMPLE is more like RTL here.
> Note before vector lowering GIMPLE is like GENERIC but after the "RTL
> expansion" of vectors is considered done.  I _think_ this was done to get the
> opportunity to optimize the lowered code (and maybe cheat by doing that
> lowering sub-optimally) given that user-written "generic"
> vector code tends to run into limitations.
> 
> > [The exceptions being built-in functions, IFN_* etc.] Should tree.texi
> > document which tree codes can't be used without checking the backend.
> 
> I suppose yes, but it's not really "which tree codes" but which types.  Even 
> for
> PLUS_EXPR you have to check for target support when vector types are
> involved.
> 
> Note when being too lazy before vector lowering you could end up transforming
> previously supported IL into unsupported and thus triggering vector lowering 
> to
> perform elementwise operations, severely slowing down code, which is why you
> might find checks like if (target supports new code || target didn't support 
> old
> code)
> 
> > Bootstrapped and regression tested, but this obviously depends upon
> > RTL expansion being able to perform the inverse operation/lowering if
> required.
> 
> So the case in question "historically" was a task for RTL combine.  If we now
> bring that to GIMPLE we should indeed verify if the target can efficiently(!) 
> do
> the operation we like to use.  In this particular case it would be 
> vec_cond_mask
> support for the created VEC_COND_EXPR.
> We also have to avoid doing this after ISEL.
> 
> Note all original types are data types while you need a mask type for the
> selector which in turn means you will almost never match unless you hit the
> 
> +(match vector_mask_p
> + VECTOR_CST@0
> + (if (integer_zerop (@0) || integer_all_onesp (@0))))
> 
> case?
> 
> +(simplify
> + (bit_xor:c (bit_and:c (bit_xor:c @0 @1) (view_convert
> +vector_mask_p@2)) @0)  (if (VECTOR_TYPE_P (type)
> +      && VECTOR_TYPE_P (TREE_TYPE (@2)))
> +  (with { tree masktype = truth_type_for (TREE_TYPE (@2));
> 
> I think you want to check the truth_type_for (type) instead, check that you 
> can
> V_C_E @2 to it by checking it's a vector mode and the same as the truth_type
> mode.
> 
> +          tree vecttype = maybe_ne (TYPE_VECTOR_SUBPARTS (masktype),
> +                                   TYPE_VECTOR_SUBPARTS (type))
> +                         ? unsigned_type_for (masktype)
> +                         : type; }
> +   (view_convert (vec_cond:vecttype (view_convert:masktype @2)
> +                                   (view_convert:vecttype @1)
> +                                   (view_convert:vecttype @0))))))
> 
> and then have
> 
>     (vec_cond (view_convert:masktype @2) @1 @0)
> 
> Richard.
> 
> >
> > 2022-05-23  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR tree-optimization/96912
> >         * match.pd (vector_mask_p): New predicate to identify vectors
> >         where every element must be zero or all ones.
> >         (bit_xor (bit_and (bit_xor ...) ...) ...): Recognize a VEC_COND_EXPR
> >         expressed as logical vector operations.
> >
> > gcc/testsuite/ChangeLog
> >         PR tree-optimization/96912
> >         * gcc.target/i386/pr96912.c: New test case.
> >
> >
> > Thoughts?  How would you solve this PR?  Are there convenience
> > predicates for testing whether a target supports vec_cond_expr,
> vec_duplicate, etc?
> >
> > Cheers,
> > Roger
> > --
> >
