On Mon, May 23, 2022 at 4:27 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Richard,
>
> Currently for pr96912, we end up with:
>
> W foo (W x, W y, V m)
> {
>   W t;
>   vector(16) <signed-boolean:8> _1;
>   vector(16) signed char _2;
>   W _7;
>   vector(2) long long int _9;
>   vector(2) long long int _10;
>
>   <bb 2> [local count: 1073741824]:
>   _1 = m_3(D) < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>   _2 = VIEW_CONVERT_EXPR<vector(16) signed char>(_1);
>   t_4 = VIEW_CONVERT_EXPR<W>(_2);
>   _9 = x_5(D) ^ y_6(D);
>   _10 = t_4 & _9;
>   _7 = x_5(D) ^ _10;
>   return _7;
> }
>
> The mask mode is V16QI and the data mode is V2DI (please forgive my RTL
> terminology).  The assignment t_4 view-converts the mask into the "data
> type" for the bitwise operations.  To use x86's pblendvb, the "vcond_expr"
> operation needs to be in the mask's mode (V16QI) rather than the data's
> mode (V2DI).  Hence the unsigned_type_for of the truth_type_for [if they
> have different NUNITS].  Obviously, converting the mask to V2DI and
> performing a V2DI blend won't produce the same result.

OK, so that's because pblendvb doesn't produce the bit-operation result but
instead does element-wise selection.  So you rely on the all-zero or all-ones
constant matching, or on the VECTOR_BOOLEAN_TYPE_P guarantee that this is a
suitable mask, to establish the "element" granularity, since from the data
modes in the bit operations there's no way to guess that (which might also
be the reason why it's hard to do this at the RTL level).  Commentary above
the pattern should explain this, I think.
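
For reference, a plain C sketch of the idiom (assuming GCC vector
extensions; the types and the blend expression are reconstructed from the
GIMPLE dump above, so this is not necessarily the exact pr96912 testcase):

  typedef long long W __attribute__ ((vector_size (16)));
  typedef signed char V __attribute__ ((vector_size (16)));

  W foo (W x, W y, V m)
  {
    /* Byte-granular mask: every lane is 0 or -1 after the comparison.  */
    W t = (W) (m < 0);
    /* Bitwise blend: picks y where the mask is set, else x.  This is only
       equivalent to an element-wise selection because every mask byte is
       all-zeros or all-ones; a V2DI element-wise blend would instead
       select whole 64-bit lanes.  */
    return x ^ (t & (x ^ y));
  }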

The recursive (and, via the bit-ops, exponential) matching of this is a bit
awkward.  Since you match (view_convert vector_mask_p@2), isn't it enough to
require a VECTOR_BOOLEAN_TYPE_P here?  That should then eventually be good
enough to simply use TREE_TYPE (@2) as the data type without further
checking, and thus

 (view_convert (vec_cond @2 (view_convert:mask_type @1)
                            (view_convert:mask_type @0)))

>
> The most useful clause of vector_mask_p is actually the VECTOR_BOOLEAN_TYPE_P
> test that catches all "mask types", such as those that result from vector
> comparisons.  Note that vector_mask_p is tested against the operand of a
> view_convert expr.  The remaining cases handle truth_*_expr-like operations
> on those comparisons.

But those should still be VECTOR_BOOLEAN_TYPE_P?

> One follow-up patch is to additionally allow VIEW_CONVERT_EXPR if both source
> and destination are VECTOR_TYPE_P with known_eq TYPE_VECTOR_SUBPARTS.
> Likewise, a cst_vector_mask_p written in C could check each element rather
> than the catch-all "integer_zerop || integer_all_onesp".

Yes, you'd want to know whether the constant bit pattern covers a vector
mode (and which one), making its lanes all ones or all zeros.
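
A sketch of that per-element check (the helper name cst_vector_mask_p comes
from the follow-up idea quoted above; this assumes GCC's internal tree API
and is not part of the posted patch):

  /* Return true if CST is a VECTOR_CST in which every element is
     all-zeros or all-ones, i.e. usable as a blend-mask constant.  */
  static bool
  cst_vector_mask_p (tree cst)
  {
    if (TREE_CODE (cst) != VECTOR_CST)
      return false;
    unsigned HOST_WIDE_INT nelts;
    if (!VECTOR_CST_NELTS (cst).is_constant (&nelts))
      return false;
    for (unsigned HOST_WIDE_INT i = 0; i < nelts; ++i)
      {
        tree elt = VECTOR_CST_ELT (cst, i);
        if (!integer_zerop (elt) && !integer_all_onesp (elt))
          return false;
      }
    return true;
  }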

> I agree with gimple-isel replacing VEC_COND_EXPR when it's supported in
> hardware, just like .FMA is inserted to replace the universal
> MULT_EXPR/PLUS_EXPR tree codes; the question is whether vec_cond_expr (and
> vec_duplicate) can always be expanded moderately efficiently by the
> middle-end.  For one thing, we have vector cost models that should be able
> to distinguish between efficient and inefficient implementations.  And a
> vec_duplicate expander for SPARC should be able to generate an optimal
> sequence of instructions, even if there isn't native hardware (instruction)
> support.  For example, scalar multiplication by 0x0101010101010101 may be a
> good way to vec_duplicate QI mode to V8QI mode (via DI mode), at least with
> -Os.  As you've mentioned, the VEC_PERM infrastructure should be useful here.
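
The multiply-broadcast trick mentioned just above, as a plain C sketch (the
helper name dup8 is hypothetical; the 64-bit result would then be
reinterpreted as V8QI):

  /* Duplicate a byte across all eight lanes of a 64-bit word:
     multiplying by 0x0101010101010101 replicates C into every byte.  */
  unsigned long long
  dup8 (unsigned char c)
  {
    return (unsigned long long) c * 0x0101010101010101ULL;
  }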

If it can be done efficiently with alternate code sequences then, in the
current setup, vector lowering would be the place to do this.  There's
somewhat of a consensus that we want to move away from RTL expansion doing
all the "clever" things toward doing that on (late) GIMPLE, so that RTL
expansion can eventually rely on optabs only.

I'm not saying that's going to win; likewise, the current vector lowering is
both sub-optimal (relying on follow-up optimization) and, because of that,
placed too early.

In theory vector match.pd patterns could condition themselves on
optimize_vectors_before_lowering_p (); alternatively they have to check
target support.  There are now plenty of passes between vectorization and
vector lowering, though, and we probably should not mess with the
vectorizer-produced code (which is supported by the target), so that we do
not run into the sub-optimal vector lowering implementation.  Possibly
vector lowering should be done earlier, before loop opts (before PRE).

> [p.s. it's unfortunate that some of my patches appear controversial.  By
> randomly selecting Bugzilla (regression) PRs that have been open for a long
> time, I seem to be selecting/enriching for bugs for which there's no simple
> solution, and that maintainers have already thought about for a long time
> without coming up with a satisfactory solution.]

You are simply hitting areas where GCC isn't nicely designed and where the
obvious optimal solution isn't clear :/

> Thanks again for your assistance/direction.
> Cheers,
> Roger
> --
>
> > -----Original Message-----
> > From: Richard Biener <richard.guent...@gmail.com>
> > Sent: 23 May 2022 14:36
> > To: Roger Sayle <ro...@nextmovesoftware.com>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [PATCH/RFC] PR tree-optimization/96912: Recognize
> > VEC_COND_EXPR in match.pd
> >
> > On Mon, May 23, 2022 at 3:06 PM Roger Sayle
> > <ro...@nextmovesoftware.com> wrote:
> > >
> > >
> > > Hi Richard,
> > > I was wondering what you think of the following patch as a solution to
> > > PR tree-optimization/96912, i.e. the ability to recognize pblendvb
> > > from regular code rather than as a target specific builtin?
> > >
> > > The obvious point of contention is that the current middle-end
> > > philosophy around vector expressions is that the middle-end should
> > > continually check for backend support, whereas I prefer the "old
> > > school" view that trees are an abstraction and that RTL expansion is
> > > the point where these abstract operations get converted/lowered into
> > > instructions supported by the target.
> >
> > That's still true for trees aka GENERIC, but GIMPLE is more like RTL here.
> > Note that before vector lowering GIMPLE is like GENERIC, but after it the
> > "RTL expansion" of vectors is considered done.  I _think_ this was done to
> > get the opportunity to optimize the lowered code (and maybe cheat by doing
> > that lowering sub-optimally), given that user-written "generic" vector
> > code tends to run into limitations.
> >
> > > [The exceptions being built-in functions, IFN_* etc.]  Should tree.texi
> > > document which tree codes can't be used without checking the backend?
> >
> > I suppose yes, but it's not really "which tree codes" but which types.
> > Even for PLUS_EXPR you have to check for target support when vector types
> > are involved.
> >
> > Note that when being too lazy before vector lowering you could end up
> > transforming previously supported IL into unsupported IL, thus triggering
> > vector lowering to perform elementwise operations and severely slowing
> > down the code, which is why you might find checks like
> > if (target supports new code || target didn't support old code).
> >
> > > Bootstrapped and regression tested, but this obviously depends upon
> > > RTL expansion being able to perform the inverse operation/lowering if
> > > required.
> >
> > So the case in question "historically" was a task for RTL combine.  If we
> > now bring that to GIMPLE we should indeed verify that the target can
> > efficiently(!) do the operation we'd like to use.  In this particular case
> > it would be vec_cond_mask support for the created VEC_COND_EXPR.
> > We also have to avoid doing this after ISEL.
> >
> > Note that all original types are data types while you need a mask type
> > for the selector, which in turn means you will almost never match unless
> > you hit the
> >
> > +(match vector_mask_p
> > + VECTOR_CST@0
> > + (if (integer_zerop (@0) || integer_all_onesp (@0))))
> >
> > case?
> >
> > +(simplify
> > + (bit_xor:c (bit_and:c (bit_xor:c @0 @1) (view_convert vector_mask_p@2)) @0)
> > + (if (VECTOR_TYPE_P (type)
> > +      && VECTOR_TYPE_P (TREE_TYPE (@2)))
> > +  (with { tree masktype = truth_type_for (TREE_TYPE (@2));
> >
> > I think you want to check truth_type_for (type) instead, and check that
> > you can V_C_E @2 to it by verifying it has a vector mode that is the same
> > as the truth type's mode.
> >
> > +          tree vecttype = maybe_ne (TYPE_VECTOR_SUBPARTS (masktype),
> > +                                    TYPE_VECTOR_SUBPARTS (type))
> > +                          ? unsigned_type_for (masktype)
> > +                          : type; }
> > +   (view_convert (vec_cond:vecttype (view_convert:masktype @2)
> > +                                    (view_convert:vecttype @1)
> > +                                    (view_convert:vecttype @0))))))
> >
> > and then have
> >
> >     (vec_cond (view_convert:masktype @2) @1 @0)
> >
> > Richard.
> >
> > >
> > > 2022-05-23  Roger Sayle  <ro...@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         PR tree-optimization/96912
> > >         * match.pd (vector_mask_p): New predicate to identify vectors
> > >         where every element must be zero or all ones.
> > >         (bit_xor (bit_and (bit_xor ...) ...) ...): Recognize a 
> > > VEC_COND_EXPR
> > >         expressed as logical vector operations.
> > >
> > > gcc/testsuite/ChangeLog
> > >         PR tree-optimization/96912
> > >         * gcc.target/i386/pr96912.c: New test case.
> > >
> > >
> > > Thoughts?  How would you solve this PR?  Are there convenience
> > > predicates for testing whether a target supports vec_cond_expr,
> > > vec_duplicate, etc.?
> > >
> > > Cheers,
> > > Roger
> > > --
> > >
>
