Hi Richard,
Currently for pr96912, we end up with:

W foo (W x, W y, V m)
{
  W t;
  vector(16) <signed-boolean:8> _1;
  vector(16) signed char _2;
  W _7;
  vector(2) long long int _9;
  vector(2) long long int _10;

  <bb 2> [local count: 1073741824]:
  _1 = m_3(D) < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _2 = VIEW_CONVERT_EXPR<vector(16) signed char>(_1);
  t_4 = VIEW_CONVERT_EXPR<W>(_2);
  _9 = x_5(D) ^ y_6(D);
  _10 = t_4 & _9;
  _7 = x_5(D) ^ _10;
  return _7;
}

The mask mode is V16QI and the data mode is V2DI (please forgive my RTL
terminology).  The assignment t_4 view-converts the mask into the "data
type" for the bitwise operations.  To use x86's pblendvb, the
"vcond_expr" operation needs to be in the mask's mode (V16QI) rather
than the data's mode (V2DI); hence the unsigned_type_for of the
truth_type_for [if they have different NUNITS].  Obviously, converting
the mask to V2DI and performing a V2DI blend won't produce the same
result.

The most useful clause of vector_mask_p is actually the
VECTOR_BOOLEAN_TYPE_P test that catches all "mask types", such as those
that result from vector comparisons.  Note that vector_mask_p is tested
against the operand of a view_convert expr.  The remaining cases handle
truth_*_expr-like operations on those comparisons.

One follow-up patch is to additionally allow VIEW_CONVERT_EXPR if both
source and destination are VECTOR_TYPE_P with known_eq
TYPE_VECTOR_SUBPARTS.  Likewise, a cst_vector_mask_p could check each
element rather than the catch-all "integer_zerop || integer_all_onesp".

I agree with gimple-isel replacing VEC_COND_EXPR when it's supported in
hardware, just like .FMA is inserted to replace the universal
MULT_EXPR/PLUS_EXPR tree codes; the question is whether vec_cond_expr
(and vec_duplicate) can always be expanded moderately efficiently by the
middle-end.  For one thing, we have vector cost models that should be
able to distinguish between efficient and inefficient implementations.
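For reference, the kind of source that produces this GIMPLE can be
written directly with GCC's generic vector extensions.  This is a
hypothetical reduction (the typedefs W and V are my guesses at the
types in the dump above); x ^ ((x ^ y) & t) is the bitwise select that
pblendvb implements per byte:

```c
typedef long long W __attribute__ ((vector_size (16)));   /* V2DImode  */
typedef signed char V __attribute__ ((vector_size (16))); /* V16QImode */

/* Bitwise select: where a mask byte is all ones, take y's byte;
   where it is zero, take x's byte.  With the comparison feeding the
   mask, this is the operation pblendvb performs in one instruction.  */
W foo (W x, W y, V m)
{
  W t = (W) (m < (V) {0});      /* each mask byte becomes -1 or 0 */
  return x ^ ((x ^ y) & t);
}
```

The V16QI granularity matters: the mask bytes select individual bytes,
so view-converting the mask to V2DI and blending 64-bit lanes would
only coincide when all sixteen mask bytes agree within each lane.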
And a vec_duplicate expander for SPARC should be able to generate an
optimal sequence of instructions, even if there isn't native hardware
(instruction) support.  For example, scalar multiplication by
0x0101010101010101 may be a good way to vec_duplicate QImode to V8QImode
(via DImode), at least with -Os.  As you've mentioned, the VEC_PERM
infrastructure should be useful here.

[p.s. it's unfortunate that some of my patches appear controversial.  By
randomly selecting Bugzilla (regression) PRs that have been open for a
long time, I seem to be selecting/enriching for bugs for which there's
no simple solution, and that maintainers have already thought about for
a long time without coming up with a satisfactory solution.]

Thanks again for your assistance/direction.
Cheers,
Roger
--

> -----Original Message-----
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: 23 May 2022 14:36
> To: Roger Sayle <ro...@nextmovesoftware.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH/RFC] PR tree-optimization/96912: Recognize
> VEC_COND_EXPR in match.pd
>
> On Mon, May 23, 2022 at 3:06 PM Roger Sayle
> <ro...@nextmovesoftware.com> wrote:
> >
> >
> > Hi Richard,
> > I was wondering what you think of the following patch as a solution to
> > PR tree-optimization/96912, i.e. the ability to recognize pblendvb
> > from regular code rather than as a target specific builtin?
> >
> > The obvious point of contention is that the current middle-end
> > philosophy around vector expressions is that the middle-end should
> > continually check for backend support, whereas I prefer the "old
> > school" view that trees are an abstraction and that RTL expansion is
> > the point where these abstract operations get converted/lowered into
> > instructions supported by the target.
>
> That's still true for trees aka GENERIC but GIMPLE is more like RTL here.
> Note before vector lowering GIMPLE is like GENERIC but after the "RTL
> expansion" of vectors is considered done.
> I _think_ this was done to get the opportunity to optimize the lowered
> code (and maybe cheat by doing that lowering sub-optimally) given that
> user-written "generic" vector code tends to run into limitations.
>
> > [The exceptions being built-in functions, IFN_* etc.]  Should
> > tree.texi document which tree codes can't be used without checking
> > the backend?
>
> I suppose yes, but it's not really "which tree codes" but which types.
> Even for PLUS_EXPR you have to check for target support when vector
> types are involved.
>
> Note when being too lazy before vector lowering you could end up
> transforming previously supported IL into unsupported and thus
> triggering vector lowering to perform elementwise operations, severely
> slowing down code, which is why you might find checks like
> if (target supports new code || target didn't support old code)
>
> > Bootstrapped and regression tested, but this obviously depends upon
> > RTL expansion being able to perform the inverse operation/lowering if
> > required.
>
> So the case in question "historically" was a task for RTL combine.  If
> we now bring that to GIMPLE we should indeed verify if the target can
> efficiently(!) do the operation we like to use.  In this particular
> case it would be vec_cond_mask support for the created VEC_COND_EXPR.
> We also have to avoid doing this after ISEL.
>
> Note all original types are data types while you need a mask type for
> the selector, which in turn means you will almost never match unless
> you hit the
>
> +(match vector_mask_p
> + VECTOR_CST@0
> + (if (integer_zerop (@0) || integer_all_onesp (@0))))
>
> case?
>
> +(simplify
> + (bit_xor:c (bit_and:c (bit_xor:c @0 @1) (view_convert
> +vector_mask_p@2)) @0)
> + (if (VECTOR_TYPE_P (type)
> +      && VECTOR_TYPE_P (TREE_TYPE (@2)))
> +  (with { tree masktype = truth_type_for (TREE_TYPE (@2));
>
> I think you want to check the truth_type_for (type) instead, check that
> you can V_C_E @2 to it by checking it's a vector mode and the same as
> the truth_type mode.
>
> +          tree vecttype = maybe_ne (TYPE_VECTOR_SUBPARTS (masktype),
> +                                    TYPE_VECTOR_SUBPARTS (type))
> +                          ? unsigned_type_for (masktype)
> +                          : type; }
> +   (view_convert (vec_cond:vecttype (view_convert:masktype @2)
> +                                    (view_convert:vecttype @1)
> +                                    (view_convert:vecttype @0))))))
>
> and then have
>
> (vec_cond (view_convert:masktype @2) @1 @0)
>
> Richard.
>
> >
> > 2022-05-23  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >     PR tree-optimization/96912
> >     * match.pd (vector_mask_p): New predicate to identify vectors
> >     where every element must be zero or all ones.
> >     (bit_xor (bit_and (bit_xor ...) ...) ...): Recognize a VEC_COND_EXPR
> >     expressed as logical vector operations.
> >
> > gcc/testsuite/ChangeLog
> >     PR tree-optimization/96912
> >     * gcc.target/i386/pr96912.c: New test case.
> >
> >
> > Thoughts?  How would you solve this PR?  Are there convenience
> > predicates for testing whether a target supports vec_cond_expr,
> > vec_duplicate, etc?
> >
> > Cheers,
> > Roger
> > --
> >
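p.p.s. To make the vec_duplicate-by-multiplication idea above concrete:
broadcasting a byte through a DImode value is a single multiply by the
repeating-ones constant.  A scalar sketch (the helper name is mine):

```c
#include <stdint.h>

/* Duplicate one byte into all eight byte lanes of a 64-bit word:
   b * 0x0101010101010101 deposits a copy of b at every byte position,
   and b < 256 guarantees no carries cross between byte lanes.  */
uint64_t
dup_byte (uint8_t b)
{
  return (uint64_t) b * 0x0101010101010101ULL;
}
```

A target without a native broadcast instruction could expand
vec_duplicate to this one multiply, which at -Os may beat a shift/or
sequence.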