Hey,

On Sat, Aug 25, 2018 at 12:15:16PM -0500, Bill Schmidt wrote:
> On 8/20/18 4:44 PM, Will Schmidt wrote:
> > Enable GIMPLE folding of the vec_splat() intrinsic. (v3).
> >     
> > This uses the tree_vec_extract() function out of tree-vect-generic.c
> > to retrieve the splat value, which is a BIT_FIELD_REF.   That function is
> > made non-static as part of this change.
> >     
> > Testcases are already in-tree.
> >     
> > V2 updates, per feedback previously received.
> > Forced arg1 into range (modulo #elements) before attempting to extract
> > the splat value.
> > Removed the (now unnecessary) code that did bounds-checking before calling
> > the tree_vec_extract helper.
> > Used arg0_type rather than lhs_type for calculating the tree size.
> >     
> > V3 updates, inspired by additional offline discussion with Segher.
> > Noting that for vec_splat(arg1,arg2), the ABI describes ARG2 as an element
> > number less than the number of elements supported by the respective ARG1
> > type, so we do not attempt to gimple-fold the intrinsic if we determine our
> > value is out of range. Thus, the initial check ensures that ARG1 is both
> > constant and has a valid index into ARG0.
> > The subsequent modulo operation is no longer necessary, and has been 
> > removed.

> > 2018-08-20  Will Schmidt  <will_schm...@vnet.ibm.com>
> >     
> >             * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add 
> > support for
> >               early gimple folding of vec_splat().

s/  //

> >             * tree-vect-generic.c: Remove static from tree_vec_extract() 
> > definition.
> >             * gimple-fold.h:  Add an extern define for tree_vec_extract().

s/  / /

> > +   /* Only fold the vec_splat_*() if arg1 is both a constant value, and a 
> > valid
> > +    index into the arg0 vector.  */

Needs two more spaces indent?

> > +   unsigned int n_elts = VECTOR_CST_NELTS (arg0);
> > +   if (TREE_CODE (arg1) != INTEGER_CST
> > +       || TREE_INT_CST_LOW (arg1) > (n_elts -1))
> 
> Space between - and 1.
> 
> I'm still not convinced we should do this.  The semantics of vec_splat are
> to select the element as arg1 modulo n_elts.  Odd as it seems, it's 
> historically
> valid for values between 0 and 31 to be used regardless of the value of 
> n_elts.

But, the assembler complains.  So it never worked, and it seems better to
not allow it in the compiler either.

A lot of existing testcases generate assembler code the assembler chokes
on -- but the testcases are dg-do compile, so the assembler isn't run.

This is also PR86987; I have patches, but it snowballs because of the
aforementioned existing wrong testcases.

> Since arg1 is a constant we can easily bring it into valid range and expand it
> early, rather than losing optimization opportunities by keeping this as a
> built-in call.

We should just error on invalid input.

> Do we have test cases for arg1 in {n_elts, ..., 31}?  What's GCC's current
> behavior?  Maybe we already aren't honoring this technically legal case?

We do, lots.  Sometimes we compile it; the assembler chokes on the output
(but the assembler is not run in those testcases).  Sometimes we ICE.

> Otherwise looks fine to me, though of course I can't approve.

Looks fine to me, too.

> > +       int splat_elem_size = TREE_INT_CST_LOW (size_in_bytes (arg0_type))
> > +                               * BITS_PER_UNIT / n_elts;

(Well one thing then...  Wrong indent?)


Segher

Reply via email to