Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

Richard Biener Fri, 01 Jul 2016 05:03:12 -0700

On Thu, 30 Jun 2016, Kyrill Tkachov wrote:

> 
> On 28/06/16 08:54, Richard Biener wrote:
> > On Thu, 16 Jun 2016, Kyrill Tkachov wrote:
> > 
> > > On 15/06/16 22:53, Marc Glisse wrote:
> > > > On Wed, 15 Jun 2016, Kyrill Tkachov wrote:
> > > > 
> > > > > This is a respin of
> > > > > https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following
> > > > > feedback.
> > > > > I've changed the code to cast the operand to an unsigned type before
> > > > > applying the multiplication algorithm
> > > > > and cast it back to the signed type at the end.
> > > > > Whether to perform the cast is now determined by the function
> > > > > cast_mult_synth_to_unsigned in which I've implemented
> > > > > the cases that Marc mentioned in [1]. Please do let me know
> > > > > if there are any other cases that need to be handled.
> > > > Ah, I never meant those cases as an exhaustive list, I was just looking
> > > > for
> > > > examples showing that the transformation was unsafe, and those 2 came to
> > > > mind:
> > > > 
> > > > - x*15 -> x*16-x the second one can overflow even when the first one
> > > > doesn't.
> > > > 
> > > > - x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe that's
> > > > redundant with the negate_variant check?)
> > > > 
> > > > On the other hand, as long as we remain in the 'positive' operations,
> > > > turning x*3 to x<<1+x seems perfectly safe. And even x*30 to (x*16-x)*2
> > > > cannot cause spurious overflows. But I didn't look at the algorithm
> > > > closely
> > > > enough to characterize the safe cases. Now if you have done it, that's
> > > > good
> > > > :-) Otherwise, we might want to err on the side of caution.
> > > > 
> > > I'll be honest, I didn't give it much thought beyond convincing myself
> > > that
> > > the two cases you listed are legitimate.
> > > Looking at expand_mult_const in expmed.c can be helpful (where it updates
> > > val_so_far for checking purposes) to see
> > > the different algorithm cases. I think the only steps that could cause
> > > overflow are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor or when the
> > > final
> > > step is negate_variant, which are what you listed (and covered in this
> > > patch).
> > > 
> > > richi is away on PTO for the time being though, so we have some time to
> > > convince ourselves :)
> > ;)  I think the easiest way would be to always use unsigned arithmetic.
> > 
> > While VRP doesn't do anything on vector operations we still have some
> > match.pd patterns that rely on correct overflow behavior and those
> > may be enabled for vector types as well.
> 
> That's fine with me.
> Here's the patch that performs the casts to unsigned and back when the input
> type doesn't wrap on overflow.
> 
> Bootstrapped and tested on arm, aarch64, x86_64.
> 
> How's this?


+static bool
+target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
+                                tree scaltype)
+{
...
+  tree vectype = get_vectype_for_scalar_type (scaltype);
+
+  if (!vectype)
+    return false;
+
+  /* All synthesis algorithms require shifts, so bail out early if
+     target cannot vectorize them.
+     TODO: If the target doesn't support vector shifts but supports 
vector
+     addition we could synthesize shifts that way.  
vect_synth_mult_by_constant
+     would need to be updated to do that.  */
+  if (!vect_supportable_shift (LSHIFT_EXPR, scaltype))
+    return false;

I think these tests should be done in the caller before calling
synth_mult (and vectype be passed down accordingly).  Also I wonder
if synth_mult for a * 2 produces a << 1 or a + a - the case of
a * 2 -> a + a was what Marcs patch handled.  Guarding off LSHIFT_EXPR
support that early will make that fail on targets w/o vector shift.

+  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
+  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
+
+  if (var == negate_variant
+      && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
+    return false;

I think we don't have any targets that support PLUS but not MINUS
or targets that do not support NEGATE.  OTOH double-checking doesn't 
matter.

+apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
+                            stmt_vec_info stmt_vinfo)
+{
+  if (TREE_INT_CST_LOW (op2) == 0

integer_zerop (op2)

+      && (code == LSHIFT_EXPR
+         || code == PLUS_EXPR))

+  tree itype = TREE_TYPE (op2);

I think it's dangerous to use the type of op2 given you synthesize
shifts as well.  Why not use the type of op1?

+      /* TODO: Targets that don't support vector shifts could synthesize
+        them using vector additions.  target_supports_mult_synth_alg 
would
+        need to be updated to allow this.  */
+      switch (alg.op[i])
+       {

so I suppose one could at least special-case << 1 and always use
PLUS for that.

Otherwise looks ok to me.  There is a PR with a testcase for
the x * 2 issue so please add that if it is fixed or amend the fix if not 
;)

Thanks,
Richard.

> Thanks,
> Kyrill
> 
> 2016-06-30  Kyrylo Tkachov  <[email protected]>
> 
>     PR target/65951
>     * tree-vect-patterns.c: Include mult-synthesis.h
>     (target_supports_mult_synth_alg): New function.
>     (apply_binop_and_append_stmt): Likewise.
>     (vect_synth_mult_by_constant): Likewise.
>     (target_has_vecop_for_code): Likewise.
>     (vect_recog_mult_pattern): Use above functions to synthesize vector
>     multiplication by integer constants.
> 
> 2016-06-30  Kyrylo Tkachov  <[email protected]>
> 
>     * gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
>     * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
> 
> 

-- 
Richard Biener <[email protected]>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

Reply via email to