Hi,

the attached Ada testcase compiled for x86_64-w64-mingw32 with -O3 -msse2 -
gnatn exposes an instability in the SLP reduction detection: in 32-bit mode, 
the inner loop is effectively vectorized (after being unrolled by cunrolli):

r1.adb:6:9: optimized: loop vectorized using 16 byte vectors and unroll factor 
1

whereas it is not in 64-bit mode:

r1.adb:6:9: missed: couldn't vectorize loop

despite almost similar .ifcvt tree dumps.  The difference stems from this:

  <bb 3> [local count: 268435456]:
  # i_1 = PHI <0(5), i_8(6)>
  # ret$F$1_17 = PHI <ret$F$1_31(5), _27(6)>
  # ret$F$2_30 = PHI <ret$F$2_32(5), _35(6)>
  # ret$F$3_21 = PHI <ret$F$3_37(5), _43(6)>
  # ret$F$4_24 = PHI <ret$F$4_39(5), _16(6)>
  i_8 = i_1 + 1;
  _27 = ret$F$1_17 + ret$F$1_31;
  _35 = ret$F$2_30 + ret$F$2_32;
  _43 = ret$F$3_21 + ret$F$3_37;
  _16 = ret$F$4_24 + ret$F$4_39;

in the 32-bit .ifcvt tree dump and this:

  <bb 3> [local count: 268435456]:
  # i_1 = PHI <0(5), i_9(6)>
  # ret$F$1_31 = PHI <ret$F$1_20(5), _38(6)>
  # ret$F$2_29 = PHI <ret$F$2_19(5), _46(6)>
  # ret$F$3_21 = PHI <ret$F$3_49(5), _54(6)>
  # ret$F$4_47 = PHI <ret$F$4_41(5), _18(6)>
  i_9 = i_1 + 1;
  _38 = ret$F$1_20 + ret$F$1_31;
  _46 = ret$F$2_19 + ret$F$2_29;
  _54 = ret$F$3_21 + ret$F$3_49;
  _18 = ret$F$4_41 + ret$F$4_47;

in the 64-bit .ifcvt tree dump.  The first pattern is accepted as a reduction 
by vect_build_slp_tree_1 but the second is not:

r1.adb:6:9: note:   Build SLP for _18 = ret$F$4_41 + ret$F$4_47;
r1.adb:6:9: note:   Build SLP for _54 = ret$F$3_21 + ret$F$3_49;
r1.adb:6:9: missed:   Build SLP failed: different reduc_idx 0 instead of 1 in 
_54 = ret$F$3_21 + ret$F$3_49;
r1.adb:6:9: note:   Build SLP for _46 = ret$F$2_19 + ret$F$2_29;
r1.adb:6:9: note:   Build SLP for _38 = ret$F$1_20 + ret$F$1_31;

which indicates that, if the operands of the addition had been swapped in:

_54 = ret$F$3_21 + ret$F$3_49;

then the pattern would have been recognized.

vect_build_slp_tree_1 already deals with this kind of issues for COND_EXPR:

   Note COND_EXPR is possibly isomorphic to another one after swapping its
   operands.  Set SWAP[i] to 1 if stmt I is COND_EXPR and isomorphic to
   the first stmt by swapping the two operands of comparison; set SWAP[i]
   to 2 if stmt I is isormorphic to the first stmt by inverting the code
   of comparison.  Take A1 >= B1 ? X1 : Y1 as an exmple, it can be swapped
   to (B1 <= A1 ? X1 : Y1); or be inverted to (A1 < B1) ? Y1 : X1.  */

but not for commutative operations.

Does this mean that some form of canonicalization should have happened earlier 
to make the reduction detection more robust for them?  Or does the function 
need to cope with them specifically, for example as in the attached patch that 
fixes in the instability for this case?  Or else should the reduction have 
been detected earlier?


        * tree-vect-slp.cc (vect_build_slp_tree_1): Accept swapped operands
        for commutative operations in reduction statements.


-- 
Eric Botcazou
with P1; use P1;

package R1 is

  function NSum (X : Arr; N : Positive) return Arr;

end R1;
package P1 is

  type Arr is array (1 .. 4) of Float;
  for Arr'Alignment use 16;

  function Sum (X : Arr; Y : Arr) return Arr;
  pragma Inline (Sum);

end P1;
package body R1 is

  function NSum (X : Arr; N : Positive) return Arr is
    Ret : Arr := X;
  begin
    for I in 1 .. N loop
      Ret := Sum (Ret, X);
    end loop;
    return Ret;
  end;

end R1;
package body P1 is

  function Sum (X : Arr; Y : Arr) return Arr is
    Result : Arr;
  begin
    for I in X'Range loop
      Result(I) := X(I) + Y(I);
    end loop;
    return Result;
  end;

end P1;
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index f553e8fba19..b33bbb8564b 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1345,6 +1345,12 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
       else
 	{
 	  if (first_reduc_idx != STMT_VINFO_REDUC_IDX (stmt_info)
+	      /* For commutative operations, we can handle an operand swap.  */
+	      && !(rhs_code.is_tree_code ()
+		   && commutative_tree_code (tree_code (rhs_code))
+		   && first_reduc_idx >= 0
+		   && STMT_VINFO_REDUC_IDX (stmt_info) >= 0
+		   && first_reduc_idx == 1 - STMT_VINFO_REDUC_IDX (stmt_info))
 	      /* For SLP reduction groups the index isn't necessarily
 		 uniform but only that of the first stmt matters.  */
 	      && !(first_reduc_idx != -1

Reply via email to