Hi,
the attached Ada testcase compiled for x86_64-w64-mingw32 with -O3 -msse2 -
gnatn exposes an instability in the SLP reduction detection: in 32-bit mode,
the inner loop is effectively vectorized (after being unrolled by cunrolli):
r1.adb:6:9: optimized: loop vectorized using 16 byte vectors and unroll factor
1
whereas it is not in 64-bit mode:
r1.adb:6:9: missed: couldn't vectorize loop
despite almost similar .ifcvt tree dumps. The difference stems from this:
<bb 3> [local count: 268435456]:
# i_1 = PHI <0(5), i_8(6)>
# ret$F$1_17 = PHI <ret$F$1_31(5), _27(6)>
# ret$F$2_30 = PHI <ret$F$2_32(5), _35(6)>
# ret$F$3_21 = PHI <ret$F$3_37(5), _43(6)>
# ret$F$4_24 = PHI <ret$F$4_39(5), _16(6)>
i_8 = i_1 + 1;
_27 = ret$F$1_17 + ret$F$1_31;
_35 = ret$F$2_30 + ret$F$2_32;
_43 = ret$F$3_21 + ret$F$3_37;
_16 = ret$F$4_24 + ret$F$4_39;
in the 32-bit .ifcvt tree dump and this:
<bb 3> [local count: 268435456]:
# i_1 = PHI <0(5), i_9(6)>
# ret$F$1_31 = PHI <ret$F$1_20(5), _38(6)>
# ret$F$2_29 = PHI <ret$F$2_19(5), _46(6)>
# ret$F$3_21 = PHI <ret$F$3_49(5), _54(6)>
# ret$F$4_47 = PHI <ret$F$4_41(5), _18(6)>
i_9 = i_1 + 1;
_38 = ret$F$1_20 + ret$F$1_31;
_46 = ret$F$2_19 + ret$F$2_29;
_54 = ret$F$3_21 + ret$F$3_49;
_18 = ret$F$4_41 + ret$F$4_47;
in the 64-bit .ifcvt tree dump. The first pattern is accepted as a reduction
by vect_build_slp_tree_1 but the second is not:
r1.adb:6:9: note: Build SLP for _18 = ret$F$4_41 + ret$F$4_47;
r1.adb:6:9: note: Build SLP for _54 = ret$F$3_21 + ret$F$3_49;
r1.adb:6:9: missed: Build SLP failed: different reduc_idx 0 instead of 1 in
_54 = ret$F$3_21 + ret$F$3_49;
r1.adb:6:9: note: Build SLP for _46 = ret$F$2_19 + ret$F$2_29;
r1.adb:6:9: note: Build SLP for _38 = ret$F$1_20 + ret$F$1_31;
which indicates that, if the operands of the addition had been swapped in:
_54 = ret$F$3_21 + ret$F$3_49;
then the pattern would have been recognized.
vect_build_slp_tree_1 already deals with this kind of issues for COND_EXPR:
Note COND_EXPR is possibly isomorphic to another one after swapping its
operands. Set SWAP[i] to 1 if stmt I is COND_EXPR and isomorphic to
the first stmt by swapping the two operands of comparison; set SWAP[i]
to 2 if stmt I is isormorphic to the first stmt by inverting the code
of comparison. Take A1 >= B1 ? X1 : Y1 as an exmple, it can be swapped
to (B1 <= A1 ? X1 : Y1); or be inverted to (A1 < B1) ? Y1 : X1. */
but not for commutative operations.
Does this mean that some form of canonicalization should have happened earlier
to make the reduction detection more robust for them? Or does the function
need to cope with them specifically, for example as in the attached patch that
fixes in the instability for this case? Or else should the reduction have
been detected earlier?
* tree-vect-slp.cc (vect_build_slp_tree_1): Accept swapped operands
for commutative operations in reduction statements.
--
Eric Botcazouwith P1; use P1;
package R1 is
function NSum (X : Arr; N : Positive) return Arr;
end R1;
package P1 is
type Arr is array (1 .. 4) of Float;
for Arr'Alignment use 16;
function Sum (X : Arr; Y : Arr) return Arr;
pragma Inline (Sum);
end P1;
package body R1 is
function NSum (X : Arr; N : Positive) return Arr is
Ret : Arr := X;
begin
for I in 1 .. N loop
Ret := Sum (Ret, X);
end loop;
return Ret;
end;
end R1;
package body P1 is
function Sum (X : Arr; Y : Arr) return Arr is
Result : Arr;
begin
for I in X'Range loop
Result(I) := X(I) + Y(I);
end loop;
return Result;
end;
end P1;
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index f553e8fba19..b33bbb8564b 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1345,6 +1345,12 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
else
{
if (first_reduc_idx != STMT_VINFO_REDUC_IDX (stmt_info)
+ /* For commutative operations, we can handle an operand swap. */
+ && !(rhs_code.is_tree_code ()
+ && commutative_tree_code (tree_code (rhs_code))
+ && first_reduc_idx >= 0
+ && STMT_VINFO_REDUC_IDX (stmt_info) >= 0
+ && first_reduc_idx == 1 - STMT_VINFO_REDUC_IDX (stmt_info))
/* For SLP reduction groups the index isn't necessarily
uniform but only that of the first stmt matters. */
&& !(first_reduc_idx != -1