In the given testcase, g++ splits a live operation into two scalar statements
and four vector statements.

  _5 = _4 >> 2;
  _7 = (short int) _5;

Is turned into:

  vect__5.32_80 = vect__4.31_76 >> 2;
  vect__5.32_81 = vect__4.31_77 >> 2;
  vect__5.32_82 = vect__4.31_78 >> 2;
  vect__5.32_83 = vect__4.31_79 >> 2;
  vect__7.33_86 = VEC_PACK_TRUNC_EXPR <vect__5.32_80, vect__5.32_81>;
  vect__7.33_87 = VEC_PACK_TRUNC_EXPR <vect__5.32_82, vect__5.32_83>;

_5 is then accessed outside the loop.

This patch ensures that vectorizable_live_operation picks the correct scalar
statement. I removed the "three possibilites" comment because it was no longer
accurate (it's also possible to have more vector statements than scalar
statements) and the calculation is now much simpler.

Tested on x86 and aarch64.
Ok to commit?

gcc/
    PR tree-optimization/71483
    * tree-vect-loop.c (vectorizable_live_operation): Pick correct index
    for slp

testsuite/g++.dg/vect
    PR tree-optimization/71483
    * pr71483.c: New

Alan.

diff --git a/gcc/testsuite/g++.dg/vect/pr71483.c b/gcc/testsuite/g++.dg/vect/pr71483.c
new file mode 100644
index 0000000000000000000000000000000000000000..77f879c9a89b8b41ef9dde3c3435918572dc8d01
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr71483.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+int b, c, d;
+short *e;
+void fn1() {
+  for (; b; b--) {
+    d = *e >> 2;
+    *e++ = d;
+    c = *e;
+    *e++ = d;
+  }
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 4c8678505df6ec572b69fd7d12ac55cf4619ece6..a2413bf9c678d11cc2ffd22bc7d984e911831804 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6368,24 +6368,20 @@ vectorizable_live_operation (gimple *stmt,
 
       int num_scalar = SLP_TREE_SCALAR_STMTS (slp_node).length ();
       int num_vec = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-      int scalar_per_vec = num_scalar / num_vec;
 
-      /* There are three possibilites here:
-         1: All scalar stmts fit in a single vector.
-         2: All scalar stmts fit multiple times into a single vector.
-            We must choose the last occurence of stmt in the vector.
-         3: Scalar stmts are split across multiple vectors.
-            We must choose the correct vector and mod the lane accordingly.  */
+      /* Get the last occurrence of the scalar index from the concatenation of
+         all the slp vectors. Calculate which slp vector it is and the index
+         within.  */
+      int pos = (num_vec * nunits) - num_scalar + slp_index;
+      int vec_entry = pos / nunits;
+      int vec_index = pos % nunits;
 
       /* Get the correct slp vectorized stmt.  */
-      int vec_entry = slp_index / scalar_per_vec;
       vec_lhs = gimple_get_lhs (SLP_TREE_VEC_STMTS (slp_node)[vec_entry]);
 
       /* Get entry to use.  */
-      bitstart = build_int_cst (unsigned_type_node,
-                                scalar_per_vec - (slp_index % scalar_per_vec));
+      bitstart = build_int_cst (unsigned_type_node, vec_index);
       bitstart = int_const_binop (MULT_EXPR, bitsize, bitstart);
-      bitstart = int_const_binop (MINUS_EXPR, vec_bitsize, bitstart);
     }
   else
     {
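
For reference, the new index calculation can be tried outside the compiler. The
following is only a sketch: LanePos and last_occurrence are hypothetical names
that do not exist in GCC, and the value nunits = 4 used for the testcase is an
assumption (128-bit vectors of int). It mirrors the pos / vec_entry / vec_index
arithmetic from the hunk above and nothing else.

```cpp
#include <cassert>

// Hypothetical stand-alone model of the arithmetic in the patch; not GCC code.
struct LanePos
{
  int vec_entry;  // which of the slp vector statements holds the lane
  int vec_index;  // lane within that vector
};

// Locate the last occurrence of scalar statement slp_index in the
// concatenation of num_vec vectors of nunits lanes each, where the
// num_scalar scalar statements repeat to fill all lanes.
constexpr LanePos
last_occurrence (int num_vec, int nunits, int num_scalar, int slp_index)
{
  int pos = (num_vec * nunits) - num_scalar + slp_index;
  return LanePos { pos / nunits, pos % nunits };
}

// Assumed shape of the PR testcase: 2 scalar stmts, 4 vector stmts, 4 lanes.
static_assert (last_occurrence (4, 4, 2, 0).vec_entry == 3, "last vector");
static_assert (last_occurrence (4, 4, 2, 0).vec_index == 2, "lane 2");
static_assert (last_occurrence (4, 4, 2, 1).vec_index == 3, "lane 3");

// Scalars fit exactly once into one vector: lane equals slp_index.
static_assert (last_occurrence (1, 4, 4, 2).vec_index == 2, "identity");

// Scalars repeat inside one vector: the final repetition is chosen.
static_assert (last_occurrence (1, 8, 2, 1).vec_index == 7, "last repeat");
```

Under the same assumed numbers, the removed expression num_scalar / num_vec is
2 / 4, which integer division truncates to 0, so it could not be used as a
divisor for slp_index.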