Reported to upstream at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101506 .
-- Maxim Kuvyrkov https://www.linaro.org > On 19 Jul 2021, at 02:30, ci_not...@linaro.org wrote: > > Successfully identified regression in *gcc* in CI configuration > tcwg_gnu/gnu-master-aarch64-check_bootstrap. So far, this commit has > regressed CI configurations: > - tcwg_gnu/gnu-master-aarch64-check_bootstrap > > Culprit: > <cut> > commit 1dd3f21095858fbfd3e28a149578d5fb67e75f95 > Author: Richard Biener <rguent...@suse.de> > Date: Tue Jul 13 13:59:15 2021 +0200 > > Support reduction def re-use for epilogue with different vector size > > The following adds support for re-using the vector reduction def > from the main loop in vectorized epilogue loops on architectures > which use different vector sizes for the epilogue. That's only > x86 as far as I am aware. > > 2021-07-13 Richard Biener <rguent...@suse.de> > > * tree-vect-loop.c (vect_find_reusable_accumulator): Handle > vector types where the old vector type has a multiple of > the new vector type elements. > (vect_create_partial_epilog): New function, split out from... > (vect_create_epilog_for_reduction): ... here. > (vect_transform_cycle_phi): Reduce the re-used accumulator > to the new vector type. > > * gcc.target/i386/vect-reduc-1.c: New testcase. > </cut> > > Results regressed to (for first_bad == > 1dd3f21095858fbfd3e28a149578d5fb67e75f95) > # reset_artifacts: > -10 > # build_abe binutils: > -2 > # build_abe bootstrap: > -1 > # build_abe dejagnu: > 0 > # build_abe check_bootstrap -- --set runtestflags=g++.dg/dg.exp --set > runtestflags=gcc.target/aarch64/aarch64.exp: > 1 > # Getting actual results from build directory > /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles > # > /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/libstdc++.sum > # > /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/gfortran.sum > # > /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/libitm.sum > # > /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/libgomp.sum > # > /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/libatomic.sum > # > /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/g++.sum > # > /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/gcc.sum > # Manifest: > gcc-compare-results/contrib/testsuite-management/flaky/gnu-master-aarch64-check_bootstrap.xfail > # Getting actual results from build directory base-artifacts/sumfiles > # base-artifacts/sumfiles/libstdc++.sum > # base-artifacts/sumfiles/gfortran.sum > # base-artifacts/sumfiles/libitm.sum > # base-artifacts/sumfiles/libgomp.sum > # base-artifacts/sumfiles/libatomic.sum > # base-artifacts/sumfiles/g++.sum > # base-artifacts/sumfiles/gcc.sum > # > # > # Unexpected results in this build (new failures) > # === gcc tests === > # > # Running gcc.target/aarch64/aarch64.exp ... > # FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv > # FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv > # > # === Results Summary === > > from (for last_good == a7098d6ef4e4e799dab8ef925c62b199d707694b) > # reset_artifacts: > -10 > # build_abe binutils: > -2 > # build_abe bootstrap: > -1 > # build_abe dejagnu: > 0 > # build_abe check_bootstrap -- --set runtestflags=g++.dg/dg.exp --set > runtestflags=gcc.target/aarch64/aarch64.exp: > 1 > > Artifacts of last_good build: > https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/artifact/artifacts/build-a7098d6ef4e4e799dab8ef925c62b199d707694b/ > Artifacts of first_bad build: > https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/artifact/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/ > Build top page/logs: > https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/ > > Configuration details: > > > Reproduce builds: > <cut> > mkdir investigate-gcc-1dd3f21095858fbfd3e28a149578d5fb67e75f95 > cd investigate-gcc-1dd3f21095858fbfd3e28a149578d5fb67e75f95 > > git clone https://git.linaro.org/toolchain/jenkins-scripts > > mkdir -p artifacts/manifests > curl -o artifacts/manifests/build-baseline.sh > https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/artifact/artifacts/manifests/build-baseline.sh > --fail > curl -o artifacts/manifests/build-parameters.sh > https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/artifact/artifacts/manifests/build-parameters.sh > --fail > curl -o artifacts/test.sh > https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/artifact/artifacts/test.sh > --fail > chmod +x artifacts/test.sh > > # Reproduce the baseline build (build all pre-requisites) > ./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh > > # Save baseline build state (which is then restored in artifacts/test.sh) > rsync -a --del --delete-excluded --exclude bisect/ --exclude artifacts/ > --exclude gcc/ ./ ./bisect/baseline/ > > cd gcc > > # Reproduce first_bad build > git checkout --detach 1dd3f21095858fbfd3e28a149578d5fb67e75f95 > ../artifacts/test.sh > > # Reproduce last_good build > git checkout --detach a7098d6ef4e4e799dab8ef925c62b199d707694b > ../artifacts/test.sh > > cd .. > </cut> > > History of pending regressions and results: > https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/ci/tcwg_gnu/gnu-master-aarch64-check_bootstrap > > Artifacts: > https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/artifact/artifacts/ > Build log: > https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/consoleText > > Full commit (up to 1000 lines): > <cut> > commit 1dd3f21095858fbfd3e28a149578d5fb67e75f95 > Author: Richard Biener <rguent...@suse.de> > Date: Tue Jul 13 13:59:15 2021 +0200 > > Support reduction def re-use for epilogue with different vector size > > The following adds support for re-using the vector reduction def > from the main loop in vectorized epilogue loops on architectures > which use different vector sizes for the epilogue. That's only > x86 as far as I am aware. > > 2021-07-13 Richard Biener <rguent...@suse.de> > > * tree-vect-loop.c (vect_find_reusable_accumulator): Handle > vector types where the old vector type has a multiple of > the new vector type elements. > (vect_create_partial_epilog): New function, split out from... > (vect_create_epilog_for_reduction): ... here. > (vect_transform_cycle_phi): Reduce the re-used accumulator > to the new vector type. > > * gcc.target/i386/vect-reduc-1.c: New testcase. > --- > gcc/testsuite/gcc.target/i386/vect-reduc-1.c | 17 ++ > gcc/tree-vect-loop.c | 227 ++++++++++++++++----------- > 2 files changed, 156 insertions(+), 88 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c > b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c > new file mode 100644 > index 00000000000..9ee9ba4e736 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */ > + > +#define N 32 > +int foo (int *a, int n) > +{ > + int sum = 1; > + for (int i = 0; i < 8*N + 4; ++i) > + sum += a[i]; > + return sum; > +} > + > +/* The reduction epilog should be vectorized and the accumulator > + re-used. */ > +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */ > +/* { dg-final { scan-assembler-times "psrl" 2 } } */ > +/* { dg-final { scan-assembler-times "padd" 5 } } */ > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 8c27d75f889..e9780158a51 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -4896,12 +4896,11 @@ vect_find_reusable_accumulator (loop_vec_info > loop_vinfo, > accumulator->reduc_info->reduc_scalar_results.begin ())) > return false; > > - /* For now, only handle the case in which both loops are operating on the > - same vector types. In future we could reduce wider vectors to narrower > - ones as well. */ > + /* Handle the case where we can reduce wider vectors to narrower ones. */ > tree vectype = STMT_VINFO_VECTYPE (reduc_info); > tree old_vectype = TREE_TYPE (accumulator->reduc_input); > - if (!useless_type_conversion_p (old_vectype, vectype)) > + if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype), > + TYPE_VECTOR_SUBPARTS (vectype))) > return false; > > /* Non-SLP reductions might apply an adjustment after the reduction > @@ -4935,6 +4934,101 @@ vect_find_reusable_accumulator (loop_vec_info > loop_vinfo, > return true; > } > > +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation > + CODE emitting stmts before GSI. Returns a vector def of VECTYPE. */ > + > +static tree > +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code code, > + gimple_seq *seq) > +{ > + unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_def)).to_constant > (); > + unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant (); > + tree stype = TREE_TYPE (vectype); > + tree new_temp = vec_def; > + while (nunits > nunits1) > + { > + nunits /= 2; > + tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE > (vectype), > + stype, nunits); > + unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1)); > + > + /* The target has to make sure we support lowpart/highpart > + extraction, either via direct vector extract or through > + an integer mode punning. */ > + tree dst1, dst2; > + gimple *epilog_stmt; > + if (convert_optab_handler (vec_extract_optab, > + TYPE_MODE (TREE_TYPE (new_temp)), > + TYPE_MODE (vectype1)) > + != CODE_FOR_nothing) > + { > + /* Extract sub-vectors directly once vec_extract becomes > + a conversion optab. */ > + dst1 = make_ssa_name (vectype1); > + epilog_stmt > + = gimple_build_assign (dst1, BIT_FIELD_REF, > + build3 (BIT_FIELD_REF, vectype1, > + new_temp, TYPE_SIZE (vectype1), > + bitsize_int (0))); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + dst2 = make_ssa_name (vectype1); > + epilog_stmt > + = gimple_build_assign (dst2, BIT_FIELD_REF, > + build3 (BIT_FIELD_REF, vectype1, > + new_temp, TYPE_SIZE (vectype1), > + bitsize_int (bitsize))); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + } > + else > + { > + /* Extract via punning to appropriately sized integer mode > + vector. */ > + tree eltype = build_nonstandard_integer_type (bitsize, 1); > + tree etype = build_vector_type (eltype, 2); > + gcc_assert (convert_optab_handler (vec_extract_optab, > + TYPE_MODE (etype), > + TYPE_MODE (eltype)) > + != CODE_FOR_nothing); > + tree tem = make_ssa_name (etype); > + epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR, > + build1 (VIEW_CONVERT_EXPR, > + etype, new_temp)); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + new_temp = tem; > + tem = make_ssa_name (eltype); > + epilog_stmt > + = gimple_build_assign (tem, BIT_FIELD_REF, > + build3 (BIT_FIELD_REF, eltype, > + new_temp, TYPE_SIZE (eltype), > + bitsize_int (0))); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + dst1 = make_ssa_name (vectype1); > + epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR, > + build1 (VIEW_CONVERT_EXPR, > + vectype1, tem)); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + tem = make_ssa_name (eltype); > + epilog_stmt > + = gimple_build_assign (tem, BIT_FIELD_REF, > + build3 (BIT_FIELD_REF, eltype, > + new_temp, TYPE_SIZE (eltype), > + bitsize_int (bitsize))); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + dst2 = make_ssa_name (vectype1); > + epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR, > + build1 (VIEW_CONVERT_EXPR, > + vectype1, tem)); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + } > + > + new_temp = make_ssa_name (vectype1); > + epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + } > + > + return new_temp; > +} > + > /* Function vect_create_epilog_for_reduction > > Create code at the loop-epilog to finalize the result of a reduction > @@ -5684,87 +5778,11 @@ vect_create_epilog_for_reduction (loop_vec_info > loop_vinfo, > > /* First reduce the vector to the desired vector size we should > do shift reduction on by combining upper and lower halves. */ > - new_temp = reduc_inputs[0]; > - while (nunits > nunits1) > - { > - nunits /= 2; > - vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype), > - stype, nunits); > - unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1)); > - > - /* The target has to make sure we support lowpart/highpart > - extraction, either via direct vector extract or through > - an integer mode punning. */ > - tree dst1, dst2; > - if (convert_optab_handler (vec_extract_optab, > - TYPE_MODE (TREE_TYPE (new_temp)), > - TYPE_MODE (vectype1)) > - != CODE_FOR_nothing) > - { > - /* Extract sub-vectors directly once vec_extract becomes > - a conversion optab. */ > - dst1 = make_ssa_name (vectype1); > - epilog_stmt > - = gimple_build_assign (dst1, BIT_FIELD_REF, > - build3 (BIT_FIELD_REF, vectype1, > - new_temp, TYPE_SIZE (vectype1), > - bitsize_int (0))); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - dst2 = make_ssa_name (vectype1); > - epilog_stmt > - = gimple_build_assign (dst2, BIT_FIELD_REF, > - build3 (BIT_FIELD_REF, vectype1, > - new_temp, TYPE_SIZE (vectype1), > - bitsize_int (bitsize))); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - } > - else > - { > - /* Extract via punning to appropriately sized integer mode > - vector. */ > - tree eltype = build_nonstandard_integer_type (bitsize, 1); > - tree etype = build_vector_type (eltype, 2); > - gcc_assert (convert_optab_handler (vec_extract_optab, > - TYPE_MODE (etype), > - TYPE_MODE (eltype)) > - != CODE_FOR_nothing); > - tree tem = make_ssa_name (etype); > - epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR, > - build1 (VIEW_CONVERT_EXPR, > - etype, new_temp)); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - new_temp = tem; > - tem = make_ssa_name (eltype); > - epilog_stmt > - = gimple_build_assign (tem, BIT_FIELD_REF, > - build3 (BIT_FIELD_REF, eltype, > - new_temp, TYPE_SIZE (eltype), > - bitsize_int (0))); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - dst1 = make_ssa_name (vectype1); > - epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR, > - build1 (VIEW_CONVERT_EXPR, > - vectype1, tem)); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - tem = make_ssa_name (eltype); > - epilog_stmt > - = gimple_build_assign (tem, BIT_FIELD_REF, > - build3 (BIT_FIELD_REF, eltype, > - new_temp, TYPE_SIZE (eltype), > - bitsize_int (bitsize))); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - dst2 = make_ssa_name (vectype1); > - epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR, > - build1 (VIEW_CONVERT_EXPR, > - vectype1, tem)); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - } > - > - new_temp = make_ssa_name (vectype1); > - epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - reduc_inputs[0] = new_temp; > - } > + gimple_seq stmts = NULL; > + new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1, > + code, &stmts); > + gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); > + reduc_inputs[0] = new_temp; > > if (reduce_with_shift && !slp_reduc) > { > @@ -7681,13 +7699,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo, > > if (auto *accumulator = reduc_info->reused_accumulator) > { > + tree def = accumulator->reduc_input; > + unsigned int nreduc; > + bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE (def)), > + TYPE_VECTOR_SUBPARTS (vectype_out), > + &nreduc); > + gcc_assert (res); > + if (nreduc != 1) > + { > + /* Reduce the single vector to a smaller one. */ > + gimple_seq stmts = NULL; > + def = vect_create_partial_epilog (def, vectype_out, > + STMT_VINFO_REDUC_CODE (reduc_info), > + &stmts); > + /* Adjust the input so we pick up the partially reduced value > + for the skip edge in vect_create_epilog_for_reduction. */ > + accumulator->reduc_input = def; > + if (loop_vinfo->main_loop_edge) > + { > + /* While we'd like to insert on the edge this will split > + blocks and disturb bookkeeping, we also will eventually > + need this on the skip edge. Rely on sinking to > + fixup optimal placement and insert in the pred. */ > + gimple_stmt_iterator gsi > + = gsi_last_bb (loop_vinfo->main_loop_edge->src); > + /* Insert before a cond that eventually skips the > + epilogue. */ > + if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi))) > + gsi_prev (&gsi); > + gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING); > + } > + else > + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), > + stmts); > + } > if (loop_vinfo->main_loop_edge) > vec_initial_defs[0] > - = vect_get_main_loop_result (loop_vinfo, accumulator->reduc_input, > + = vect_get_main_loop_result (loop_vinfo, def, > vec_initial_defs[0]); > else > - vec_initial_defs.safe_push (accumulator->reduc_input); > - gcc_assert (vec_initial_defs.length () == 1); > + vec_initial_defs.safe_push (def); > } > > /* Generate the reduction PHIs upfront. */ > </cut> _______________________________________________ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-toolchain