Reported upstream: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101506

--
Maxim Kuvyrkov
https://www.linaro.org

> On 19 Jul 2021, at 02:30, ci_not...@linaro.org wrote:
> 
> Successfully identified regression in *gcc* in CI configuration 
> tcwg_gnu/gnu-master-aarch64-check_bootstrap.  So far, this commit has 
> regressed CI configurations:
> - tcwg_gnu/gnu-master-aarch64-check_bootstrap
> 
> Culprit:
> <cut>
> commit 1dd3f21095858fbfd3e28a149578d5fb67e75f95
> Author: Richard Biener <rguent...@suse.de>
> Date:   Tue Jul 13 13:59:15 2021 +0200
> 
>    Support reduction def re-use for epilogue with different vector size
> 
>    The following adds support for re-using the vector reduction def
>    from the main loop in vectorized epilogue loops on architectures
>    which use different vector sizes for the epilogue.  That's only
>    x86 as far as I am aware.
> 
>    2021-07-13  Richard Biener  <rguent...@suse.de>
> 
>            * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
>            vector types where the old vector type has a multiple of
>            the new vector type elements.
>            (vect_create_partial_epilog): New function, split out from...
>            (vect_create_epilog_for_reduction): ... here.
>            (vect_transform_cycle_phi): Reduce the re-used accumulator
>            to the new vector type.
> 
>            * gcc.target/i386/vect-reduc-1.c: New testcase.
> </cut>
> 
> Results regressed to (for first_bad == 1dd3f21095858fbfd3e28a149578d5fb67e75f95)
> # reset_artifacts:
> -10
> # build_abe binutils:
> -2
> # build_abe bootstrap:
> -1
> # build_abe dejagnu:
> 0
> # build_abe check_bootstrap -- --set runtestflags=g++.dg/dg.exp --set runtestflags=gcc.target/aarch64/aarch64.exp:
> 1
> # Getting actual results from build directory /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles
> #     /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/libstdc++.sum
> #     /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/gfortran.sum
> #     /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/libitm.sum
> #     /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/libgomp.sum
> #     /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/libatomic.sum
> #     /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/g++.sum
> #     /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/gcc.sum
> # Manifest:         gcc-compare-results/contrib/testsuite-management/flaky/gnu-master-aarch64-check_bootstrap.xfail
> # Getting actual results from build directory base-artifacts/sumfiles
> #     base-artifacts/sumfiles/libstdc++.sum
> #     base-artifacts/sumfiles/gfortran.sum
> #     base-artifacts/sumfiles/libitm.sum
> #     base-artifacts/sumfiles/libgomp.sum
> #     base-artifacts/sumfiles/libatomic.sum
> #     base-artifacts/sumfiles/g++.sum
> #     base-artifacts/sumfiles/gcc.sum
> # 
> # 
> # Unexpected results in this build (new failures)
> #             === gcc tests ===
> # 
> # Running gcc.target/aarch64/aarch64.exp ...
> # FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv
> # FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv
> # 
> #             === Results Summary ===
> 
> from (for last_good == a7098d6ef4e4e799dab8ef925c62b199d707694b)
> # reset_artifacts:
> -10
> # build_abe binutils:
> -2
> # build_abe bootstrap:
> -1
> # build_abe dejagnu:
> 0
> # build_abe check_bootstrap -- --set runtestflags=g++.dg/dg.exp --set runtestflags=gcc.target/aarch64/aarch64.exp:
> 1
> 
> Artifacts of last_good build: 
> https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/artifact/artifacts/build-a7098d6ef4e4e799dab8ef925c62b199d707694b/
> Artifacts of first_bad build: 
> https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/artifact/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/
> Build top page/logs: 
> https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/
> 
> Configuration details:
> 
> 
> Reproduce builds:
> <cut>
> mkdir investigate-gcc-1dd3f21095858fbfd3e28a149578d5fb67e75f95
> cd investigate-gcc-1dd3f21095858fbfd3e28a149578d5fb67e75f95
> 
> git clone https://git.linaro.org/toolchain/jenkins-scripts
> 
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/artifact/artifacts/manifests/build-baseline.sh --fail
> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/artifact/artifacts/manifests/build-parameters.sh --fail
> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/artifact/artifacts/test.sh --fail
> chmod +x artifacts/test.sh
> 
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
> 
> # Save baseline build state (which is then restored in artifacts/test.sh)
> rsync -a --del --delete-excluded --exclude bisect/ --exclude artifacts/ --exclude gcc/ ./ ./bisect/baseline/
> 
> cd gcc
> 
> # Reproduce first_bad build
> git checkout --detach 1dd3f21095858fbfd3e28a149578d5fb67e75f95
> ../artifacts/test.sh
> 
> # Reproduce last_good build
> git checkout --detach a7098d6ef4e4e799dab8ef925c62b199d707694b
> ../artifacts/test.sh
> 
> cd ..
> </cut>
> 
> History of pending regressions and results: 
> https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/ci/tcwg_gnu/gnu-master-aarch64-check_bootstrap
> 
> Artifacts: 
> https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/artifact/artifacts/
> Build log: 
> https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap/80/consoleText
> 
> Full commit (up to 1000 lines):
> <cut>
> commit 1dd3f21095858fbfd3e28a149578d5fb67e75f95
> Author: Richard Biener <rguent...@suse.de>
> Date:   Tue Jul 13 13:59:15 2021 +0200
> 
>    Support reduction def re-use for epilogue with different vector size
> 
>    The following adds support for re-using the vector reduction def
>    from the main loop in vectorized epilogue loops on architectures
>    which use different vector sizes for the epilogue.  That's only
>    x86 as far as I am aware.
> 
>    2021-07-13  Richard Biener  <rguent...@suse.de>
> 
>            * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
>            vector types where the old vector type has a multiple of
>            the new vector type elements.
>            (vect_create_partial_epilog): New function, split out from...
>            (vect_create_epilog_for_reduction): ... here.
>            (vect_transform_cycle_phi): Reduce the re-used accumulator
>            to the new vector type.
> 
>            * gcc.target/i386/vect-reduc-1.c: New testcase.
> ---
> gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
> gcc/tree-vect-loop.c                         | 227 ++++++++++++++++-----------
> 2 files changed, 156 insertions(+), 88 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> new file mode 100644
> index 00000000000..9ee9ba4e736
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> +
> +#define N 32
> +int foo (int *a, int n)
> +{
> +  int sum = 1;
> +  for (int i = 0; i < 8*N + 4; ++i)
> +    sum += a[i];
> +  return sum;
> +}
> +
> +/* The reduction epilog should be vectorized and the accumulator
> +   re-used.  */
> +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> +/* { dg-final { scan-assembler-times "padd" 5 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 8c27d75f889..e9780158a51 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -4896,12 +4896,11 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
>                     accumulator->reduc_info->reduc_scalar_results.begin ()))
>     return false;
> 
> -  /* For now, only handle the case in which both loops are operating on the
> -     same vector types.  In future we could reduce wider vectors to narrower
> -     ones as well.  */
> +  /* Handle the case where we can reduce wider vectors to narrower ones.  */
>   tree vectype = STMT_VINFO_VECTYPE (reduc_info);
>   tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> -  if (!useless_type_conversion_p (old_vectype, vectype))
> +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> +                         TYPE_VECTOR_SUBPARTS (vectype)))
>     return false;
> 
>   /* Non-SLP reductions might apply an adjustment after the reduction
> @@ -4935,6 +4934,101 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
>   return true;
> }
> 
> +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> +   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
> +
> +static tree
> +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code code,
> +                         gimple_seq *seq)
> +{
> +  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_def)).to_constant ();
> +  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> +  tree stype = TREE_TYPE (vectype);
> +  tree new_temp = vec_def;
> +  while (nunits > nunits1)
> +    {
> +      nunits /= 2;
> +      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> +                                                        stype, nunits);
> +      unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> +
> +      /* The target has to make sure we support lowpart/highpart
> +      extraction, either via direct vector extract or through
> +      an integer mode punning.  */
> +      tree dst1, dst2;
> +      gimple *epilog_stmt;
> +      if (convert_optab_handler (vec_extract_optab,
> +                              TYPE_MODE (TREE_TYPE (new_temp)),
> +                              TYPE_MODE (vectype1))
> +       != CODE_FOR_nothing)
> +     {
> +       /* Extract sub-vectors directly once vec_extract becomes
> +          a conversion optab.  */
> +       dst1 = make_ssa_name (vectype1);
> +       epilog_stmt
> +           = gimple_build_assign (dst1, BIT_FIELD_REF,
> +                                  build3 (BIT_FIELD_REF, vectype1,
> +                                          new_temp, TYPE_SIZE (vectype1),
> +                                          bitsize_int (0)));
> +       gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +       dst2 =  make_ssa_name (vectype1);
> +       epilog_stmt
> +           = gimple_build_assign (dst2, BIT_FIELD_REF,
> +                                  build3 (BIT_FIELD_REF, vectype1,
> +                                          new_temp, TYPE_SIZE (vectype1),
> +                                          bitsize_int (bitsize)));
> +       gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +     }
> +      else
> +     {
> +       /* Extract via punning to appropriately sized integer mode
> +          vector.  */
> +       tree eltype = build_nonstandard_integer_type (bitsize, 1);
> +       tree etype = build_vector_type (eltype, 2);
> +       gcc_assert (convert_optab_handler (vec_extract_optab,
> +                                          TYPE_MODE (etype),
> +                                          TYPE_MODE (eltype))
> +                   != CODE_FOR_nothing);
> +       tree tem = make_ssa_name (etype);
> +       epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> +                                          build1 (VIEW_CONVERT_EXPR,
> +                                                  etype, new_temp));
> +       gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +       new_temp = tem;
> +       tem = make_ssa_name (eltype);
> +       epilog_stmt
> +           = gimple_build_assign (tem, BIT_FIELD_REF,
> +                                  build3 (BIT_FIELD_REF, eltype,
> +                                          new_temp, TYPE_SIZE (eltype),
> +                                          bitsize_int (0)));
> +       gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +       dst1 = make_ssa_name (vectype1);
> +       epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> +                                          build1 (VIEW_CONVERT_EXPR,
> +                                                  vectype1, tem));
> +       gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +       tem = make_ssa_name (eltype);
> +       epilog_stmt
> +           = gimple_build_assign (tem, BIT_FIELD_REF,
> +                                  build3 (BIT_FIELD_REF, eltype,
> +                                          new_temp, TYPE_SIZE (eltype),
> +                                          bitsize_int (bitsize)));
> +       gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +       dst2 =  make_ssa_name (vectype1);
> +       epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> +                                          build1 (VIEW_CONVERT_EXPR,
> +                                                  vectype1, tem));
> +       gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +     }
> +
> +      new_temp = make_ssa_name (vectype1);
> +      epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> +      gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +    }
> +
> +  return new_temp;
> +}
> +
> /* Function vect_create_epilog_for_reduction
> 
>    Create code at the loop-epilog to finalize the result of a reduction
> @@ -5684,87 +5778,11 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> 
>       /* First reduce the vector to the desired vector size we should
>        do shift reduction on by combining upper and lower halves.  */
> -      new_temp = reduc_inputs[0];
> -      while (nunits > nunits1)
> -     {
> -       nunits /= 2;
> -       vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> -                                                       stype, nunits);
> -       unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> -
> -       /* The target has to make sure we support lowpart/highpart
> -          extraction, either via direct vector extract or through
> -          an integer mode punning.  */
> -       tree dst1, dst2;
> -       if (convert_optab_handler (vec_extract_optab,
> -                                  TYPE_MODE (TREE_TYPE (new_temp)),
> -                                  TYPE_MODE (vectype1))
> -           != CODE_FOR_nothing)
> -         {
> -           /* Extract sub-vectors directly once vec_extract becomes
> -              a conversion optab.  */
> -           dst1 = make_ssa_name (vectype1);
> -           epilog_stmt
> -               = gimple_build_assign (dst1, BIT_FIELD_REF,
> -                                      build3 (BIT_FIELD_REF, vectype1,
> -                                              new_temp, TYPE_SIZE (vectype1),
> -                                              bitsize_int (0)));
> -           gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -           dst2 =  make_ssa_name (vectype1);
> -           epilog_stmt
> -               = gimple_build_assign (dst2, BIT_FIELD_REF,
> -                                      build3 (BIT_FIELD_REF, vectype1,
> -                                              new_temp, TYPE_SIZE (vectype1),
> -                                              bitsize_int (bitsize)));
> -           gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -         }
> -       else
> -         {
> -           /* Extract via punning to appropriately sized integer mode
> -              vector.  */
> -           tree eltype = build_nonstandard_integer_type (bitsize, 1);
> -           tree etype = build_vector_type (eltype, 2);
> -           gcc_assert (convert_optab_handler (vec_extract_optab,
> -                                              TYPE_MODE (etype),
> -                                              TYPE_MODE (eltype))
> -                       != CODE_FOR_nothing);
> -           tree tem = make_ssa_name (etype);
> -           epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> -                                              build1 (VIEW_CONVERT_EXPR,
> -                                                      etype, new_temp));
> -           gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -           new_temp = tem;
> -           tem = make_ssa_name (eltype);
> -           epilog_stmt
> -               = gimple_build_assign (tem, BIT_FIELD_REF,
> -                                      build3 (BIT_FIELD_REF, eltype,
> -                                              new_temp, TYPE_SIZE (eltype),
> -                                              bitsize_int (0)));
> -           gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -           dst1 = make_ssa_name (vectype1);
> -           epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> -                                              build1 (VIEW_CONVERT_EXPR,
> -                                                      vectype1, tem));
> -           gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -           tem = make_ssa_name (eltype);
> -           epilog_stmt
> -               = gimple_build_assign (tem, BIT_FIELD_REF,
> -                                      build3 (BIT_FIELD_REF, eltype,
> -                                              new_temp, TYPE_SIZE (eltype),
> -                                              bitsize_int (bitsize)));
> -           gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -           dst2 =  make_ssa_name (vectype1);
> -           epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> -                                              build1 (VIEW_CONVERT_EXPR,
> -                                                      vectype1, tem));
> -           gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -         }
> -
> -       new_temp = make_ssa_name (vectype1);
> -       epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> -       gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -       reduc_inputs[0] = new_temp;
> -     }
> +      gimple_seq stmts = NULL;
> +      new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
> +                                          code, &stmts);
> +      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> +      reduc_inputs[0] = new_temp;
> 
>       if (reduce_with_shift && !slp_reduc)
>       {
> @@ -7681,13 +7699,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
> 
>   if (auto *accumulator = reduc_info->reused_accumulator)
>     {
> +      tree def = accumulator->reduc_input;
> +      unsigned int nreduc;
> +      bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE (def)),
> +                                   TYPE_VECTOR_SUBPARTS (vectype_out),
> +                                   &nreduc);
> +      gcc_assert (res);
> +      if (nreduc != 1)
> +     {
> +       /* Reduce the single vector to a smaller one.  */
> +       gimple_seq stmts = NULL;
> +       def = vect_create_partial_epilog (def, vectype_out,
> +                                         STMT_VINFO_REDUC_CODE (reduc_info),
> +                                         &stmts);
> +       /* Adjust the input so we pick up the partially reduced value
> +          for the skip edge in vect_create_epilog_for_reduction.  */
> +       accumulator->reduc_input = def;
> +       if (loop_vinfo->main_loop_edge)
> +         {
> +           /* While we'd like to insert on the edge this will split
> +              blocks and disturb bookkeeping, we also will eventually
> +              need this on the skip edge.  Rely on sinking to
> +              fixup optimal placement and insert in the pred.  */
> +           gimple_stmt_iterator gsi
> +             = gsi_last_bb (loop_vinfo->main_loop_edge->src);
> +           /* Insert before a cond that eventually skips the
> +              epilogue.  */
> +           if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
> +             gsi_prev (&gsi);
> +           gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
> +         }
> +       else
> +         gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
> +                                           stmts);
> +     }
>       if (loop_vinfo->main_loop_edge)
>       vec_initial_defs[0]
> -       = vect_get_main_loop_result (loop_vinfo, accumulator->reduc_input,
> +       = vect_get_main_loop_result (loop_vinfo, def,
>                                      vec_initial_defs[0]);
>       else
> -     vec_initial_defs.safe_push (accumulator->reduc_input);
> -      gcc_assert (vec_initial_defs.length () == 1);
> +     vec_initial_defs.safe_push (def);
>     }
> 
>   /* Generate the reduction PHIs upfront.  */
> </cut>
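
For readers following the quoted patch: the new vect_create_partial_epilog
narrows a wide accumulator by repeatedly splitting it into a low and a high
half and combining the two with the reduction operation. Below is a minimal
scalar sketch of that halving idea in plain C, assuming an integer add
reduction and lane counts that are powers of two. The names
partial_reduce_add and lane_sum are hypothetical and do not exist in GCC; the
real code emits BIT_FIELD_REF or VIEW_CONVERT_EXPR statements rather than
looping over an array.

```c
/* Hypothetical scalar model of the halving step, not GCC code.
   Fold the upper half of acc[0..nunits) onto the lower half until only
   nunits1 lanes remain.  Assumes nunits is nunits1 times a power of two.
   Returns the number of lanes left.  */
unsigned
partial_reduce_add (int *acc, unsigned nunits, unsigned nunits1)
{
  while (nunits > nunits1)
    {
      nunits /= 2;
      /* Lane i of the low half absorbs lane i of the high half.  */
      for (unsigned i = 0; i < nunits; ++i)
        acc[i] += acc[i + nunits];
    }
  return nunits;
}

/* Sum the first n lanes; used to check that narrowing keeps the total.  */
int
lane_sum (const int *v, unsigned n)
{
  int s = 0;
  for (unsigned i = 0; i < n; ++i)
    s += v[i];
  return s;
}
```

Narrowing an 8-lane accumulator to 4 lanes this way leaves the sum over all
lanes unchanged, which is the property the epilogue loop relies on when it
picks up the main loop's accumulator.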

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
