Richard Biener <rguent...@suse.de> writes:
> This refactors the main loop analysis part in vect_analyze_loop,
> re-purposing the existing vect_reanalyze_as_main_loop for this
> to reduce code duplication.  Failure flow is a bit tricky since
> we want to extract info from the analyzed loop but I wanted to
> share the destruction part.  Thus I add some std::function and
> lambda to funnel post-analysis for the case we want that
> (when analyzing from the main iteration but not when re-analyzing
> an epilogue as main).

Thanks for cleaning this up.

FWIW, as I mentioned on irc, I think the loop could be simplified quite
a bit if we were prepared to analyse loops both as an epilogue and
(independently) as a main loop.

I think the geology of the code is something like this:

layer 1:
  Original loop that tries fallback vector modes if the autodetected
  one fails.

layer 2:
  Add support for simdlen.  This required continuing after finding
  a match in case a later mode corresponded with the simdlen.

layer 3:
  Add epilogue vinfos.

layer 4:
  Restructure to support layers 5 and 6.

layer 5:
  Add support for multiple vector sizes in a loop.  This needed extra
  code to avoid redundant analysis attempts.

layer 6:
  Add VECT_COMPARE_COSTS (first cut).  At the time this was relatively
  simple [bcc7e346bf9b5dc77797ea949d6adc740deb30ca] since it just meant
  tweaking the “continuing” condition from (2).

  However, a (deliberate) wart was that it only tried treating each
  mode as a replacement for the loop_vinfo at the end of the current
  list (if the main loop is the head of the list and epilogues follow).

  This was supposed to be a compile-time improvement, since it meant
  we still only analysed with each mode once.

layer 7:
  Reanalyze a replacement epilogue loop as a main loop before comparing
  it with the existing main loop.  This prevented a wrong code bug but
  defeated part of the compile-time optimisation from (6).

So it's already necessary to analyse a loop as both an epilogue loop
and a main loop in some cases.

The requirement to analyse loops only once also prevents us from being
able to vectorise the epilogue of an omp simdlen loop, because for
something like -mpreferred-vector-width=256, we'd try AVX256 before
AVX512, even if the simdlen forced AVX512.

> I realize this probably doesn't help the unroll case yet, but it
> looked like an improvement.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?
>
> Thanks,
> Richard.
>
> 2021-10-27  Richard Biener  <rguent...@suse.de>
>
>       * tree-vect-loop.c: Include <functional>.
>       (vect_reanalyze_as_main_loop): Rename to...
>       (vect_analyze_loop_1): ... this and generalize to be
>       able to use it twice ...
>       (vect_analyze_loop): ... here.
> ---
>  gcc/tree-vect-loop.c | 202 ++++++++++++++++++++++---------------------
>  1 file changed, 102 insertions(+), 100 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 961c1623f81..9a62475a69f 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
>  <http://www.gnu.org/licenses/>.  */
>  
>  #define INCLUDE_ALGORITHM
> +#define INCLUDE_FUNCTIONAL
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> @@ -2898,43 +2899,63 @@ vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
>    return true;
>  }
>  
> -/* If LOOP_VINFO is already a main loop, return it unmodified.  Otherwise
> -   try to reanalyze it as a main loop.  Return the loop_vinfo on success
> -   and null on failure.  */
> +/* Analyze LOOP with VECTOR_MODE and as epilogue if MAIN_LOOP_VINFO is
> +   not NULL.  Process the analyzed loop with PROCESS even if analysis
> +   failed.  Sets *N_STMTS and FATAL according to the analysis.
> +   Return the loop_vinfo on success and wrapped null on failure.  */
>  
> -static loop_vec_info
> -vect_reanalyze_as_main_loop (loop_vec_info loop_vinfo, unsigned int *n_stmts)
> +static opt_loop_vec_info
> +vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
> +                  machine_mode vector_mode, loop_vec_info main_loop_vinfo,
> +                  unsigned int *n_stmts, bool &fatal,
> +                  std::function<void(loop_vec_info)> process = nullptr)
>  {
> -  if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> -    return loop_vinfo;
> +  /* Check the CFG characteristics of the loop (nesting, entry/exit).  */
> +  opt_loop_vec_info loop_vinfo = vect_analyze_loop_form (loop, shared);
> +  if (!loop_vinfo)
> +    {
> +      if (dump_enabled_p ())
> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                      "bad loop form.\n");
> +      gcc_checking_assert (main_loop_vinfo == NULL);
> +      return loop_vinfo;
> +    }
> +  loop_vinfo->vector_mode = vector_mode;
>  
> -  if (dump_enabled_p ())
> -    dump_printf_loc (MSG_NOTE, vect_location,
> -                  "***** Reanalyzing as a main loop with vector mode %s\n",
> -                  GET_MODE_NAME (loop_vinfo->vector_mode));
> +  if (main_loop_vinfo)
> +    LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = main_loop_vinfo;
>  
> -  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -  vec_info_shared *shared = loop_vinfo->shared;
> -  opt_loop_vec_info main_loop_vinfo = vect_analyze_loop_form (loop, shared);
> -  gcc_assert (main_loop_vinfo);
> +  /* Run the main analysis.  */
> +  fatal = false;

Think this should be at the top, since we have an early return above.
The early return should be fatal.

> +  opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, n_stmts);
> +  loop->aux = NULL;
>  
> -  main_loop_vinfo->vector_mode = loop_vinfo->vector_mode;
> +  /* Process info before we destroy loop_vinfo upon analysis failure.  */
> +  if (process)
> +    process (loop_vinfo);
>  
> -  bool fatal = false;
> -  bool res = vect_analyze_loop_2 (main_loop_vinfo, fatal, n_stmts);
> -  loop->aux = NULL;
> -  if (!res)
> +  if (dump_enabled_p ())
>      {
> -      if (dump_enabled_p ())
> +      if (res)
>       dump_printf_loc (MSG_NOTE, vect_location,
> -                      "***** Failed to analyze main loop with vector"
> -                      " mode %s\n",
> +                      "***** Analysis succeeded with vector mode %s\n",
>                        GET_MODE_NAME (loop_vinfo->vector_mode));
> -      delete main_loop_vinfo;
> -      return NULL;
> +      else
> +     dump_printf_loc (MSG_NOTE, vect_location,
> +                      "***** Analysis failed with vector mode %s\n",
> +                      GET_MODE_NAME (loop_vinfo->vector_mode));
> +    }
> +
> +  if (!res)
> +    {
> +      delete loop_vinfo;
> +      if (fatal)
> +     gcc_checking_assert (main_loop_vinfo == NULL);
> +      return opt_loop_vec_info::propagate_failure (res);
>      }
> -  LOOP_VINFO_VECTORIZABLE_P (main_loop_vinfo) = 1;
> -  return main_loop_vinfo;
> +
> +  LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
> +  return loop_vinfo;
>  }
>  
>  /* Function vect_analyze_loop.
> @@ -2981,20 +3002,6 @@ vect_analyze_loop (class loop *loop, vec_info_shared 
> *shared)
>    unsigned HOST_WIDE_INT simdlen = loop->simdlen;
>    while (1)
>      {
> -      /* Check the CFG characteristics of the loop (nesting, entry/exit).  */
> -      opt_loop_vec_info loop_vinfo = vect_analyze_loop_form (loop, shared);
> -      if (!loop_vinfo)
> -     {
> -       if (dump_enabled_p ())
> -         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                          "bad loop form.\n");
> -       gcc_checking_assert (first_loop_vinfo == NULL);
> -       return loop_vinfo;
> -     }
> -      loop_vinfo->vector_mode = next_vector_mode;
> -
> -      bool fatal = false;
> -
>        /* When pick_lowest_cost_p is true, we should in principle iterate
>        over all the loop_vec_infos that LOOP_VINFO could replace and
>        try to vectorize LOOP_VINFO under the same conditions.
> @@ -3023,41 +3030,35 @@ vect_analyze_loop (class loop *loop, vec_info_shared 
> *shared)
>        LOOP_VINFO fails when treated as an epilogue loop, succeeds when
>        treated as a standalone loop, and ends up being genuinely cheaper
>        than FIRST_LOOP_VINFO.  */
> -      if (vect_epilogues)
> -     LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = first_loop_vinfo;
>  
> -      res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
> -      if (mode_i == 0)
> -     autodetected_vector_mode = loop_vinfo->vector_mode;
> -      if (dump_enabled_p ())
> +      bool fatal;
> +      auto cb = [&] (loop_vec_info loop_vinfo)
>       {
> -       if (res)
> -         dump_printf_loc (MSG_NOTE, vect_location,
> -                          "***** Analysis succeeded with vector mode %s\n",
> -                          GET_MODE_NAME (loop_vinfo->vector_mode));
> -       else
> -         dump_printf_loc (MSG_NOTE, vect_location,
> -                          "***** Analysis failed with vector mode %s\n",
> -                          GET_MODE_NAME (loop_vinfo->vector_mode));
> -     }
> -
> -      loop->aux = NULL;
> -
> -      if (!fatal)
> -     while (mode_i < vector_modes.length ()
> -            && vect_chooses_same_modes_p (loop_vinfo, vector_modes[mode_i]))
> -       {
> -         if (dump_enabled_p ())
> -           dump_printf_loc (MSG_NOTE, vect_location,
> -                            "***** The result for vector mode %s would"
> -                            " be the same\n",
> -                            GET_MODE_NAME (vector_modes[mode_i]));
> -         mode_i += 1;
> -       }
> +       if (mode_i ==0)
> +         autodetected_vector_mode = loop_vinfo->vector_mode;
> +       if (!fatal)
> +         while (mode_i < vector_modes.length ()
> +                && vect_chooses_same_modes_p (loop_vinfo,
> +                                              vector_modes[mode_i]))
> +           {
> +             if (dump_enabled_p ())
> +               dump_printf_loc (MSG_NOTE, vect_location,
> +                                "***** The result for vector mode %s would"
> +                                " be the same\n",
> +                                GET_MODE_NAME (vector_modes[mode_i]));
> +             mode_i += 1;
> +           }
> +     };
> +      opt_loop_vec_info loop_vinfo
> +     = vect_analyze_loop_1 (loop, shared, next_vector_mode,
> +                            vect_epilogues
> +                            ? (loop_vec_info)first_loop_vinfo : NULL,
> +                            &n_stmts, fatal, cb);
> +      if (fatal)
> +     break;
>  
> -      if (res)
> +      if (loop_vinfo)
>       {
> -       LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
>         vectorized_loops++;
>  
>         /* Once we hit the desired simdlen for the first time,
> @@ -3084,33 +3085,44 @@ vect_analyze_loop (class loop *loop, vec_info_shared 
> *shared)
>             if (vinfos.is_empty ()
>                 && vect_joust_loop_vinfos (loop_vinfo, first_loop_vinfo))
>               {
> -               loop_vec_info main_loop_vinfo
> -                 = vect_reanalyze_as_main_loop (loop_vinfo, &n_stmts);
> -               if (main_loop_vinfo == loop_vinfo)
> -                 {
> -                   delete first_loop_vinfo;
> -                   first_loop_vinfo = opt_loop_vec_info::success (NULL);
> -                 }
> -               else if (main_loop_vinfo
> -                        && vect_joust_loop_vinfos (main_loop_vinfo,
> -                                                   first_loop_vinfo))
> +               if (!vect_epilogues)

!vect_epilogues is correct for current uses, but I think the original
!LOOP_VINFO_EPILOGUE_P (loop_vinfo) was more general.  As mentioned above,
in principle there's no reason why we couldn't reanalyse a loop as a
main loop if we fail to analyse it as an epilogue.

Richard

>                   {
>                     delete first_loop_vinfo;
>                     first_loop_vinfo = opt_loop_vec_info::success (NULL);
> -                   delete loop_vinfo;
> -                   loop_vinfo
> -                     = opt_loop_vec_info::success (main_loop_vinfo);
>                   }
>                 else
>                   {
>                     if (dump_enabled_p ())
>                       dump_printf_loc (MSG_NOTE, vect_location,
> -                                      "***** No longer preferring vector"
> -                                      " mode %s after reanalyzing the loop"
> -                                      " as a main loop\n",
> +                                      "***** Reanalyzing as a main loop "
> +                                      "with vector mode %s\n",
>                                        GET_MODE_NAME
> -                                        (main_loop_vinfo->vector_mode));
> -                   delete main_loop_vinfo;
> +                                        (loop_vinfo->vector_mode));
> +                   opt_loop_vec_info main_loop_vinfo
> +                     = vect_analyze_loop_1 (loop, shared,
> +                                            loop_vinfo->vector_mode,
> +                                            NULL, &n_stmts, fatal);
> +                   if (main_loop_vinfo
> +                       && vect_joust_loop_vinfos (main_loop_vinfo,
> +                                                  first_loop_vinfo))
> +                     {
> +                       delete first_loop_vinfo;
> +                       first_loop_vinfo = opt_loop_vec_info::success (NULL);
> +                       delete loop_vinfo;
> +                       loop_vinfo
> +                         = opt_loop_vec_info::success (main_loop_vinfo);
> +                     }
> +                   else
> +                     {
> +                       if (dump_enabled_p ())
> +                         dump_printf_loc (MSG_NOTE, vect_location,
> +                                          "***** No longer preferring vector"
> +                                          " mode %s after reanalyzing the "
> +                                          " loop as a main loop\n",
> +                                          GET_MODE_NAME
> +                                            (loop_vinfo->vector_mode));
> +                       delete main_loop_vinfo;
> +                     }
>                   }
>               }
>           }
> @@ -3159,16 +3171,6 @@ vect_analyze_loop (class loop *loop, vec_info_shared 
> *shared)
>         if (!simdlen && !vect_epilogues && !pick_lowest_cost_p)
>           break;
>       }
> -      else
> -     {
> -       delete loop_vinfo;
> -       loop_vinfo = opt_loop_vec_info::success (NULL);
> -       if (fatal)
> -         {
> -           gcc_checking_assert (first_loop_vinfo == NULL);
> -           break;
> -         }
> -     }
>  
>        /* Handle the case that the original loop can use partial
>        vectorization, but want to only adopt it for the epilogue.

Reply via email to