pr69719.c fails for fixed-length 128-bit SVE

cvs-commit at gcc dot gnu.org via Gcc-bugs Tue, 05 Jan 2021 03:03:51 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98371


--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Sandiford <[email protected]>:

https://gcc.gnu.org/g:01be45eccee42d0cc6c900f43e2363186517f7ed

commit r11-6458-g01be45eccee42d0cc6c900f43e2363186517f7ed
Author: Richard Sandiford <[email protected]>
Date:   Tue Jan 5 11:03:22 2021 +0000

    vect: Fix missing alias checks for 128-bit SVE [PR98371]

    On AArch64, the vectoriser tries various ways of vectorising with both
    SVE and Advanced SIMD and picks the best one.  All other things being
    equal, it prefers earlier attempts over later attempts.

    The way this works currently is that, once it has a successful
    vectorisation attempt A, it analyses all other attempts as epilogue
    loops of A:

          /* When pick_lowest_cost_p is true, we should in principle iterate
             over all the loop_vec_infos that LOOP_VINFO could replace and
             try to vectorize LOOP_VINFO under the same conditions.
             E.g. when trying to replace an epilogue loop, we should vectorize
             LOOP_VINFO as an epilogue loop with the same VF limit.  When trying
             to replace the main loop, we should vectorize LOOP_VINFO as a main
             loop too.

             However, autovectorize_vector_modes is usually sorted as follows:

             - Modes that naturally produce lower VFs usually follow modes that
               naturally produce higher VFs.

             - When modes naturally produce the same VF, maskable modes
               usually follow unmaskable ones, so that the maskable mode
               can be used to vectorize the epilogue of the unmaskable mode.

             This order is preferred because it leads to the maximum
             epilogue vectorization opportunities.  Targets should only use
             a different order if they want to make wide modes available while
             disparaging them relative to earlier, smaller modes.  The assumption
             in that case is that the wider modes are more expensive in some
             way that isn't reflected directly in the costs.
             way that isn't reflected directly in the costs.

             There should therefore be few interesting cases in which
             LOOP_VINFO fails when treated as an epilogue loop, succeeds when
             treated as a standalone loop, and ends up being genuinely cheaper
             than FIRST_LOOP_VINFO.  */

    However, the vectoriser can normally elide alias checks for epilogue
    loops, on the basis that the main loop should do them instead.
    Converting an epilogue loop to a main loop can therefore cause the alias
    checks to be skipped.  (It probably also unfairly penalises the original
    loop in the cost comparison, given that one loop will have alias checks
    and the other won't.)
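
To illustrate why skipping the alias check matters, here is a hand-written C sketch of a "versioned" loop. It is not GCC's generated code; the names `ranges_disjoint` and `add_one` and the block size of 4 are assumptions made for this example. The blocked path loads a whole block before storing it, as a vector loop would, so it is only valid when the runtime check proves that the two ranges do not overlap. The epilogue inside the checked branch can rely on the same check, which is why an epilogue normally needs no check of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if the byte ranges [a, a+bytes) and
   [b, b+bytes) do not overlap.  Address overflow is ignored here.  */
static int ranges_disjoint(const void *a, const void *b, size_t bytes)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return pa + bytes <= pb || pb + bytes <= pa;
}

/* dst[i] = src[i] + 1 for i in [0, n), written as a versioned loop.  */
static void add_one(int *dst, const int *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    if (ranges_disjoint(dst, src, n * sizeof(int))) {
        /* "Main loop": blocks of 4, all loads before all stores.  */
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            int tmp[4];
            for (size_t j = 0; j < 4; j++)
                tmp[j] = src[i + j] + 1;
            for (size_t j = 0; j < 4; j++)
                dst[i + j] = tmp[j];
        }
        /* "Epilogue": covered by the alias check done above.  */
        for (; i < n; i++)
            dst[i] = src[i] + 1;
    } else {
        /* Possible overlap: keep the original scalar order.  */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] + 1;
    }
}
```

If the blocked path were used without the check, a call such as `add_one(a + 1, a, 9)` would read stale values of `a` and give a different result from the scalar loop.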

    As the comment says, we should in principle analyse each vector mode
    twice: once as a main loop and once as an epilogue.  However, doing
    that up-front would be quite expensive.  This patch instead goes for a
    compromise: if an epilogue loop for mode M2 seems better than a main
    loop for mode M1, re-analyse with M2 as the main loop.
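
The compromise can be pictured with a toy model in C. This is only a sketch under stated assumptions, not the code in tree-vect-loop.c: `struct mode_result`, its fields, and `pick_main_mode` are hypothetical, and each mode's analysis results are supplied as precomputed numbers instead of being computed on demand. Comparing the main-loop cost again after re-analysis is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of the analysis results for one vector mode.  */
struct mode_result {
    const char *name;
    int ok_as_epilogue;    /* analysis succeeds as an epilogue loop */
    int cost_as_epilogue;  /* cost without alias checks */
    int ok_as_main;        /* analysis succeeds as a main loop */
    int cost_as_main;      /* cost including alias checks */
};

/* Return the index of the mode chosen for the main loop, or -1 if no
   mode can be used.  Earlier modes win ties.  */
static int pick_main_mode(const struct mode_result *m, size_t n)
{
    assert(n == 0 || m != NULL);

    int best = -1;
    int best_cost = 0;
    for (size_t i = 0; i < n; i++) {
        if (best < 0) {
            /* No main loop yet: analyse this mode as a main loop.  */
            if (m[i].ok_as_main) {
                best = (int)i;
                best_cost = m[i].cost_as_main;
            }
            continue;
        }
        /* First pass: analyse only as an epilogue of the current best.  */
        if (!m[i].ok_as_epilogue || m[i].cost_as_epilogue >= best_cost)
            continue;
        /* The epilogue looks cheaper, so re-analyse as a main loop
           (with alias checks) before adopting it.  */
        if (m[i].ok_as_main && m[i].cost_as_main < best_cost) {
            best = (int)i;
            best_cost = m[i].cost_as_main;
        }
    }
    return best;
}
```

In this model a mode M2 that looks cheaper only because its epilogue analysis omitted the alias checks is rejected once it is re-analysed as a main loop, while the second analysis is paid for only by modes that pass the first comparison.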

    The patch fixes dg.torture.exp=pr69719.c when testing with
    -msve-vector-bits=128.

    gcc/
            PR tree-optimization/98371
            * tree-vect-loop.c (vect_reanalyze_as_main_loop): New function.
            (vect_analyze_loop): If an epilogue loop appears to be cheaper
            than the main loop, re-analyze it as a main loop before adopting
            it as a main loop.

[Bug tree-optimization/98371] [10/11 Regression] gcc.dg/torture/pr69719.c fails for fixed-length 128-bit SVE
