[Bug tree-optimization/119351] [15 Regression] Wrong code in GROMACS for AArch64 generic SVE VLS target since r15-6807-g68326d5d1a593d

cvs-commit at gcc dot gnu.org via Gcc-bugs Wed, 16 Apr 2025 05:10:29 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351


--- Comment #20 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfch...@gcc.gnu.org>:

https://gcc.gnu.org/g:46ccce1de686c1b437eff43431dc20d20d4687c0

commit r15-9518-g46ccce1de686c1b437eff43431dc20d20d4687c0
Author: Tamar Christina <tamar.christ...@arm.com>
Date:   Wed Apr 16 13:09:05 2025 +0100

    middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

    The following example:

    #define N 512
    #define START 2
    #define END 505

    int x[N] __attribute__((aligned(32)));

    int __attribute__((noipa))
    foo (void)
    {
      for (signed int i = START; i < END; ++i)
        {
          if (x[i] == 0)
            return i;
        }
      return -1;
    }

    generates incorrect code with fixed length SVE because for early break we
    need to know which value to start the scalar loop with if we take an early
    exit.

    Historically this means that we take the first element of every induction.
    This is because there's an assumption in place that, even with masked
    loops, the masks come from a whilel* instruction.

    As such we reduce using a BIT_FIELD_REF <, 0>.

    When PFA was added this assumption was correct for non-masked loops.
    However, we assumed that PFA for VLA wouldn't work for now, and disabled it
    using the alignment requirement checks.  We also expected VLS to PFA using
    scalar loops.

    However, as this PR shows, for VLS the vectorizer can, and in some
    circumstances does, choose to peel using masks by masking the first
    iteration of the loop with an additional alignment mask.

    When this is done, the first elements of the predicate can be inactive.  In
    this example element 1 is inactive based on the calculated misalignment,
    hence the -1 value in the first vector IV element.

    When we reduce using BIT_FIELD_REF we get the wrong value.
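
The mismatch described above can be illustrated with a small scalar model. This is only a sketch under stated assumptions: `VF` (8 lanes, i.e. a 256-bit vector of 32-bit ints matching the 32-byte alignment of `x`) and the helper names `leading_inactive` and `first_iter_iv_lane` are hypothetical and do not appear in GCC. The exact lane values GCC computes depend on the vector length and on how the IV is biased, so they may differ from this model.

```c
#define VF 8      /* assumed lanes per vector: 256-bit VLS, 32-bit int */

/* Number of leading lanes that masked peeling switches off so that the
   first vector access starts on a VF-element boundary.  Assumes the
   array itself is VF-element aligned and START is non-negative.  */
static int
leading_inactive (int start)
{
  return start % VF;
}

/* Value held by LANE of the vector IV in the first (masked) iteration.
   Lanes below leading_inactive (start) hold values that the original
   scalar loop never iterates over.  */
static int
first_iter_iv_lane (int start, int lane)
{
  return start - leading_inactive (start) + lane;
}
```

With `START` equal to 2, this model masks off lanes 0 and 1, and lane 0 of the IV holds 0 rather than 2. Extracting lane 0 therefore yields an index that belongs to an inactive lane, which is the kind of wrong restart value the commit message describes.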

    This patch updates it by creating a new scalar PHI that keeps track of
    whether we are in the first iteration of the loop (with the additional
    masking) or whether we have taken a loop iteration already.

    The generated sequence:

    pre-header:
      bb1:
        i_1 = <number of leading inactive elements>

    header:
      bb2:
        i_2 = PHI <i_1(bb1), 0(latch)>
        ...

    early-exit:
      bb3:
        i_3 = iv_step * i_2 + PHI<vector-iv>

    Which eliminates the need to do an expensive mask based reduction.
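
The sequence above can be modelled in plain C to see why the extra scalar PHI gives the right restart point. This is a sketch, not the code GCC emits: `VF`, `resume_index`, `foo_model` and the `use_fix` flag are hypothetical names introduced for illustration, the IV step is assumed to be 1, and the inner loop stands in for a masked vector compare.

```c
#define VF 8   /* assumed lanes per vector */

/* Model of the vector loop over x[start, end).  Returns the index at
   which the scalar code should resume after an early exit, or END if
   no early exit is taken.  Assumes x is VF-element aligned.  */
static int
resume_index (const int *x, int start, int end, int use_fix)
{
  int skip = start % VF;   /* i_1: number of leading inactive elements */
  int i_2 = skip;          /* i_2 = PHI <i_1(preheader), 0(latch)> */

  for (int base = start - skip; base < end; base += VF)
    {
      /* Lanes below i_2 are masked off in the first iteration only.  */
      for (int lane = i_2; lane < VF && base + lane < end; ++lane)
        if (x[base + lane] == 0)
          /* Early exit.  Lane 0 of the vector IV is BASE; the fixed
             sequence adds iv_step * i_2 (iv_step is 1 here).  */
          return use_fix ? 1 * i_2 + base : base;
      i_2 = 0;             /* value arriving from the latch */
    }
  return end;
}

/* Scalar code that re-runs the search from the resume point.  */
static int
foo_model (const int *x, int start, int end, int use_fix)
{
  for (int i = resume_index (x, start, end, use_fix); i < end; ++i)
    if (x[i] == 0)
      return i;
  return -1;
}
```

In this model, if `x[0]` and `x[5]` are zero and the loop covers indices 2 to 504, the variant without the adjustment resumes at index 0 and returns 0, while the adjusted variant resumes at index 2 and returns 5, matching the original scalar loop.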

    This fixes gromacs with one OpenMP thread, but with more than one thread
    there is still an issue.

    gcc/ChangeLog:

            PR tree-optimization/119351
            * tree-vectorizer.h (LOOP_VINFO_MASK_NITERS_PFA_OFFSET,
            LOOP_VINFO_NON_LINEAR_IV): New.
            (class _loop_vec_info): Add mask_skip_niters_pfa_offset and
            nonlinear_iv.
            * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
            them.
            (vect_analyze_scalar_cycles_1): Record non-linear inductions.
            (vectorizable_induction): If early break and PFA using masking
            create a new phi which tracks where the scalar code needs to
            start...
            (vectorizable_live_operation): ...and generate the adjustments
            here.
            (vect_use_loop_mask_for_alignment_p): Reject non-linear inductions
            and early break needing peeling.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/119351
            * gcc.target/aarch64/sve/peel_ind_10.c: New test.
            * gcc.target/aarch64/sve/peel_ind_10_run.c: New test.
            * gcc.target/aarch64/sve/peel_ind_5.c: New test.
            * gcc.target/aarch64/sve/peel_ind_5_run.c: New test.
            * gcc.target/aarch64/sve/peel_ind_6.c: New test.
            * gcc.target/aarch64/sve/peel_ind_6_run.c: New test.
            * gcc.target/aarch64/sve/peel_ind_7.c: New test.
            * gcc.target/aarch64/sve/peel_ind_7_run.c: New test.
            * gcc.target/aarch64/sve/peel_ind_8.c: New test.
            * gcc.target/aarch64/sve/peel_ind_8_run.c: New test.
            * gcc.target/aarch64/sve/peel_ind_9.c: New test.
            * gcc.target/aarch64/sve/peel_ind_9_run.c: New test.
