https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120927

--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-15 branch has been updated by Richard Biener
<rgue...@gcc.gnu.org>:

https://gcc.gnu.org/g:b8599692a336b29851bdc5d8506a51d57521595c

commit r15-9940-gb8599692a336b29851bdc5d8506a51d57521595c
Author: Richard Biener <rguent...@suse.de>
Date:   Thu Jul 3 14:39:22 2025 +0200

    tree-optimization/120927 - 510.parest_r segfault with masked epilog

    The following fixes bad alignment computation for epilog vectorization
    when, as in this case for 510.parest_r and masked epilog vectorization
    with AVX512, we end up choosing AVX to vectorize the main loop and
    masked AVX512 (sic!) to vectorize the epilog.  In that case alignment
    analysis for the epilog tries to force alignment of the base to 64,
    but that cannot possibly help the epilog when the main loop had used
    a vector mode with smaller alignment requirement.

    There's another issue: the check whether the step preserves
    alignment needs to consider possibly previously involved VFs
    (here, the main loop's smaller VF) as well.
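
    As a rough illustration of the step check described above (not the
    actual GCC code): the helper name step_preserves_alignment and the
    concrete numbers in the usage note are assumptions made for this
    sketch.  It tests whether advancing by step * VF keeps a reference
    aligned for every VF that may have run before the current loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not a GCC function.  Returns true only if
   advancing a data reference by step_align * vf bytes keeps it aligned
   to target_align for every VF in vfs[].  target_align must be nonzero.  */
static bool
step_preserves_alignment (unsigned step_align, const unsigned *vfs,
                          size_t n_vfs, unsigned target_align)
{
  for (size_t i = 0; i < n_vfs; i++)
    if ((step_align * vfs[i]) % target_align != 0)
      return false;  /* One involved VF breaks the alignment.  */
  return true;
}
```

    With an assumed 8-byte step and a 64-byte target alignment, checking
    only the epilog VF of 8 passes (8 * 8 = 64), but also checking an
    assumed main loop VF of 4 fails (8 * 4 = 32), so the alignment
    cannot be relied upon in the epilog.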

    These might not be the only cases with problems for such a mode mix
    but at least there it seems wise to never use DR alignment forcing
    when analyzing an epilog.

    We get to choose this mode setup because the iteration over epilog
    modes doesn't prevent this: the maybe_ge (cached_vf_per_mode[0],
    first_vinfo_vf) skip is conditional on !supports_partial_vectors
    and it is also conditional on having a cached VF.  Further nothing
    in vect_analyze_loop_1 rejects this setup - it might be conceivable
    that a target can do masking only for larger modes.  There is a
    second reason we end up with this mode setup, which is that
    vect_need_peeling_or_partial_vectors_p says we do not need
    peeling or partial vectors when analyzing the main loop with
    AVX512 (if it said so, we'd have chosen a masked AVX512
    epilog-only vectorization).  It does that because it looks at
    LOOP_VINFO_COST_MODEL_THRESHOLD (which is not yet computed, so
    always zero at this point), and compares max_niter (5) against
    the VF (8), not with equality as the comment says but with
    greater-than.  This also needs looking at, PR120939.

            PR tree-optimization/120927
            * tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
            Do not force a DR's base alignment when analyzing an
            epilog loop.  Check whether the step preserves alignment
            for all VFs possibly involved so far.

            * gcc.dg/vect/vect-pr120927.c: New testcase.
            * gcc.dg/vect/vect-pr120927-2.c: Likewise.

    (cherry picked from commit 918f4517564c2cf7e5bb907428d5413742bee56f)
