https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116855

--- Comment #11 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfch...@gcc.gnu.org>:

https://gcc.gnu.org/g:2427793af1e545506e0315f8ec06adf73d76b3cc

commit r15-7886-g2427793af1e545506e0315f8ec06adf73d76b3cc
Author: Tamar Christina <tamar.christ...@arm.com>
Date:   Fri Mar 7 13:46:41 2025 +0000

    middle-end: delay checking for alignment to load [PR118464]

    This fixes two PRs on Early break vectorization by delaying the safety
    checks to vectorizable_load when the VF, VMAT and vectype are all known.
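
    For context, here is a minimal sketch of the kind of loop that early
    break vectorization targets.  The function name find_first and the
    buffer size N are made up for illustration; they are not taken from the
    PRs or the testsuite.  The scalar loop stops reading at the first match,
    whereas a vectorized version would load a whole vector of elements
    before testing the exit condition, so the loads past the matching
    element are speculative and must be known not to fault.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024  /* hypothetical buffer size, chosen for illustration */

/* Return the index of the first element of BUF equal to KEY, or N if
   there is none.  The scalar loop never reads past the matching element.  */
size_t
find_first (const int *buf, int key)
{
  assert (buf != NULL);
  for (size_t i = 0; i < N; i++)
    if (buf[i] == key)
      return i;   /* early break */
  return N;
}
```

    Calling find_first on a zero-filled array of N ints whose element 5 is
    set to 7, with key 7, returns 5 without reading elements 6 onwards.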

    This patch does add two new restrictions:

    1. On LOAD_LANES targets, where the buffer size is known, we reject
       non-power of two group sizes, as they are unaligned every other
       iteration and so may cross a page unwittingly.  For those cases require
       partial masking support.

    2. On LOAD_LANES targets when the buffer is unknown, we reject
       vectorization if we cannot peel for alignment, as the alignment
       requirement is quite large at GROUP_SIZE * vectype_size.  This is
       unlikely to ever be beneficial so we don't support it for now.
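
    The page-crossing concern in the two restrictions above can be sketched
    with a simplified model.  This is not GCC code: the helper names, the
    4096-byte page size and the 16-byte vector size used in the usage note
    are assumptions made for illustration.  Each vector iteration is modelled
    as one contiguous access of group_size * vec_bytes bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the byte range [ADDR, ADDR + LEN) touches more than one page
   of PAGE bytes.  */
bool
crosses_page (uint64_t addr, uint64_t len, uint64_t page)
{
  assert (len > 0 && page > 0);
  return addr / page != (addr + len - 1) / page;
}

/* True if any of the first NITERS group accesses starting at BASE crosses
   a page.  Iteration I is modelled as reading GROUP_SIZE * VEC_BYTES
   contiguous bytes starting at BASE + I * GROUP_SIZE * VEC_BYTES.  */
bool
any_iteration_crosses_page (uint64_t base, unsigned group_size,
                            unsigned vec_bytes, uint64_t page,
                            unsigned niters)
{
  uint64_t block = (uint64_t) group_size * vec_bytes;
  for (unsigned i = 0; i < niters; i++)
    if (crosses_page (base + i * block, block, page))
      return true;
  return false;
}
```

    In this model, starting from a page-aligned base with 16-byte vectors,
    group sizes 2 and 4 give 32 and 64 byte blocks that tile a 4096-byte
    page exactly and never straddle a boundary.  Group size 3 gives 48-byte
    blocks, and the block starting at byte 4080 runs into the next page.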

    There are other steps documented inside the code itself so that the
    reasoning is next to the code.

    As a fall-back, when the alignment fails we require partial vector support.

    For VLA targets like SVE return element alignment as the desired vector
    alignment.  This means that the loads are never misaligned and so,
    annoyingly, it won't ever need to peel.

    So what I think needs to happen in GCC 16 is the following:

    1. during vect_compute_data_ref_alignment we need to take the max of
       POLY_VALUE_MIN and vector_alignment.

    2. In vect_do_peeling, define skip_vector when PFA for VLA, and in the
       guard add a check that ncopies * vectype does not exceed POLY_VALUE_MAX
       which we use as a proxy for pagesize.

    3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
       vect_determine_partial_vectors_and_peeling since the first iteration
       has to be partial. Require LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P
       otherwise we have to fail to vectorize.

    4. Create a default mask to be used, so that
       vect_use_loop_mask_for_alignment_p becomes true and we generate the
       peeled check through loop control for partial loops.  From what I can
       tell this won't work for LOOP_VINFO_FULLY_WITH_LENGTH_P since they
       don't have any peeling support at all in the compiler.  That would need
       to be done independently from the above.
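
    The idea behind steps 3 and 4, a partial first iteration that stands in
    for peeling, can be sketched with a scalar model.  This is an
    illustration under stated assumptions, not the WIP patches: the name
    masked_find is hypothetical, VF is a fixed lane count rather than a VLA
    length, and alignment is modelled on array indices rather than on
    addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VF 4  /* hypothetical number of lanes per vector */

/* Scalar model of a masked search: return the index of the first element
   equal to KEY in BUF[START .. N), or N if there is none.  Every block
   begins at a multiple of VF; lanes before START or at or after N are
   inactive and are never read.  */
size_t
masked_find (const int *buf, size_t start, size_t n, int key)
{
  assert (buf != NULL && start <= n);
  /* Round the start down to a block boundary instead of peeling.  */
  size_t block = start - start % VF;
  for (; block < n; block += VF)
    for (size_t lane = 0; lane < VF; lane++)
      {
        size_t i = block + lane;
        bool active = i >= start && i < n;  /* models the loop mask */
        if (active && buf[i] == key)
          return i;
      }
  return n;
}
```

    Only the first and last blocks can have inactive lanes, which is why the
    first iteration has to be partial in this scheme.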

    In any case, not GCC 15 material so I've kept the WIP patches I have
    downstream.

    Bootstrapped and regtested on aarch64-none-linux-gnu,
    arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 with no issues.

    gcc/ChangeLog:

            PR tree-optimization/118464
            PR tree-optimization/116855
            * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
            * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
            Delay checks.
            (vect_compute_data_ref_alignment): Remove alignment checks and
            move to get_load_store_type, increase group access alignment.
            (vect_enhance_data_refs_alignment): Add note to comment needing
            investigating.
            (vect_analyze_data_refs_alignment): Likewise.
            (vect_supportable_dr_alignment): For group loads look at first DR.
            * tree-vect-stmts.cc (get_load_store_type):
            Perform safety checks for early break pfa.
            * tree-vectorizer.h (dr_set_safe_speculative_read_required,
            dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New.
            (need_peeling_for_alignment): Renamed to...
            (safe_speculative_read_required): ...this.
            (class dr_vec_info): Add scalar_access_known_in_bounds.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/118464
            PR tree-optimization/116855
            * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because
            the load type is relaxed later.
            * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
            * gcc.dg/vect/vect-early-break_22.c: Require partial vectors.
            * gcc.dg/vect/vect-early-break_128.c: Likewise.
            * gcc.dg/vect/vect-early-break_26.c: Likewise.
            * gcc.dg/vect/vect-early-break_43.c: Likewise.
            * gcc.dg/vect/vect-early-break_44.c: Likewise.
            * gcc.dg/vect/vect-early-break_2.c: Require load_lanes.
            * gcc.dg/vect/vect-early-break_7.c: Likewise.
            * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
            * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
            * gcc.dg/vect/vect-early-break_133_pfa11.c: New test.
            * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
            * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
            * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
            * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
            * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
            * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
            * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
            * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
            * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
            * gcc.dg/vect/vect-early-break_39.c: Update testcase for
            misalignment.
            * gcc.dg/vect/vect-early-break_18.c: Likewise.
            * gcc.dg/vect/vect-early-break_20.c: Likewise.
            * gcc.dg/vect/vect-early-break_21.c: Likewise.
            * gcc.dg/vect/vect-early-break_38.c: Likewise.
            * gcc.dg/vect/vect-early-break_6.c: Likewise.
            * gcc.dg/vect/vect-early-break_53.c: Likewise.
            * gcc.dg/vect/vect-early-break_56.c: Likewise.
            * gcc.dg/vect/vect-early-break_57.c: Likewise.
            * gcc.dg/vect/vect-early-break_81.c: Likewise.
