https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111677

--- Comment #26 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Alex Coplan <acop...@gcc.gnu.org>:

https://gcc.gnu.org/g:0529ba8168c89f24314e8750237d77bb132bea9c

commit r14-8657-g0529ba8168c89f24314e8750237d77bb132bea9c
Author: Alex Coplan <alex.cop...@arm.com>
Date:   Tue Jan 30 10:22:48 2024 +0000

    aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

    The PR shows us ICEing due to an unrecognizable TFmode save emitted by
    aarch64_process_components.  The problem is that for T{I,F,D}mode we
    conservatively require mems to be in range for x-register ldp/stp.  That
    is because (at least for TImode) it can be allocated to both GPRs and
    FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
    a q-register load/store.

    As Richard pointed out in the PR, aarch64_get_separate_components
    already checks that the offsets are suitable for a single load, so we
    just need to choose a mode in aarch64_reg_save_mode that gives the full
    q-register range.  In this patch, we choose V16QImode as an alternative
    16-byte "bag-of-bits" mode that doesn't have the artificial range
    restrictions imposed on T{I,F,D}mode.

    For T{F,D}mode in GCC 15 I think we could consider relaxing the
    restriction imposed in aarch64_classify_address, as typically T{F,D}mode
    should be allocated to FPRs.  But such a change seems too invasive to
    consider for GCC 14 at this stage (let alone backports).

    Fortunately the new flexible load/store pair patterns in GCC 14 allow
    this mode change to work without further changes.  The backports are
    more involved as we need to adjust the load/store pair handling to cater
    for V16QImode in a few places.

    Note that for the testcase we are relying on the torture options to add
    -funroll-loops at -O3 which is necessary to trigger the ICE on trunk
    (but not on the 13 branch).

    gcc/ChangeLog:

            PR target/111677
            * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
            V16QImode for the full 16-byte FPR saves in the vector PCS case.

    gcc/testsuite/ChangeLog:

            PR target/111677
            * gcc.target/aarch64/torture/pr111677.c: New test.
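
For readers unfamiliar with the addressing ranges the commit message refers to, here is a minimal sketch in C of the offset checks involved. It is not GCC's code: the function names are hypothetical, and the "conservative" rule is a simplified model of the restriction the message describes for T{I,F,D}mode. The immediate ranges themselves are the architectural ones for x-register LDP/STP and for single q-register LDR/STR and LDUR/STUR.

```c
#include <assert.h>
#include <stdbool.h>

/* LDP/STP of X registers: 7-bit signed immediate scaled by 8,
   so multiples of 8 in [-512, 504].  */
bool x_pair_offset_ok (long off)
{
  return off % 8 == 0 && off >= -512 && off <= 504;
}

/* Single Q-register LDR/STR (unsigned scaled form: multiples of 16 in
   [0, 65520]) or LDUR/STUR (signed unscaled form: [-256, 255]).  */
bool q_single_offset_ok (long off)
{
  bool scaled = off % 16 == 0 && off >= 0 && off <= 65520;
  bool unscaled = off >= -256 && off <= 255;
  return scaled || unscaled;
}

/* Hypothetical model of the conservative rule for T{I,F,D}mode: the
   offset must work whether the value ends up in GPRs (x-reg LDP/STP)
   or in an FPR (single q-reg access).  */
bool conservative_16byte_offset_ok (long off)
{
  return x_pair_offset_ok (off) && q_single_offset_ok (off);
}

/* Illustrative check: a 16-byte-aligned save slot at offset 512 is fine
   for a single q-register store, but is rejected by the conservative
   rule because it is just past the x-reg LDP/STP limit of 504.  */
void check_example_offsets (void)
{
  assert (q_single_offset_ok (512));
  assert (!conservative_16byte_offset_ok (512));
}
```

Under this model, an offset that passes the single-load check but fails the conservative one mirrors the mismatch described above: a save judged acceptable as a single load can still be out of range for a T{I,F,D}mode mem, which a 16-byte vector mode such as V16QImode avoids.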
