https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116028

--- Comment #25 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:78592fdbdc18013387d9854805b5cbf6d19c2110

commit r15-8656-g78592fdbdc18013387d9854805b5cbf6d19c2110
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Fri Mar 21 20:26:00 2025 +0100

    lra, v2: emit caller-save register spills before call insn [PR116028]

    Here is an updated version of Surya's PR116028 fix from August, which got
    reverted because it caused bootstrap failures on aarch64, later on
bootstrap
    comparison errors there as well and problems on other targets as well.

    Original description:

    LRA emits insns to save caller-save registers in the
    inheritance/splitting pass.  In this pass, LRA builds EBBs (Extended
    Basic Block) and traverses the insns in the EBBs in reverse order from
    the last insn to the first insn.  When LRA sees a write to a pseudo (that
    has been assigned a caller-save register), and there is a read following
    the write, with an intervening call insn between the write and read,
    then LRA generates a spill immediately after the write and a restore
    immediately before the read.  The spill is needed because the call insn
    will clobber the caller-save register.

    If there is a write insn and a call insn in two separate BBs but
    belonging to the same EBB, the spill insn gets generated in the BB
    containing the write insn.  If the write insn is in the entry BB, then
    the spill insn that is generated in the entry BB prevents shrink wrap
    from happening.  This is because the spill insn references the stack
    pointer and hence the prolog gets generated in the entry BB itself.

    This patch ensures the the spill insn is generated before the call insn
    instead of after the write.  This also ensures that the spill occurs
    only in the path containing the call.

    The changes compared to the first r15-2810 version are:
    1) the reason for aarch64 miscompilations and later on bootstrap comparison
       issues as can be seen on the pr118615.c testcase in the patch was that
       when curr_insn is a JUMP_INSN or some cases of CALL_INSNs,
       split_if_necessary is called with before_p true and if it is successful,
       the code set use_insn = PREV_INSN (curr_insn); instead of use_insn =
       curr_insn; and that use_insn is then what is passed to
       add_next_usage_insn; now, if the patch decides to emit the save
       instruction(s) before the first call after curr_insn in the ebb rather
       than before the JUMP_INSN/CALL_INSN, PREV_INSN (curr_insn) is some
random
       insn before it, not anything related to the split_reg actions.
       If it is e.g. a DEBUG_INSN in one case vs. some unrelated other insn
       otherwise, that can affect further split_reg within the same function
    2) as suggested by Surya in PR118615, it makes no sense to try to change
       behavior if the first call after curr_insn is in the same bb as
curr_insn
    3) split_reg is actually called sometimes from within inherit_in_ebb but
       sometimes from elsewhere; trying to use whatever last call to
       inherit_in_ebb saw last is a sure way to run into wrong-code issues,
       so instead of clearing the rtx var at the start of inherit_in_ebb it is
       now cleared at the end of it
    4) calling the var latest_call_insn was weird, inherit_in_ebb walks the ebb
       backwards, so what the var contains is the first call insn within the
       ebb (after curr_insn)
    5) the patch was using
       lra_process_new_insns (PREV_INSN (latest_call_insn), NULL, save,
                              "Add save<-reg");
       to emit the save insn before latest_call_insn.  That feels quite weird
       given that latest_call_insn has explicit support for adding stuff
       before some insn or after some insn, adding something before some
       insn doesn't really need to be done as addition after PREV_INSN
    6) some formatting nits + new testcase + removal of xfail even on arm32

    Bootstrapped/regtested on x86_64-linux/i686-linux (my usual
    --enable-checking=yes,rtl,extra builds), aarch64-linux (normal default
    bootstrap) and our distro scratch build
    ({x86_64,i686,aarch64,powerpc64le,s390x}-linux --enable-checking=release
    LTO profiledbootstrap/regtest), I think Sam James tested on 32-bit arm
    too.
    On aarch64-linux this results in
    -FAIL: gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing
shrink-wrapping"

    I admit I don't know the code well nor understood everything it is doing.

    I have some concerns:
    1) I wonder if there is a guarantee that first_call_insn if non-NULL will
be
       always in between curr_insn and usage_insn when call_save_p; I'd hope
       yes because if usage_insn is before first_call_insn in the ebb,
       presumably it wouldn't need to find call save regs because the range
       wouldn't cross any calls
    2) I wonder whether it wouldn't be better instead of inserting the saves
       before first_call_insn insert it at the start of the bb containing that
       call (after labels of course); emitting it right before a call could
       mislead code looking for argument slot initialization of the call
    3) even when avoiding the use_insn = PREV_INSN (curr_insn);, I wonder
       if it is ok to use use_insn equal to curr_insn rather than the insns
       far later where we actually inserted it, but primarily because I don't
       understand the code much; I think for the !before_p case it is doing
       similar thing  on a shorter distance, the saves were emitted after
       curr_insn and we record it on curr_insn

    2025-03-21  Surya Kumari Jangala  <jskum...@linux.ibm.com>
                Jakub Jelinek  <ja...@redhat.com>

            PR rtl-optimization/116028
            PR rtl-optimization/118615
            * lra-constraints.cc (first_call_insn): New variable.
            (split_reg): Spill register before first_call_insn if call_save_p
            and the call is in a different bb in the ebb.
            (split_if_necessary): Formatting fix.
            (inherit_in_ebb): Set first_call_insn when handling a CALL_INSN.
            For successful split_if_necessary with before_p, only change
            use_insn if it emitted any new instructions before curr_insn.
            Clear first_call_insn before returning.

            * gcc.dg/ira-shrinkwrap-prep-1.c: Remove xfail for powerpc.
            * gcc.dg/pr10474.c: Remove xfail for powerpc and arm.
            * gcc.dg/pr118615.c: New test.

Reply via email to