[gcc r15-3219] final: go down ASHIFT in walk_alter_subreg
https://gcc.gnu.org/g:359209bdc7245f8768b5044acded8509545e4990

commit r15-3219-g359209bdc7245f8768b5044acded8509545e4990
Author: Michael Matz
Date:   Thu Aug 22 17:03:56 2024 +0200

    final: go down ASHIFT in walk_alter_subreg

    When experimenting with m68k plus LRA, one of the changes in the
    backend is to accept ASHIFTs (not only MULT) as scale code for
    address indices.  When LRA is then not turned on and reload is used
    instead, those addresses are presented to reload, which chokes on
    them.  While reload is going away, the change to make them work
    doesn't really hurt (and generally seems useful, as MULT and ASHIFT
    really are no different).  So just add it.

            PR target/116413
            * final.cc (walk_alter_subreg): Recurse on ASHIFT.

Diff:
---
 gcc/final.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/final.cc b/gcc/final.cc
index eb9e065d9f0a..5d911586de5b 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -3146,6 +3146,7 @@ walk_alter_subreg (rtx *xp, bool *changed)
     case PLUS:
     case MULT:
     case AND:
+    case ASHIFT:
       XEXP (x, 0) = walk_alter_subreg (&XEXP (x, 0), changed);
       XEXP (x, 1) = walk_alter_subreg (&XEXP (x, 1), changed);
       break;
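As a rough illustration (not GCC code), the sketch below models an address as a tiny expression tree. The types `enum code` and `struct expr` and the helper `eval` are hypothetical; `walk_alter_subreg` here only mimics the shape of the real function, with SUBREG replacement standing in for `alter_subreg`. It shows that a SUBREG sitting under an ASHIFT index is reached only once ASHIFT is among the codes the walker recurses into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of an RTL address expression (GCC uses rtx).  */
enum code { REG, SUBREG, CONST_INT, PLUS, MULT, AND, ASHIFT };

struct expr {
  enum code code;
  long value;              /* register number or constant value */
  struct expr *op0, *op1;  /* operands; SUBREG only uses op0 */
};

/* Simplified walk_alter_subreg: replace every SUBREG with its inner
   register (a stand-in for alter_subreg) and recurse into the binary
   codes that can appear in an address.  */
static struct expr *walk_alter_subreg (struct expr *x, bool *changed)
{
  switch (x->code)
    {
    case PLUS:
    case MULT:
    case AND:
    case ASHIFT:  /* without this case a SUBREG under a shift is missed */
      x->op0 = walk_alter_subreg (x->op0, changed);
      x->op1 = walk_alter_subreg (x->op1, changed);
      return x;
    case SUBREG:
      *changed = true;
      return x->op0;
    default:
      return x;
    }
}

/* Evaluate X, with register N holding REGS[N].  */
static long eval (const struct expr *x, const long *regs)
{
  switch (x->code)
    {
    case REG:       return regs[x->value];
    case CONST_INT: return x->value;
    case SUBREG:    return eval (x->op0, regs);
    case PLUS:      return eval (x->op0, regs) + eval (x->op1, regs);
    case MULT:      return eval (x->op0, regs) * eval (x->op1, regs);
    case AND:       return eval (x->op0, regs) & eval (x->op1, regs);
    case ASHIFT:    return eval (x->op0, regs) << eval (x->op1, regs);
    }
  return 0;
}
```

For an address `base + (idx << 2)`, the walk strips the SUBREG around `idx`, and `idx << 2` evaluates to the same value as `idx * 4`. That equivalence is the sense in which MULT and ASHIFT are interchangeable as scale codes.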
[gcc r15-3221] LRA: Fix setup_sp_offset
https://gcc.gnu.org/g:e223ac9c225352e3aeea7180a3b56a96ecdbe2fd

commit r15-3221-ge223ac9c225352e3aeea7180a3b56a96ecdbe2fd
Author: Michael Matz
Date:   Thu Aug 22 17:21:42 2024 +0200

    LRA: Fix setup_sp_offset

    This is part of making m68k work with LRA.  See PR116429.

    In short: setup_sp_offset is internally inconsistent.  It wants to
    set up the sp_offset for newly generated instructions.  The sp_offset
    for an instruction is always the state of the sp offset right before
    that instruction.  For that it starts at the (assumed correct)
    sp_offset of the instruction right after the given (new) sequence,
    and then iterates that sequence forward, simulating its effects on
    sp_offset.

    That can't ever be right: either it needs to start at the front and
    simulate forward, or start at the end and simulate backward.  The
    former seems to be the more natural way.  Funnily, the local variable
    holding that instruction is also called 'before'.

    This changes it to the first variant: start before the sequence, do
    one simulation step to get the sp offset state in front of the
    sequence, and then continue simulating.

    More details: in the problematic testcase we start with this situation
    (sp_off before 550 is 0):

      550: [--sp] = 0             sp_off = 0    {pushexthisi_const}
      551: [--sp] = 37            sp_off = -4   {pushexthisi_const}
      552: [--sp] = r37           sp_off = -8   {movsi_m68k2}
      554: [--sp] = r116 - r37    sp_off = -12  {subsi3}
      556: call                   sp_off = -16

    insn 554 doesn't match its constraints and needs some reloads:

          Creating newreg=262, assigning class DATA_REGS to r262
      554: r262:SI=r262:SI-r37:SI
          REG_ARGS_SIZE 0x10
        Inserting insn reload before:
      996: r262:SI=r116:SI
        Inserting insn reload after:
      997: [--%sp:SI]=r262:SI

                 Considering alt=0 of insn 997:   (0) =g  (1) damSKT
                    1 Non pseudo reload: reject++
                  overall=1,losers=0,rld_nregs=0
               Choosing alt 0 in insn 997:  (0) =g  (1) damSKT {*movsi_m68k2} (sp_off=-16)

    Note how insn 997 (the after-reload) now has sp_off=-16 already.  It
    all goes downhill from there.  We end up with these insns:

      552: [--sp] = r37           sp_off = -8   {movsi_m68k2}
      996: r262 = r116            sp_off = -12
      554: r262 = r262 - r37      sp_off = -12
      997: [--sp] = r262          sp_off = -16  (!!! should be -12)
      556: call                   sp_off = -16

    The call insn's sp_off remains at the correct -16, but internally it's
    already inconsistent here.  If the sp_off before an insn is -16, and
    that insn pre_decs sp, then the after-insn sp_off should be -20.

            PR target/116429
            * lra.cc (setup_sp_offset): Start with sp_offset from
            before the new sequence, not from after.

Diff:
---
 gcc/lra.cc | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/lra.cc b/gcc/lra.cc
index fb32e134004a..b84384b21454 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -1863,14 +1863,17 @@ push_insns (rtx_insn *from, rtx_insn *to)
 }
 
 /* Set up and return sp offset for insns in range [FROM, LAST].  The offset is
-   taken from the next BB insn after LAST or zero if there in such
-   insn.  */
+   taken from the BB insn before FROM after simulating its effects,
+   or zero if there is no such insn.  */
 static poly_int64
 setup_sp_offset (rtx_insn *from, rtx_insn *last)
 {
-  rtx_insn *before = next_nonnote_nondebug_insn_bb (last);
-  poly_int64 offset = (before == NULL_RTX || ! INSN_P (before)
-                       ? 0 : lra_get_insn_recog_data (before)->sp_offset);
+  rtx_insn *before = prev_nonnote_nondebug_insn_bb (from);
+  poly_int64 offset = 0;
+
+  if (before && INSN_P (before))
+    offset = lra_update_sp_offset (PATTERN (before),
+                                   lra_get_insn_recog_data (before)->sp_offset);
 
   for (rtx_insn *insn = from; insn != NEXT_INSN (last); insn = NEXT_INSN (insn))
     {
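To make the seeding error concrete, here is a small C model of the two strategies. `struct insn`, `push_bytes`, and both `setup_sp_offset_*` functions are hypothetical simplifications. Each push is assumed to pre-decrement sp by 4 bytes, as in the m68k dump above, and the per-insn offset is the state *before* the insn, as the commit message defines it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical insn model: bytes pushed (sp pre-decrement) and the
   sp offset in effect right BEFORE the insn.  */
struct insn {
  int uid;
  int push_bytes;
  int sp_offset;
};

/* Effect of one insn on the running offset (cf. lra_update_sp_offset).  */
static int update_sp_offset (const struct insn *insn, int offset)
{
  return offset - insn->push_bytes;
}

/* Old logic: seed from the insn AFTER [FROM, LAST], then simulate forward.  */
static int setup_sp_offset_old (struct insn *v, size_t n,
                                size_t from, size_t last)
{
  int offset = last + 1 < n ? v[last + 1].sp_offset : 0;
  for (size_t i = from; i <= last; i++)
    {
      v[i].sp_offset = offset;
      offset = update_sp_offset (&v[i], offset);
    }
  return offset;
}

/* Fixed logic: seed from the insn BEFORE FROM and step over it once.  */
static int setup_sp_offset_new (struct insn *v, size_t from, size_t last)
{
  int offset = from > 0
               ? update_sp_offset (&v[from - 1], v[from - 1].sp_offset)
               : 0;
  for (size_t i = from; i <= last; i++)
    {
      v[i].sp_offset = offset;
      offset = update_sp_offset (&v[i], offset);
    }
  return offset;
}
```

Apply both versions to the final insn list from the message (552, 996, 554, 997, 556). Seeding from 556 gives insn 997 an offset of -16. Seeding from 554 and stepping over it gives -12, and the simulated offset then reaches -16 exactly at the call.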
[gcc r15-3220] LRA: Don't use 0 as initialization for sp_offset
https://gcc.gnu.org/g:542773888190ef67dca194f4861abab104fa9b5b

commit r15-3220-g542773888190ef67dca194f4861abab104fa9b5b
Author: Michael Matz
Date:   Thu Aug 22 17:09:11 2024 +0200

    LRA: Don't use 0 as initialization for sp_offset

    This is part of making m68k work with LRA.  See PR116374.

    m68k has the property that sometimes the elimination offset between
    %sp and %argptr is zero.  While setting up the elimination
    infrastructure, it's the changes between sp_offset and previous_offset
    that feed into insns_with_changed_offsets, which ultimately causes the
    instructions so marked to be looked at.  But the initial values for
    sp_offset and previous_offset are also zero.  So if the target's
    INITIAL_ELIMINATION_OFFSET (called in update_reg_eliminate) is zero,
    then nothing changes, the instructions in question don't get into the
    list to consider, and the sp_offset tracking goes wrong.

    Solve this by initializing those members with -1 instead of zero.  An
    initial offset of that value seems very unlikely, as offsets come in
    word-sized increments.

    This then also reveals a problem in eliminate_regs_in_insn, where it
    always uses sp_offset-previous_offset as the offset adjustment, even in
    the first_p pass.  That was harmless while previous_offset was
    initialized to zero.  But all the other code uses a different idiom of
    checking for first_p (or rather update_p, which is
    !replace_p&&!first_p) and using sp_offset directly.  So use that as
    well in eliminate_regs_in_insn.

            PR target/116374
            * lra-eliminations.cc (init_elim_table): Use -1 as initializer.
            (update_reg_eliminate): Accept -1 as not-yet-used marker.
            (eliminate_regs_in_insn): Use previous_sp_offset only when
            not first_p.

Diff:
---
 gcc/lra-eliminations.cc | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 5bed259cffeb..96772f2904a6 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -969,7 +969,8 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p,
           if (! replace_p)
             {
               if (known_eq (update_sp_offset, 0))
-                offset += (ep->offset - ep->previous_offset);
+                offset += (!first_p
+                           ? ep->offset - ep->previous_offset : ep->offset);
               if (ep->to_rtx == stack_pointer_rtx)
                 {
                   if (first_p)
@@ -1212,7 +1213,7 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
               if (lra_dump_file != NULL)
                 fprintf (lra_dump_file, "Using elimination %d to %d now\n",
                          ep1->from, ep1->to);
-              lra_assert (known_eq (ep1->previous_offset, 0));
+              lra_assert (known_eq (ep1->previous_offset, -1));
               ep1->previous_offset = ep->offset;
             }
           else
@@ -1283,7 +1284,7 @@ init_elim_table (void)
   for (ep = reg_eliminate, ep1 = reg_eliminate_1;
        ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++, ep1++)
     {
-      ep->offset = ep->previous_offset = 0;
+      ep->offset = ep->previous_offset = -1;
       ep->from = ep1->from;
       ep->to = ep1->to;
       value_p = (targetm.can_eliminate (ep->from, ep->to)
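The sentinel problem fits in a few lines. The struct and helpers below are hypothetical stand-ins for the elimination table, `update_reg_eliminate`, and the adjustment in `eliminate_regs_in_insn`. They keep only the change detection and the `first_p` distinction described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified elimination-table entry.  */
struct elim {
  long offset;
  long previous_offset;
};

/* Initialise both fields with SENTINEL (cf. init_elim_table).  */
static void init_elim (struct elim *ep, long sentinel)
{
  ep->offset = ep->previous_offset = sentinel;
}

/* Record a newly computed offset; report whether insns using this
   elimination must be revisited (cf. update_reg_eliminate).  */
static bool update_elim (struct elim *ep, long new_offset)
{
  ep->previous_offset = ep->offset;
  ep->offset = new_offset;
  return ep->offset != ep->previous_offset;
}

/* Adjustment applied to an insn (cf. eliminate_regs_in_insn): on the
   first pass there is no meaningful previous offset to subtract.  */
static long elim_adjustment (const struct elim *ep, bool first_p)
{
  return first_p ? ep->offset : ep->offset - ep->previous_offset;
}
```

With a zero initializer, a target offset of 0 cannot be told apart from "not yet set up", so no change is reported. With -1, the first update is always seen. The first-pass adjustment must then use `offset` directly, because `offset - previous_offset` would add a bogus 1 that comes from the sentinel.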
[gcc] Created branch 'matz/heads/x86-ssw' in namespace 'refs/users'
The branch 'matz/heads/x86-ssw' was created in namespace 'refs/users' pointing to: c27b30552e6c... gomp: testsuite: improve compatibility of bad-array-section
[gcc(refs/users/matz/heads/x86-ssw)] x86: implement separate shrink wrapping
https://gcc.gnu.org/g:eb94eb73cf3993c1d544e6eb8c4dcb671f215b25

commit eb94eb73cf3993c1d544e6eb8c4dcb671f215b25
Author: Michael Matz
Date:   Sun Jun 30 03:52:39 2024 +0200

    x86: implement separate shrink wrapping

Diff:
---
 gcc/config/i386/i386.cc | 581 +++-
 gcc/config/i386/i386.h  |   2 +
 2 files changed, 533 insertions(+), 50 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 4b6b665e5997..33e69e96008d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -6970,7 +6970,7 @@ ix86_compute_frame_layout (void)
     }
 
   frame->save_regs_using_mov
-    = TARGET_PROLOGUE_USING_MOVE && m->use_fast_prologue_epilogue;
+    = (TARGET_PROLOGUE_USING_MOVE || flag_shrink_wrap_separate) && m->use_fast_prologue_epilogue;
 
   /* Skip return address and error code in exception handler.  */
   offset = INCOMING_FRAME_SP_OFFSET;
@@ -7120,7 +7120,8 @@ ix86_compute_frame_layout (void)
   /* Size prologue needs to allocate.  */
   to_allocate = offset - frame->sse_reg_save_offset;
 
-  if ((!to_allocate && frame->nregs <= 1)
+  if ((!to_allocate && frame->nregs <= 1
+       && !flag_shrink_wrap_separate)
       || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000))
       /* If static stack checking is enabled and done with probes,
          the registers need to be saved before allocating the frame.  */
@@ -7417,6 +7418,8 @@ ix86_emit_save_regs (void)
   int regno;
   rtx_insn *insn;
 
+  gcc_assert (!crtl->shrink_wrapped_separate);
+
   if (!TARGET_APX_PUSH2POP2
       || !ix86_can_use_push2pop2 ()
       || cfun->machine->func_type != TYPE_NORMAL)
@@ -7589,7 +7592,8 @@ ix86_emit_save_regs_using_mov (HOST_WIDE_INT cfa_offset)
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
     if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
       {
-        ix86_emit_save_reg_using_mov (word_mode, regno, cfa_offset);
+        if (!cfun->machine->reg_wrapped_separately[regno])
+          ix86_emit_save_reg_using_mov (word_mode, regno, cfa_offset);
         cfa_offset -= UNITS_PER_WORD;
       }
 }
@@ -7604,7 +7608,8 @@ ix86_emit_save_sse_regs_using_mov (HOST_WIDE_INT cfa_offset)
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
     if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
       {
-        ix86_emit_save_reg_using_mov (V4SFmode, regno, cfa_offset);
+        if (!cfun->machine->reg_wrapped_separately[regno])
+          ix86_emit_save_reg_using_mov (V4SFmode, regno, cfa_offset);
         cfa_offset -= GET_MODE_SIZE (V4SFmode);
       }
 }
@@ -9089,6 +9094,7 @@ ix86_expand_prologue (void)
         = frame.sse_reg_save_offset - frame.reg_save_offset;
 
       gcc_assert (int_registers_saved);
+      gcc_assert (!m->frame_alloc_separately);
 
       /* No need to do stack checking as the area will be immediately
          written.  */
@@ -9106,6 +9112,7 @@ ix86_expand_prologue (void)
            && flag_stack_clash_protection
            && !ix86_target_stack_probe ())
     {
+      gcc_assert (!m->frame_alloc_separately);
       ix86_adjust_stack_and_probe (allocate, int_registers_saved, false);
       allocate = 0;
     }
@@ -9116,6 +9123,7 @@ ix86_expand_prologue (void)
     {
       const HOST_WIDE_INT probe_interval = get_probe_interval ();
 
+      gcc_assert (!m->frame_alloc_separately);
       if (STACK_CHECK_MOVING_SP)
         {
           if (crtl->is_leaf
@@ -9172,9 +9180,16 @@ ix86_expand_prologue (void)
   else if (!ix86_target_stack_probe ()
            || frame.stack_pointer_offset < CHECK_STACK_LIMIT)
     {
-      pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-                                 GEN_INT (-allocate), -1,
-                                 m->fs.cfa_reg == stack_pointer_rtx);
+      if (!m->frame_alloc_separately)
+        pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+                                   GEN_INT (-allocate), -1,
+                                   m->fs.cfa_reg == stack_pointer_rtx);
+      else
+        {
+          if (m->fs.cfa_reg == stack_pointer_rtx)
+            m->fs.cfa_offset -= allocate;
+          m->fs.sp_offset += allocate;
+        }
     }
   else
     {
@@ -9184,6 +9199,8 @@ ix86_expand_prologue (void)
       bool eax_live = ix86_eax_live_at_start_p ();
       bool r10_live = false;
 
+      gcc_assert (!m->frame_alloc_separately);
+
       if (TARGET_64BIT)
         r10_live = (DECL_STATIC_CHAIN (current_function_decl) != 0);
 
@@ -9338,6 +9355,7 @@ ix86_emit_restore_reg_using_pop (rtx reg, bool ppx_p)
   struct machine_function *m = cfun->machine;
   rtx_insn *insn = emit_insn (gen_pop (reg, ppx_p));
 
+  gcc_assert (!m->reg_wrapped_separately[REGNO (reg)]);
   ix86_add_cfa_restore_note (insn, reg, m->fs.sp_offset);
   m->fs.sp_offset -= UNITS_PER_WORD;
 
@@ -9396,6 +9414,9 @@ ix86_emit_restore_reg_using_pop2 (rtx reg1, rtx reg2, bool ppx_p = false)
   const int offset = UNITS_PER_WORD * 2;
   rtx_insn
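The key detail in the two `ix86_emit_save_*_using_mov` hunks is that a register already saved by separate shrink wrapping is skipped, but its stack slot is not. A minimal model of that loop follows. `NREGS`, `emit_saves`, and the `slot` array are hypothetical, and recording an offset stands in for emitting the move.

```c
#include <assert.h>
#include <stdbool.h>

#define NREGS 8       /* hypothetical register-file size */
#define WORD_SIZE 8   /* bytes per saved general register on x86-64 */

/* Mirrors the shape of the patched save loop: every register that needs
   saving owns a slot, but registers whose save was already emitted by
   separate shrink wrapping are skipped.  SLOT[] records the CFA offset
   each emitted save would use; the return value counts emitted saves.  */
static int emit_saves (const bool save_p[NREGS],
                       const bool wrapped_separately[NREGS],
                       long cfa_offset, long slot[NREGS])
{
  int emitted = 0;
  for (int regno = 0; regno < NREGS; regno++)
    if (save_p[regno])
      {
        if (!wrapped_separately[regno])
          {
            slot[regno] = cfa_offset;  /* stand-in for emitting the move */
            emitted++;
          }
        /* Advance even when skipping, so later slots do not shift.  */
        cfa_offset -= WORD_SIZE;
      }
  return emitted;
}
```

Decrementing `cfa_offset` outside the inner `if` keeps every register at the same slot whether or not its save happens in the prologue. The separately emitted save and the normal restore therefore agree on the location.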
[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: don't clobber flags
https://gcc.gnu.org/g:5a9a70a5837aba373e3f36a89943c52e37a19809

commit 5a9a70a5837aba373e3f36a89943c52e37a19809
Author: Michael Matz
Date:   Tue Jul 9 02:20:10 2024 +0200

    x86-ssw: don't clobber flags

Diff:
---
 gcc/config/i386/i386.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 33e69e96008d..734802dbed4f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -10878,8 +10878,12 @@ ix86_components_for_bb (basic_block bb)
 }
 
 static void
-ix86_disqualify_components (sbitmap, edge, sbitmap, bool)
+ix86_disqualify_components (sbitmap components, edge e, sbitmap, bool)
 {
+  /* If the flags are needed at the start of e->dest then we can't insert
+     our stack adjustment insns (they default to flag-clobbering add/sub).  */
+  if (bitmap_bit_p (DF_LIVE_IN (e->dest), FLAGS_REG))
+    bitmap_clear_bit (components, SW_FRAME);
 }
 
 static void
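The rule added to `ix86_disqualify_components` can be modelled with plain bitmasks. This is a sketch under stated assumptions: `component_set` replaces `sbitmap`, and the values of `FLAGS_REG` and `SW_FRAME` are illustrative only. In the real code `SW_FRAME` is `FIRST_PSEUDO_REGISTER`, which does not fit in a 64-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative numbering: FLAGS_REG is a hard register number in the
   live-in set; SW_FRAME is the extra frame-allocation component.  */
enum { FLAGS_REG = 17, SW_FRAME = 63 };

typedef uint64_t component_set;

static bool has (component_set s, unsigned bit)
{
  return (s >> bit) & 1u;
}

/* Model of ix86_disqualify_components: the frame adjustment is an
   add/sub that clobbers the flags, so it cannot be placed on an edge
   whose destination needs the flags live on entry.  */
static component_set disqualify (component_set components,
                                 component_set dest_live_in)
{
  if (has (dest_live_in, FLAGS_REG))
    components &= ~((component_set) 1 << SW_FRAME);
  return components;
}
```

Only the frame component is dropped. Register components are saved and restored with moves, which leave the flags untouched.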
[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: disable if DRAP reg is needed
https://gcc.gnu.org/g:f917195f8a4e1767e89ebb0c875abcbe4dcf97ff

commit f917195f8a4e1767e89ebb0c875abcbe4dcf97ff
Author: Michael Matz
Date:   Tue Jul 9 02:37:55 2024 +0200

    x86-ssw: disable if DRAP reg is needed

Diff:
---
 gcc/config/i386/i386.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 734802dbed4f..4aa37c2ffeaa 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -10805,7 +10805,8 @@ ix86_get_separate_components (void)
   sbitmap components;
 
   ix86_finalize_stack_frame_flags ();
-  if (!frame->save_regs_using_mov)
+  if (!frame->save_regs_using_mov
+      || crtl->drap_reg)
     return NULL;
 
   components = sbitmap_alloc (NCOMPONENTS);
[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: fix testcases
https://gcc.gnu.org/g:c5a72cc80939e42518f4021e0640d29c8b8495a7

commit c5a72cc80939e42518f4021e0640d29c8b8495a7
Author: Michael Matz
Date:   Tue Jul 9 04:27:46 2024 +0200

    x86-ssw: fix testcases

    The separate-shrink-wrap infrastructure sometimes considers components
    as handled when they aren't in fact handled (e.g. it never calls any
    emit_prologue_components or emit_epilogue_components hook for the
    component in question).  So track this ourselves.

Diff:
---
 gcc/config/i386/i386.cc | 34 ++++++++++++++++++++--------------
 gcc/config/i386/i386.h  |  1 +
 2 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 4aa37c2ffeaa..23226d204a09 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -6970,7 +6970,7 @@ ix86_compute_frame_layout (void)
     }
 
   frame->save_regs_using_mov
-    = (TARGET_PROLOGUE_USING_MOVE || flag_shrink_wrap_separate) && m->use_fast_prologue_epilogue;
+    = (TARGET_PROLOGUE_USING_MOVE /*|| flag_shrink_wrap_separate*/) && m->use_fast_prologue_epilogue;
 
   /* Skip return address and error code in exception handler.  */
   offset = INCOMING_FRAME_SP_OFFSET;
@@ -7121,7 +7121,7 @@ ix86_compute_frame_layout (void)
   to_allocate = offset - frame->sse_reg_save_offset;
 
   if ((!to_allocate && frame->nregs <= 1
-       && !flag_shrink_wrap_separate)
+       /*&& !flag_shrink_wrap_separate*/)
       || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000))
       /* If static stack checking is enabled and done with probes,
          the registers need to be saved before allocating the frame.  */
@@ -7418,7 +7418,7 @@ ix86_emit_save_regs (void)
   int regno;
   rtx_insn *insn;
 
-  gcc_assert (!crtl->shrink_wrapped_separate);
+  gcc_assert (!cfun->machine->anything_separately);
 
   if (!TARGET_APX_PUSH2POP2
       || !ix86_can_use_push2pop2 ()
@@ -8974,7 +8974,7 @@ ix86_expand_prologue (void)
   if (!int_registers_saved)
     {
       /* If saving registers via PUSH, do so now.  */
-      if (!frame.save_regs_using_mov)
+      if (!frame.save_regs_using_mov && !m->anything_separately)
         {
           ix86_emit_save_regs ();
           int_registers_saved = true;
@@ -9489,7 +9489,7 @@ ix86_emit_restore_regs_using_pop (bool ppx_p)
 {
   unsigned int regno;
 
-  gcc_assert (!crtl->shrink_wrapped_separate);
+  gcc_assert (!cfun->machine->anything_separately);
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
     if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, false, true))
       ix86_emit_restore_reg_using_pop (gen_rtx_REG (word_mode, regno), ppx_p);
@@ -9506,7 +9506,7 @@ ix86_emit_restore_regs_using_pop2 (void)
   int loaded_regnum = 0;
   bool aligned = cfun->machine->fs.sp_offset % 16 == 0;
 
-  gcc_assert (!crtl->shrink_wrapped_separate);
+  gcc_assert (!cfun->machine->anything_separately);
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
     if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, false, true))
       {
@@ -9894,7 +9894,7 @@ ix86_expand_epilogue (int style)
   /* EH_RETURN requires the use of moves to function properly.  */
   if (crtl->calls_eh_return)
     restore_regs_via_mov = true;
-  else if (crtl->shrink_wrapped_separate)
+  else if (m->anything_separately)
     {
       gcc_assert (!TARGET_SEH);
       restore_regs_via_mov = true;
@@ -10800,13 +10800,14 @@ separate_frame_alloc_p (void)
 static sbitmap
 ix86_get_separate_components (void)
 {
-  struct machine_function *m = cfun->machine;
-  struct ix86_frame *frame = &m->frame;
+  //struct machine_function *m = cfun->machine;
+  //struct ix86_frame *frame = &m->frame;
   sbitmap components;
 
   ix86_finalize_stack_frame_flags ();
-  if (!frame->save_regs_using_mov
-      || crtl->drap_reg)
+  if (/*!frame->save_regs_using_mov
+      ||*/ crtl->drap_reg
+      || cfun->machine->func_type != TYPE_NORMAL)
     return NULL;
 
   components = sbitmap_alloc (NCOMPONENTS);
@@ -11150,6 +11151,8 @@ ix86_process_components (sbitmap components, bool prologue_p)
         {
           if (bitmap_bit_p (components, regno))
             {
+              m->reg_wrapped_separately[regno] = true;
+              m->anything_separately = true;
               if (prologue_p)
                 ix86_emit_save_reg_using_mov (word_mode, regno, cfa_offset);
               else
@@ -11161,6 +11164,8 @@ ix86_process_components (sbitmap components, bool prologue_p)
         {
           if (bitmap_bit_p (components, regno))
             {
+              m->reg_wrapped_separately[regno] = true;
+              m->anything_separately = true;
               if (prologue_p)
                 ix86_emit_save_reg_using_mov (V4SFmode, regno, sse_cfa_offset);
               else
@@ -11181,6 +11186,7 @@ ix86_emit_prologue_components (sbitmap components)
   if (bitmap_bit_p (components, SW_FRAME))
     {
       cfun->machine->frame_alloc_separately = true;
+      cfun->machine->anything_separately = true;
       ix86_alloc_frame ();
     }
 }
@@ -111
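"Track stuff ourselves" here means the flags are set only where the target actually emits code for a component. The generic pass's idea of what was handled is not trusted. The following is a hypothetical, reduced version of that bookkeeping. `struct machine_state` mirrors the two new fields, and `restore_regs_via_mov` keeps only the branches visible in the diff.

```c
#include <assert.h>
#include <stdbool.h>

#define NREGS 8  /* hypothetical register-file size */

/* Per-function state kept by the target itself (cf. the
   reg_wrapped_separately / anything_separately fields in the patch).  */
struct machine_state {
  bool reg_wrapped_separately[NREGS];
  bool anything_separately;
};

/* Called only when the prologue/epilogue hook really emits code for
   COMPONENTS, so the flags record what actually happened.  */
static void process_components (struct machine_state *m,
                                const bool components[NREGS])
{
  for (int regno = 0; regno < NREGS; regno++)
    if (components[regno])
      {
        m->reg_wrapped_separately[regno] = true;
        m->anything_separately = true;
        /* ... emit the save or restore for REGNO here ... */
      }
}

/* Simplified epilogue choice: moves are required once anything was
   wrapped separately (the real decision has further cases).  */
static bool restore_regs_via_mov (const struct machine_state *m,
                                  bool calls_eh_return)
{
  if (calls_eh_return)
    return true;
  return m->anything_separately;
}
```

A component that the infrastructure considers handled but never passes to an emit hook leaves these flags clear, so the normal prologue and epilogue still cover it.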
[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: adjust testcase
https://gcc.gnu.org/g:cf6d794219dd0cf2ca3601e2d6e6b9e5f497a47a

commit cf6d794219dd0cf2ca3601e2d6e6b9e5f497a47a
Author: Michael Matz
Date:   Tue Jul 9 06:01:22 2024 +0200

    x86-ssw: adjust testcase

Diff:
---
 gcc/testsuite/gcc.target/x86_64/abi/callabi/leaf-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/callabi/leaf-2.c b/gcc/testsuite/gcc.target/x86_64/abi/callabi/leaf-2.c
index 2a54bc89cfc2..140389626659 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/callabi/leaf-2.c
+++ b/gcc/testsuite/gcc.target/x86_64/abi/callabi/leaf-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mabi=sysv" } */
+/* { dg-options "-O2 -mabi=sysv -fno-shrink-wrap-separate" } */
 
 extern int glb1, gbl2, gbl3;
[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: precise using of moves
https://gcc.gnu.org/g:d213bc5e67d903143608e0a7879c2577c33ca47e

commit d213bc5e67d903143608e0a7879c2577c33ca47e
Author: Michael Matz
Date:   Tue Jul 9 06:01:47 2024 +0200

    x86-ssw: precise using of moves

    We need to distinguish between merely not wanting to use moves and not
    being able to.  When the allocated frame is too large we can't use
    moves freely and hence need to disable separate shrink wrapping.  If
    we don't want to use moves by default for speed or the like, but
    nothing else prevents them, then this is no reason to disable separate
    shrink wrapping.

Diff:
---
 gcc/config/i386/i386.cc | 20 +++++++++++---------
 gcc/config/i386/i386.h  |  1 +
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 23226d204a09..20f4dcd61870 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -7120,9 +7120,7 @@ ix86_compute_frame_layout (void)
   /* Size prologue needs to allocate.  */
   to_allocate = offset - frame->sse_reg_save_offset;
 
-  if ((!to_allocate && frame->nregs <= 1
-       /*&& !flag_shrink_wrap_separate*/)
-      || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000))
+  if ((TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000))
       /* If static stack checking is enabled and done with probes,
          the registers need to be saved before allocating the frame.  */
       || flag_stack_check == STATIC_BUILTIN_STACK_CHECK
@@ -7135,6 +7133,12 @@ ix86_compute_frame_layout (void)
       || (flag_stack_clash_protection
           && !ix86_target_stack_probe ()
           && to_allocate > get_probe_interval ()))
+    {
+      frame->cannot_use_moves = true;
+    }
+
+  if ((!to_allocate && frame->nregs <= 1)
+      || frame->cannot_use_moves)
     frame->save_regs_using_mov = false;
 
   if (ix86_using_red_zone ()
@@ -10800,13 +10804,13 @@ separate_frame_alloc_p (void)
 static sbitmap
 ix86_get_separate_components (void)
 {
-  //struct machine_function *m = cfun->machine;
-  //struct ix86_frame *frame = &m->frame;
+  struct machine_function *m = cfun->machine;
+  struct ix86_frame *frame = &m->frame;
   sbitmap components;
 
   ix86_finalize_stack_frame_flags ();
-  if (/*!frame->save_regs_using_mov
-      ||*/ crtl->drap_reg
+  if (frame->cannot_use_moves
+      || crtl->drap_reg
       || cfun->machine->func_type != TYPE_NORMAL)
     return NULL;
 
@@ -10868,9 +10872,7 @@ ix86_components_for_bb (basic_block bb)
           {
             need_frame = true;
             break;
-          }
-      }
       }
 
   if (need_frame)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index dd73687a8e2c..bda3d97ab4cf 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2645,6 +2645,7 @@ struct GTY(()) ix86_frame
   /* When save_regs_using_mov is set, emit prologue using
      move instead of push instructions.  */
   bool save_regs_using_mov;
+  bool cannot_use_moves;
 
   /* Assume without checking that:
        EXPENSIVE_P = expensive_function_p (EXPENSIVE_COUNT).  */
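The split between a preference and a hard constraint can be sketched as follows. The function and parameter names are hypothetical. `frame_too_large_or_checked` lumps together the frame-size, static stack checking, and stack-clash conditions from `ix86_compute_frame_layout`. The real `ix86_get_separate_components` additionally rejects functions that need a DRAP register or are not `TYPE_NORMAL`.

```c
#include <assert.h>
#include <stdbool.h>

struct frame_flags {
  bool save_regs_using_mov;  /* preference: save with moves, not pushes */
  bool cannot_use_moves;     /* hard constraint: moves are not possible */
};

/* FRAME_TOO_LARGE_OR_CHECKED stands in for the size, static stack
   checking and stack-clash conditions of ix86_compute_frame_layout.  */
static struct frame_flags
compute_frame_flags (bool prefer_moves, bool frame_too_large_or_checked,
                     long to_allocate, int nregs)
{
  struct frame_flags f = { prefer_moves, false };
  if (frame_too_large_or_checked)
    f.cannot_use_moves = true;
  if ((to_allocate == 0 && nregs <= 1) || f.cannot_use_moves)
    f.save_regs_using_mov = false;
  return f;
}

/* Separate shrink wrapping only needs moves to be possible.  */
static bool separate_shrink_wrap_allowed (struct frame_flags f)
{
  return !f.cannot_use_moves;
}
```

A function that merely prefers pushes (or has a tiny frame) keeps separate shrink wrapping available. Only the hard constraint turns it off.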
[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: Adjust testcase
https://gcc.gnu.org/g:495a687dc93a58110076700f48fb57fa79026bef

commit 495a687dc93a58110076700f48fb57fa79026bef
Author: Michael Matz
Date:   Tue Jul 9 14:26:31 2024 +0200

    x86-ssw: Adjust testcase

    This testcase tries to (uselessly) shrink-wrap the frame allocation in
    f0(), and then calls the prologue expander twice, emitting the messages
    that the dejagnu directives look for more times than expected.  Just
    disable separate shrink wrapping here.

Diff:
---
 gcc/testsuite/gcc.dg/stack-check-5.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/stack-check-5.c b/gcc/testsuite/gcc.dg/stack-check-5.c
index 0243147939c1..b93dabdaea1d 100644
--- a/gcc/testsuite/gcc.dg/stack-check-5.c
+++ b/gcc/testsuite/gcc.dg/stack-check-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fstack-clash-protection -fdump-rtl-pro_and_epilogue -fno-optimize-sibling-calls --param stack-clash-protection-probe-interval=12 --param stack-clash-protection-guard-size=12" } */
+/* { dg-options "-O2 -fstack-clash-protection -fno-shrink-wrap-separate -fdump-rtl-pro_and_epilogue -fno-optimize-sibling-calls --param stack-clash-protection-probe-interval=12 --param stack-clash-protection-guard-size=12" } */
 /* { dg-require-effective-target supports_stack_clash_protection } */
 /* { dg-skip-if "" { *-*-* } { "-fstack-protector*" } { "" } } */
 
[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: tidy and commentary
https://gcc.gnu.org/g:4e6291b6aa5c2033a36e0ac92077a55471e64f92

commit 4e6291b6aa5c2033a36e0ac92077a55471e64f92
Author: Michael Matz
Date:   Tue Jul 9 17:27:37 2024 +0200

    x86-ssw: tidy and commentary

Diff:
---
 gcc/config/i386/i386.cc | 310
 gcc/config/i386/i386.h  |   1 +
 2 files changed, 101 insertions(+), 210 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 20f4dcd61870..8c9505d53a75 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -6970,7 +6970,7 @@ ix86_compute_frame_layout (void)
     }
 
   frame->save_regs_using_mov
-    = (TARGET_PROLOGUE_USING_MOVE /*|| flag_shrink_wrap_separate*/) && m->use_fast_prologue_epilogue;
+    = TARGET_PROLOGUE_USING_MOVE && m->use_fast_prologue_epilogue;
 
   /* Skip return address and error code in exception handler.  */
   offset = INCOMING_FRAME_SP_OFFSET;
@@ -7133,9 +7133,7 @@ ix86_compute_frame_layout (void)
       || (flag_stack_clash_protection
           && !ix86_target_stack_probe ()
           && to_allocate > get_probe_interval ()))
-    {
-      frame->cannot_use_moves = true;
-    }
+    frame->cannot_use_moves = true;
 
   if ((!to_allocate && frame->nregs <= 1)
       || frame->cannot_use_moves)
@@ -9190,6 +9188,11 @@ ix86_expand_prologue (void)
                                    m->fs.cfa_reg == stack_pointer_rtx);
       else
         {
+          /* Even when shrink-wrapping separately we call emit_prologue
+             which will reset the frame-state with the expectation that
+             we leave this routine with the state valid for the normal
+             body of the function, i.e. reflecting allocated frame.
+             So track this by hand.  */
           if (m->fs.cfa_reg == stack_pointer_rtx)
             m->fs.cfa_offset -= allocate;
           m->fs.sp_offset += allocate;
@@ -10786,9 +10789,17 @@ ix86_live_on_entry (bitmap regs)
 }
 
 /* Separate shrink-wrapping.  */
+
+/* On x86 we have one component for each hardreg (a component is handled
+   if it's a callee saved register), and one additional component for
+   the frame allocation.  */
+
 #define NCOMPONENTS (FIRST_PSEUDO_REGISTER + 1)
 #define SW_FRAME FIRST_PSEUDO_REGISTER
 
+/* Returns false when we can't allocate the frame as a separate
+   component.  Otherwise return true.  */
+
 static bool
 separate_frame_alloc_p (void)
 {
@@ -10801,12 +10812,17 @@ separate_frame_alloc_p (void)
   return true;
 }
 
+/* Implements TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.
+   Returns an sbitmap with all components that we intend to possibly
+   handle for the current function.  */
+
 static sbitmap
 ix86_get_separate_components (void)
 {
   struct machine_function *m = cfun->machine;
   struct ix86_frame *frame = &m->frame;
   sbitmap components;
+  unsigned min, max;
 
   ix86_finalize_stack_frame_flags ();
   if (frame->cannot_use_moves
@@ -10814,24 +10830,42 @@ ix86_get_separate_components (void)
       || cfun->machine->func_type != TYPE_NORMAL)
     return NULL;
 
+  min = max = INVALID_REGNUM;
+
   components = sbitmap_alloc (NCOMPONENTS);
   bitmap_clear (components);
 
   for (unsigned regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
     if (ix86_save_reg (regno, true, true))
       {
+        if (min == INVALID_REGNUM)
+          min = regno;
+        max = regno;
         bitmap_set_bit (components, regno);
       }
 
+  if (max >= FIRST_PSEUDO_REGISTER)
+    {
+      sbitmap_free (components);
+      return NULL;
+    }
+
+  m->ssw_min_reg = min;
+  m->ssw_max_reg = max;
+
   if (separate_frame_alloc_p ())
     bitmap_set_bit (components, SW_FRAME);
 
   return components;
 }
 
+/* Implements TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB.  Given a BB
+   return all components that are necessary for it.  */
+
 static sbitmap
 ix86_components_for_bb (basic_block bb)
 {
+  struct machine_function *m = cfun->machine;
   bool need_frame = false;
   sbitmap components = sbitmap_alloc (NCOMPONENTS);
   bitmap_clear (components);
@@ -10840,7 +10874,7 @@ ix86_components_for_bb (basic_block bb)
   bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
   bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
 
-  for (unsigned regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+  for (unsigned regno = m->ssw_min_reg; regno <= m->ssw_max_reg; regno++)
     if (ix86_save_reg (regno, true, true)
         && (bitmap_bit_p (in, regno)
             || bitmap_bit_p (gen, regno)
@@ -10881,6 +10915,9 @@ ix86_components_for_bb (basic_block bb)
   return components;
 }
 
+/* Implements TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS.  Filter out
+   from COMPONENTS those that we can't handle on edge E.  */
+
 static void
 ix86_disqualify_components (sbitmap components, edge e, sbitmap, bool)
 {
@@ -10890,6 +10927,10 @@ ix86_disqualify_components (sbitmap components, edge e, sbitmap, bool)
     bitmap_clear_bit (components, SW_FRAME);
 }
 
+/* Helper for frame allocation.  This resets cfun->machine->fs to
+   reflect the state at the first
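The new range bookkeeping can be modelled in a reduced form. The code collects the callee-saved registers, remembers the lowest and highest, and bails out if there are none. The per-block query then scans only that range. `NREGS`, the boolean arrays, and both function signatures are hypothetical, and `INVALID_REGNUM` is modelled as `UINT_MAX`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NREGS 16                 /* hypothetical FIRST_PSEUDO_REGISTER */
#define NCOMPONENTS (NREGS + 1)  /* one per hard register + the frame */
#define SW_FRAME NREGS
#define INVALID_REGNUM UINT_MAX  /* stand-in for GCC's INVALID_REGNUM */

/* Collect candidate components and the range [*MIN, *MAX] of registers
   to save; false means "nothing to wrap separately".  */
static bool get_separate_components (const bool save_p[NREGS], bool frame_ok,
                                     bool components[NCOMPONENTS],
                                     unsigned *min, unsigned *max)
{
  *min = *max = INVALID_REGNUM;
  for (unsigned regno = 0; regno < NCOMPONENTS; regno++)
    components[regno] = false;

  for (unsigned regno = 0; regno < NREGS; regno++)
    if (save_p[regno])
      {
        if (*min == INVALID_REGNUM)
          *min = regno;
        *max = regno;
        components[regno] = true;
      }

  if (*max >= NREGS)   /* no register needs saving at all */
    return false;

  if (frame_ok)
    components[SW_FRAME] = true;
  return true;
}

/* Per-block query restricted to [MIN, MAX]: mark saved registers the
   block uses.  Returns how many registers were examined.  */
static unsigned components_for_bb (const bool save_p[NREGS],
                                   const bool used_in_bb[NREGS],
                                   unsigned min, unsigned max,
                                   bool out[NCOMPONENTS])
{
  unsigned examined = 0;
  for (unsigned regno = 0; regno < NCOMPONENTS; regno++)
    out[regno] = false;
  for (unsigned regno = min; regno <= max; regno++, examined++)
    if (save_p[regno] && used_in_bb[regno])
      out[regno] = true;
  return examined;
}
```

The early `return false` matters for the narrowed loop. With no saved registers, `max` would stay at `INVALID_REGNUM` and `regno <= max` would never terminate.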
[gcc(refs/users/matz/heads/x86-ssw)] Add target hook shrink_wrap.cleanup_components
https://gcc.gnu.org/g:826dd85cb9f368608a9890046cd701f7530d7315

commit 826dd85cb9f368608a9890046cd701f7530d7315
Author: Michael Matz
Date:   Wed Jul 10 17:10:18 2024 +0200

    Add target hook shrink_wrap.cleanup_components

    When the shrink wrapping infrastructure has removed components, the
    target might need to remove even more for dependency reasons.  x86,
    for instance, needs to remove the frame-allocation component when some
    register components are removed.

Diff:
---
 gcc/config/i386/i386.cc | 17 +++++++++++++++++
 gcc/doc/tm.texi         |  8 ++++++++
 gcc/doc/tm.texi.in      |  2 ++
 gcc/shrink-wrap.cc      | 10 ++++++++++
 gcc/target.def          | 10 ++++++++++
 5 files changed, 47 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8c9505d53a75..36202b7dcb07 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -10927,6 +10927,21 @@ ix86_disqualify_components (sbitmap components, edge e, sbitmap, bool)
     bitmap_clear_bit (components, SW_FRAME);
 }
 
+/* Implements TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS.  The infrastructure
+   has removed some components (noted in REMOVED), this cleans out any
+   further components that can't be shrink wrapped separately
+   anymore.  */
+
+static void
+ix86_cleanup_components (sbitmap components, sbitmap removed)
+{
+  /* If separate shrink wrapping removed any register components
+     then we must also removed SW_FRAME.  */
+  bitmap_clear_bit (removed, SW_FRAME);
+  if (!bitmap_empty_p (removed))
+    bitmap_clear_bit (components, SW_FRAME);
+}
+
 /* Helper for frame allocation.  This resets cfun->machine->fs to
    reflect the state at the first instruction before prologue (i.e.
    the call just happened).  */
@@ -11107,6 +11122,8 @@ ix86_set_handled_components (sbitmap)
 #define TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB ix86_components_for_bb
 #undef TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS
 #define TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS ix86_disqualify_components
+#undef TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS
+#define TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS ix86_cleanup_components
 #undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS
 #define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS ix86_emit_prologue_components
 #undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c8b8b126b242..201c8b9f94da 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5352,6 +5352,14 @@ components in @var{edge_components} that the target cannot handle on edge
 epilogue instead.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS (sbitmap @var{components}, sbitmap @var{removed})
+This hook is called after the shrink wrapping infrastructure disqualified
+components for various reasons (e.g. because an unsplittable edge would
+have to be split).  If there are interdependencies between components the
+target can remove those from @var{components} whose dependencies are in
+@var{removed}.  If this hook would do nothing it doesn't need to be defined.
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS (sbitmap)
 Emit prologue insns for the components indicated by the parameter.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 658e1e63371e..f23e6ff3e455 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3787,6 +3787,8 @@ generic code.
 
 @hook TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS
 
+@hook TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS
+
 @hook TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS
 
 @hook TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS
diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc
index 2bec492c2a57..db5c1f24d11c 100644
--- a/gcc/shrink-wrap.cc
+++ b/gcc/shrink-wrap.cc
@@ -1432,6 +1432,9 @@ disqualify_problematic_components (sbitmap components)
 {
   auto_sbitmap pro (SBITMAP_SIZE (components));
   auto_sbitmap epi (SBITMAP_SIZE (components));
+  auto_sbitmap old (SBITMAP_SIZE (components));
+
+  bitmap_copy (old, components);
 
   basic_block bb;
   FOR_EACH_BB_FN (bb, cfun)
@@ -1496,6 +1499,13 @@ disqualify_problematic_components (sbitmap components)
             }
         }
     }
+
+  /* If the target needs to know that we removed some components,
+     tell it.  */
+  bitmap_and_compl (old, old, components);
+  if (targetm.shrink_wrap.cleanup_components
+      && !bitmap_empty_p (old))
+    targetm.shrink_wrap.cleanup_components (components, old);
 }
 
 /* Place code for prologues and epilogues for COMPONENTS where we can put
diff --git a/gcc/target.def b/gcc/target.def
index fdad7bbc93e2..ac26e8ed38d7 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6872,6 +6872,16 @@ epilogue instead.",
  void, (sbitmap components, edge e, sbitmap edge_components, bool is_prologue),
  NULL)
 
+DEFHOOK
+(cleanup_components,
+ "This hook is called after the shrink wrapping infrastructure disqualified\n\
+components for various reasons (e.g. because an unsplittable ed
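The interplay between the generic pass and the x86 hook (reverted by the commit that follows) can be modelled with 64-bit masks. `component_set`, `BIT`, `after_disqualify`, and the value of `SW_FRAME` are hypothetical. The masks stand in for `sbitmap`, and `old & ~components` is the value `bitmap_and_compl (old, old, components)` computes.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t component_set;   /* hypothetical stand-in for sbitmap */
enum { SW_FRAME = 63 };           /* hypothetical component number */

#define BIT(n) ((component_set) 1 << (n))

/* x86 policy from the hook: if any register component was removed,
   the frame-allocation component must go too.  */
static component_set ix86_cleanup_components (component_set components,
                                              component_set removed)
{
  removed &= ~BIT (SW_FRAME);
  if (removed != 0)
    components &= ~BIT (SW_FRAME);
  return components;
}

/* Generic side: REMOVED = OLD & ~COMPONENTS, and the hook is only
   consulted when something was actually removed.  */
static component_set after_disqualify (component_set old,
                                       component_set components)
{
  component_set removed = old & ~components;
  if (removed != 0)
    components = ix86_cleanup_components (components, removed);
  return components;
}
```

Losing a register component drags the frame component with it. Losing only the frame component leaves the register components alone, because `SW_FRAME` is masked out of `removed` before the emptiness test.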
[gcc(refs/users/matz/heads/x86-ssw)] Revert "Add target hook shrink_wrap.cleanup_components"
https://gcc.gnu.org/g:3b04b651551abc541c6ec21835d2e85a407bb1c4 commit 3b04b651551abc541c6ec21835d2e85a407bb1c4 Author: Michael Matz Date: Thu Jul 11 15:16:57 2024 +0200 Revert "Add target hook shrink_wrap.cleanup_components" This reverts commit 826dd85cb9f368608a9890046cd701f7530d7315. I found a better way to solve the problem. Diff: --- gcc/config/i386/i386.cc | 17 - gcc/doc/tm.texi | 8 gcc/doc/tm.texi.in | 2 -- gcc/shrink-wrap.cc | 10 -- gcc/target.def | 10 -- 5 files changed, 47 deletions(-) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 36202b7dcb07..8c9505d53a75 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -10927,21 +10927,6 @@ ix86_disqualify_components (sbitmap components, edge e, sbitmap, bool) bitmap_clear_bit (components, SW_FRAME); } -/* Implements TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS. The infrastructure - has removed some components (noted in REMOVED), this cleans out any - further components that can't be shrink wrapped separately - anymore. */ - -static void -ix86_cleanup_components (sbitmap components, sbitmap removed) -{ - /* If separate shrink wrapping removed any register components - then we must also removed SW_FRAME. */ - bitmap_clear_bit (removed, SW_FRAME); - if (!bitmap_empty_p (removed)) -bitmap_clear_bit (components, SW_FRAME); -} - /* Helper for frame allocation. This resets cfun->machine->fs to reflect the state at the first instruction before prologue (i.e. the call just happened). 
*/ @@ -11122,8 +11107,6 @@ ix86_set_handled_components (sbitmap) #define TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB ix86_components_for_bb #undef TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS #define TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS ix86_disqualify_components -#undef TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS -#define TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS ix86_cleanup_components #undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS #define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS ix86_emit_prologue_components #undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 201c8b9f94da..c8b8b126b242 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5352,14 +5352,6 @@ components in @var{edge_components} that the target cannot handle on edge epilogue instead. @end deftypefn -@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS (sbitmap @var{components}, sbitmap @var{removed}) -This hook is called after the shrink wrapping infrastructure disqualified -components for various reasons (e.g. because an unsplittable edge would -have to be split). If there are interdependencies between components the -target can remove those from @var{components} whose dependencies are in -@var{removed}. If this hook would do nothing it doesn't need to be defined. -@end deftypefn - @deftypefn {Target Hook} void TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS (sbitmap) Emit prologue insns for the components indicated by the parameter. @end deftypefn diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index f23e6ff3e455..658e1e63371e 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3787,8 +3787,6 @@ generic code. 
@hook TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS -@hook TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS - @hook TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS @hook TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc index db5c1f24d11c..2bec492c2a57 100644 --- a/gcc/shrink-wrap.cc +++ b/gcc/shrink-wrap.cc @@ -1432,9 +1432,6 @@ disqualify_problematic_components (sbitmap components) { auto_sbitmap pro (SBITMAP_SIZE (components)); auto_sbitmap epi (SBITMAP_SIZE (components)); - auto_sbitmap old (SBITMAP_SIZE (components)); - - bitmap_copy (old, components); basic_block bb; FOR_EACH_BB_FN (bb, cfun) @@ -1499,13 +1496,6 @@ disqualify_problematic_components (sbitmap components) } } } - - /* If the target needs to know that we removed some components, - tell it. */ - bitmap_and_compl (old, old, components); - if (targetm.shrink_wrap.cleanup_components - && !bitmap_empty_p (old)) -targetm.shrink_wrap.cleanup_components (components, old); } /* Place code for prologues and epilogues for COMPONENTS where we can put diff --git a/gcc/target.def b/gcc/target.def index ac26e8ed38d7..fdad7bbc93e2 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -6872,16 +6872,6 @@ epilogue instead.", void, (sbitmap components, edge e, sbitmap edge_components, bool is_prologue), NULL) -DEFHOOK -(cleanup_components, - "This hook is called after the shrink wrapping infrastructure disqualified\n\ -components for various reasons (e.g. because an unsplittable edge would\n\ -have to be split). If there are interdependencies between components the\n\ -target can remove those from @v
[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: Deal with deallocated frame in epilogue
https://gcc.gnu.org/g:fbf3ff6bc169639a2d55ab4ed5f962201ad6416e commit fbf3ff6bc169639a2d55ab4ed5f962201ad6416e Author: Michael Matz Date: Thu Jul 11 15:21:05 2024 +0200 x86-ssw: Deal with deallocated frame in epilogue When the frame is deallocated separately we need to adjust frame_state.sp_offset to be correct before emitting the rest of the standard epilogue. Diff: --- gcc/config/i386/i386.cc | 5 + 1 file changed, 5 insertions(+) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 8c9505d53a75..847c6116884b 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -9931,6 +9931,11 @@ ix86_expand_epilogue (int style) else restore_regs_via_mov = false; + /* If we've (de)allocated the frame separately, then that's done already, + and SP is in fact at a word offset. */ + if (m->frame_alloc_separately) +m->fs.sp_offset = UNITS_PER_WORD; + if (restore_regs_via_mov || frame.nsseregs) { /* Ensure that the entire register save area is addressable via
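The reason for resetting to `UNITS_PER_WORD` can be shown with a small model. It is a hypothetical simplification, not GCC's `machine_frame_state`. `sp_offset` is the distance in bytes from the CFA down to SP. At function entry, only the return address sits between them. The sketch assumes a 64-bit target with an 8-byte word, no pushed frame pointer, and register saves done with moves, which leave SP unchanged.

```c
#include <assert.h>

/* Hypothetical model of the x86 frame-state bookkeeping; see the
   assumptions above (8-byte words, no frame pointer push, move-based
   register saves).  */
#define MODEL_UNITS_PER_WORD 8

struct frame_state
{
  long sp_offset;  /* bytes from the CFA down to the stack pointer */
};

/* On entry only the return address has been pushed.  */
static void
fs_at_entry (struct frame_state *fs)
{
  fs->sp_offset = MODEL_UNITS_PER_WORD;
}

/* The frame-allocation component: "sub $size, %rsp".  */
static void
fs_alloc_frame (struct frame_state *fs, long size)
{
  assert (size >= 0);
  fs->sp_offset += size;
}

/* Mirrors the added epilogue code: if the frame was already deallocated
   by a separately wrapped component, SP is back at its entry position,
   so the offset is reset to one word.  */
static void
fs_before_standard_epilogue (struct frame_state *fs, int frame_alloc_separately)
{
  if (frame_alloc_separately)
    fs->sp_offset = MODEL_UNITS_PER_WORD;
}
```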
[gcc/matz/heads/x86-ssw] x86: implement separate shrink wrapping
The branch 'matz/heads/x86-ssw' was updated to point to: 298b1dd7fb81... x86: implement separate shrink wrapping It previously pointed to: fbf3ff6bc169... x86-ssw: Deal with deallocated frame in epilogue Diff: !!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST): --- fbf3ff6... x86-ssw: Deal with deallocated frame in epilogue 3b04b65... Revert "Add target hook shrink_wrap.cleanup_components" 826dd85... Add target hook shrink_wrap.cleanup_components 4e6291b... x86-ssw: tidy and commentary 495a687... x86-ssw: Adjust testcase d213bc5... x86-ssw: precise using of moves cf6d794... x86-ssw: adjust testcase c5a72cc... x86-ssw: fix testcases f917195... x86-ssw: disable if DRAP reg is needed 5a9a70a... x86-ssw: don't clobber flags eb94eb7... x86: implement separate shrink wrapping Summary of changes (added commits): --- 298b1dd... x86: implement separate shrink wrapping
[gcc(refs/users/matz/heads/x86-ssw)] x86: implement separate shrink wrapping
https://gcc.gnu.org/g:298b1dd7fb8189eb22ae604973083ae80b135ae7 commit 298b1dd7fb8189eb22ae604973083ae80b135ae7 Author: Michael Matz Date: Sun Jun 30 03:52:39 2024 +0200 x86: implement separate shrink wrapping This adds x86 target support for the infrastructure for shrink-wrapping separate components. The components we track are individual registers to save/restore and the frame allocation itself. There are various limitations where we give up: * when the frame becomes too large * when any complicated realignment is needed (DRAP or not) * when the calling convention requires certain forms of pro- or epilogues (e.g. SEH on win64) * when the function is "special" (uses eh_return and the like); most of that is already avoided by the generic infrastructure in shrink-wrap.cc * when we must not use moves to save/restore registers for any reason (stack checking being one notable one) and so on. For the last point we now distinguish between not being able to use moves (then we disable separate shrink wrapping) and merely not wanting to use moves (e.g. because push/pop is equally fast). In the latter case we don't disable separate shrink wrapping, but do use moves for those functions where it does something. Apart from that it's fairly straightforward: for components selected by the infrastructure to be separately shrink-wrapped, emit code to save/restore them in the appropriate hook (for the frame-alloc component, code that adjusts the stack pointer), remember them, and don't emit any code for those in the normal expand_prologue and expand_epilogue expanders. But as the x86 prologue and epilogue generators are quite a twisty maze with many cases to deal with, this also adds some aborts and asserts for things that are unexpected. The static instruction count of functions can increase (when separate shrink wrapping emits some component sequences into multiple blocks) and the instructions themselves can become larger (moves vs.
push/pop), so there's a code size increase for functions where this does something. The dynamic insn count decreases for at least one path through the function (and doesn't increase for others). Two testcases need separate shrink wrapping disabled because they check for specific generated assembly instruction counts and sequences or specific messages in the pro_and_epilogue dump file, which turn out different with separate shrink wrapping. gcc/ * config/i386/i386.h (struct i86_frame.cannot_use_moves): Add member. (struct machine_function.ssw_min_reg, ssw_max_reg, reg_wrapped_separately, frame_alloc_separately, anything_separately): Add members. * config/i386/i386.cc (ix86_compute_frame_layout): Split out cannot_use_moves from save_regs_using_move computation. (ix_86_emit_save_regs): Ensure not using this under separate shrink wrapping. (ix86_emit_save_regs_using_mov, ix86_emit_save_sse_regs_using_mov, ix86_emit_restore_reg_using_pop, ix86_emit_restore_reg_using_pop2, ix86_emit_restore_regs_using_pop): Don't handle separately shrink wrapped components. (ix86_expand_prologue): Handle separate shrink wrapping. (ix86_emit_restore_reg_using_mov): New function, split out from ... (ix86_emit_restore_regs_using_mov): ... here and ... (ix86_emit_restore_sse_regs_using_mov): ... here. (ix86_expand_epilogue): Handle separate shrink wrapping. (NCOMPONENTS, SW_FRAME): Add new defines. (separate_frame_alloc_p, ix86_get_separate_components, ix86_components_for_bb, ix86_disqualify_components, ix86_init_frame_state, ix86_alloc_frame, ix86_dealloc_frame, ix86_process_reg_components, ix86_emit_prologue_components, ix86_emit_epilogue_components, ix86_set_handled_components): Add new functions. (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS, TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB, TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS, TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS, TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS, TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define target hook macros. 
gcc/testsuite * gcc.dg/stack-check-5.c: Disable separate shrink wrapping. * gcc.target/x86_64/abi/callabi/leaf-2.c: Ditto. Diff: --- gcc/config/i386/i386.cc| 491 ++--- gcc/config/i386/i386.h | 5 + gcc/testsuite/gcc.dg/stack-check-5.c | 2 +- .../gcc.target/x86_64/abi/callabi/leaf-2.c | 2 +- 4 files changed, 447 insertions(+)
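The give-up conditions and the "cannot use moves" versus "merely don't want moves" distinction described above can be summarized as a small C model. The struct fields and both functions are hypothetical stand-ins. The real logic is spread across `ix86_compute_frame_layout` (`cannot_use_moves`, `save_regs_using_mov`) and the new shrink-wrap hooks, and this sketch does not reproduce that code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the conditions listed in the commit message.  */
struct ssw_inputs
{
  bool frame_too_large;
  bool needs_realignment;   /* DRAP or other complicated realignment */
  bool abi_fixed_logues;    /* e.g. SEH on win64 */
  bool special_function;    /* eh_return and the like */
  bool cannot_use_moves;    /* e.g. stack checking */
  bool prefer_push_pop;     /* moves merely not wanted */
};

/* Separate shrink wrapping is offered only when no blocker holds.
   Note that a mere preference for push/pop is not a blocker.  */
static bool
ssw_allowed (const struct ssw_inputs *in)
{
  return !(in->frame_too_large || in->needs_realignment
           || in->abi_fixed_logues || in->special_function
           || in->cannot_use_moves);
}

/* Whether register saves/restores use moves.  ANYTHING_SEPARATELY says
   whether separate shrink wrapping actually selected a component.  */
static bool
use_moves (const struct ssw_inputs *in, bool anything_separately)
{
  if (in->cannot_use_moves)
    return false;
  assert (!anything_separately || ssw_allowed (in));
  if (anything_separately)
    return true;              /* separately wrapped saves need moves */
  return !in->prefer_push_pop;
}
```

In this model, a function that prefers push/pop keeps push/pop unless separate shrink wrapping actually wraps something. A function that cannot use moves never gets separate shrink wrapping at all.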
[gcc/matz/heads/x86-ssw] x86: Implement separate shrink wrapping
The branch 'matz/heads/x86-ssw' was updated to point to: f0d9a4c9d44c... x86: Implement separate shrink wrapping It previously pointed to: 298b1dd7fb81... x86: implement separate shrink wrapping Diff: !!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST): --- 298b1dd... x86: implement separate shrink wrapping Summary of changes (added commits): --- f0d9a4c... x86: Implement separate shrink wrapping
[gcc(refs/users/matz/heads/x86-ssw)] x86: Implement separate shrink wrapping
https://gcc.gnu.org/g:f0d9a4c9d44c463f86699d7f054722d5d0a20d09 commit f0d9a4c9d44c463f86699d7f054722d5d0a20d09 Author: Michael Matz Date: Sun Jun 30 03:52:39 2024 +0200 x86: Implement separate shrink wrapping This adds x86 target support for the infrastructure for shrink-wrapping separate components. The components we track are individual registers to save/restore and the frame allocation itself. There are various limitations where we give up: * when the frame becomes too large * when any complicated realignment is needed (DRAP or not) * when the calling convention requires certain forms of pro- or epilogues (e.g. SEH on win64) * when the function is "special" (uses eh_return and the like); most of that is already avoided by the generic infrastructure in shrink-wrap.cc * when we must not use moves to save/restore registers for any reason (stack checking being one notable one) and so on. For the last point we now distinguish between not being able to use moves (then we disable separate shrink wrapping) and merely not wanting to use moves (e.g. because push/pop is equally fast). In the latter case we don't disable separate shrink wrapping, but do use moves for those functions where it does something. Apart from that it's fairly straightforward: for components selected by the infrastructure to be separately shrink-wrapped, emit code to save/restore them in the appropriate hook (for the frame-alloc component, code that adjusts the stack pointer), remember them, and don't emit any code for those in the normal expand_prologue and expand_epilogue expanders. But as the x86 prologue and epilogue generators are quite a twisty maze with many cases to deal with, this also adds some aborts and asserts for things that are unexpected. The static instruction count of functions can increase (when separate shrink wrapping emits some component sequences into multiple blocks) and the instructions themselves can become larger (moves vs.
push/pop), so there's a code size increase for functions where this does something. The dynamic insn count decreases for at least one path through the function (and doesn't increase for others). Two testcases need separate shrink wrapping disabled because they check for specific generated assembly instruction counts and sequences or specific messages in the pro_and_epilogue dump file, which turn out different with separate shrink wrapping. gcc/ * config/i386/i386.h (struct i86_frame.cannot_use_moves): Add member. (struct machine_function.ssw_min_reg, ssw_max_reg, reg_wrapped_separately, frame_alloc_separately, anything_separately): Add members. * config/i386/i386.cc (ix86_compute_frame_layout): Split out cannot_use_moves from save_regs_using_move computation. (ix_86_emit_save_regs): Ensure not using this under separate shrink wrapping. (ix86_emit_save_regs_using_mov, ix86_emit_save_sse_regs_using_mov, ix86_emit_restore_reg_using_pop, ix86_emit_restore_reg_using_pop2, ix86_emit_restore_regs_using_pop): Don't handle separately shrink wrapped components. (ix86_expand_prologue): Handle separate shrink wrapping. (ix86_emit_restore_reg_using_mov): New function, split out from ... (ix86_emit_restore_regs_using_mov): ... here and ... (ix86_emit_restore_sse_regs_using_mov): ... here. (ix86_expand_epilogue): Handle separate shrink wrapping. (NCOMPONENTS, SW_FRAME): Add new defines. (separate_frame_alloc_p, ix86_get_separate_components, ix86_components_for_bb, ix86_disqualify_components, ix86_init_frame_state, ix86_alloc_frame, ix86_dealloc_frame, ix86_process_reg_components, ix86_emit_prologue_components, ix86_emit_epilogue_components, ix86_set_handled_components): Add new functions. (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS, TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB, TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS, TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS, TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS, TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define target hook macros. 
gcc/testsuite/ * gcc.dg/stack-check-5.c: Disable separate shrink wrapping. * gcc.target/x86_64/abi/callabi/leaf-2.c: Ditto. Diff: --- gcc/config/i386/i386.cc| 491 ++--- gcc/config/i386/i386.h | 5 + gcc/testsuite/gcc.dg/stack-check-5.c | 2 +- .../gcc.target/x86_64/abi/callabi/leaf-2.c | 2 +- 4 files changed, 447 insertions(+
[gcc(refs/users/matz/heads/x86-ssw)] x86: Implement separate shrink wrapping
https://gcc.gnu.org/g:86676836d6cb8289c53ff3dffcf8583505a7e0f5 commit 86676836d6cb8289c53ff3dffcf8583505a7e0f5 Author: Michael Matz Date: Sun Jun 30 03:52:39 2024 +0200 x86: Implement separate shrink wrapping This adds x86 target support for the infrastructure for shrink-wrapping separate components. The components we track are individual registers to save/restore and the frame allocation itself. There are various limitations where we give up: * when the frame becomes too large * when any complicated realignment is needed (DRAP or not) * when the calling convention requires certain forms of pro- or epilogues (e.g. SEH on win64) * when the function is "special" (uses eh_return and the like); most of that is already avoided by the generic infrastructure in shrink-wrap.cc * when we must not use moves to save/restore registers for any reason (stack checking being one notable one) and so on. For the last point we now distinguish between not being able to use moves (then we disable separate shrink wrapping) and merely not wanting to use moves (e.g. because push/pop is equally fast). In the latter case we don't disable separate shrink wrapping, but do use moves for those functions where it does something. Apart from that it's fairly straightforward: for components selected by the infrastructure to be separately shrink-wrapped, emit code to save/restore them in the appropriate hook (for the frame-alloc component, code that adjusts the stack pointer), remember them, and don't emit any code for those in the normal expand_prologue and expand_epilogue expanders. But as the x86 prologue and epilogue generators are quite a twisty maze with many cases to deal with, this also adds some aborts and asserts for things that are unexpected. The static instruction count of functions can increase (when separate shrink wrapping emits some component sequences into multiple blocks) and the instructions themselves can become larger (moves vs.
push/pop), so there's a code size increase for functions where this does something. The dynamic insn count decreases for at least one path through the function (and doesn't increase for others). Two testcases need separate shrink wrapping disabled because they check for specific generated assembly instruction counts and sequences or specific messages in the pro_and_epilogue dump file, which turn out different with separate shrink wrapping. gcc/ * config/i386/i386.h (struct i86_frame.cannot_use_moves): Add member. (struct machine_function.ssw_min_reg, ssw_max_reg, reg_wrapped_separately, frame_alloc_separately, anything_separately): Add members. * config/i386/i386.cc (ix86_compute_frame_layout): Split out cannot_use_moves from save_regs_using_move computation. (ix_86_emit_save_regs): Ensure not using this under separate shrink wrapping. (ix86_emit_save_regs_using_mov, ix86_emit_save_sse_regs_using_mov, ix86_emit_restore_reg_using_pop, ix86_emit_restore_reg_using_pop2, ix86_emit_restore_regs_using_pop): Don't handle separately shrink wrapped components. (ix86_expand_prologue): Handle separate shrink wrapping. (ix86_emit_restore_reg_using_mov): New function, split out from ... (ix86_emit_restore_regs_using_mov): ... here and ... (ix86_emit_restore_sse_regs_using_mov): ... here. (ix86_expand_epilogue): Handle separate shrink wrapping. (NCOMPONENTS, SW_FRAME): Add new defines. (separate_frame_alloc_p, ix86_get_separate_components, ix86_components_for_bb, ix86_disqualify_components, ix86_init_frame_state, ix86_alloc_frame, ix86_dealloc_frame, ix86_process_reg_components, ix86_emit_prologue_components, ix86_emit_epilogue_components, ix86_set_handled_components): Add new functions. (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS, TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB, TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS, TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS, TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS, TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define target hook macros. 
gcc/testsuite/ * gcc.dg/stack-check-5.c: Disable separate shrink wrapping. * gcc.target/x86_64/abi/callabi/leaf-2.c: Ditto. Diff: --- gcc/config/i386/i386.cc| 491 ++--- gcc/config/i386/i386.h | 5 + gcc/testsuite/gcc.dg/stack-check-5.c | 2 +- .../gcc.target/x86_64/abi/callabi/leaf-2.c | 2 +- 4 files changed, 447 insertions(+
[gcc r15-4242] Fix PR116650: check all regs in regrename targets
https://gcc.gnu.org/g:85bee4f77b1b0ebe68b3efe0c356b7d5fb006c4d commit r15-4242-g85bee4f77b1b0ebe68b3efe0c356b7d5fb006c4d Author: Michael Matz Date: Thu Oct 10 16:36:51 2024 +0200 Fix PR116650: check all regs in regrename targets (this came up for m68k vs. LRA, but is a generic problem) Regrename wants to use new registers for certain def-use chains. For validity of replacements it needs to check that the selected candidates are unused up to then. That's done in check_new_reg_p. But if it so happens that the new register needs more hardregs than the old register (which happens if the target allows inter-bank moves and the mode is something like a DFmode that needs to be placed into a SImode reg-pair), then check_new_reg_p only checks the first of those registers for free-ness. This is caused by that function looking up the number of necessary hardregs only in terms of the old hardreg number. It of course needs to do that in terms of the new candidate regnumber. The symptom is that regrename sometimes clobbers the higher-numbered registers of such a regrename target pair. This patch fixes that problem. (In the particular case of the bug report it was LRA that left behind an inter-bank move instruction that triggers regrename, ultimately causing the mis-compile. Reload didn't do that, but in general we of course can't rely on such moves not happening if the target allows them.) This also shows a general confusion in that function and the target hook interface here: for (i = nregs - 1; i >= 0; --i) ... || ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i)) it uses nregs in a way that requires it to be the same between old and new register. The problem is that the target hook only gets register numbers, when it instead should get a mode and register numbers and would be called only for the first but not for subsequent registers.
I've looked at a number of definitions of that target hook and I think that this is currently harmless in the sense that it would merely rule out some potential reg-renames that would in fact be okay to do. So I'm not changing the target hook interface here and hence that problem remains unfixed. PR rtl-optimization/116650 * regrename.cc (check_new_reg_p): Calculate nregs in terms of the new candidate register. Diff: --- gcc/regrename.cc | 25 +++-- 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/gcc/regrename.cc b/gcc/regrename.cc index 054e601740b1..22668d7bf57d 100644 --- a/gcc/regrename.cc +++ b/gcc/regrename.cc @@ -324,10 +324,27 @@ static bool check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg, class du_head *this_head, HARD_REG_SET this_unavailable) { - int nregs = this_head->nregs; + int nregs = 1; int i; struct du_chain *tmp; + /* See whether new_reg accepts all modes that occur in + definition and uses and record the number of regs it would take. */ + for (tmp = this_head->first; tmp; tmp = tmp->next_use) +{ + int n; + /* Completely ignore DEBUG_INSNs, otherwise we can get +-fcompare-debug failures. */ + if (DEBUG_INSN_P (tmp->insn)) + continue; + + if (!targetm.hard_regno_mode_ok (new_reg, GET_MODE (*tmp->loc))) + return false; + n = hard_regno_nregs (new_reg, GET_MODE (*tmp->loc)); + if (n > nregs) + nregs = n; +} + for (i = nregs - 1; i >= 0; --i) if (TEST_HARD_REG_BIT (this_unavailable, new_reg + i) || fixed_regs[new_reg + i] @@ -348,14 +365,10 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg, definition and uses. */ for (tmp = this_head->first; tmp; tmp = tmp->next_use) { - /* Completely ignore DEBUG_INSNs, otherwise we can get --fcompare-debug failures. 
*/ if (DEBUG_INSN_P (tmp->insn)) continue; - if (!targetm.hard_regno_mode_ok (new_reg, GET_MODE (*tmp->loc)) - || call_clobbered_in_chain_p (this_head, GET_MODE (*tmp->loc), - new_reg)) + if (call_clobbered_in_chain_p (this_head, GET_MODE (*tmp->loc), new_reg)) return false; }
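The bug can be reproduced on a toy target, sketched below in C. The target is hypothetical and mirrors the scenario in the message: hard regs 0-7 are 32-bit data registers, and regs 8-15 are 64-bit FP registers, so a DFmode value needs one FP register but an even/odd pair of data registers. `check_new_reg_p` follows the fixed logic and sizes `nregs` by the *new* register over every non-debug mode in the chain. `check_new_reg_p_old` reproduces the old sizing by the old register. Both are simplified models, not the real regrename.cc functions; call-clobber and `HARD_REGNO_RENAME_OK` checks are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy target: 0-7 are 32-bit data regs, 8-15 are 64-bit
   FP regs.  DFmode needs one FP reg or an even/odd data-reg pair.  */
#define N_HARD_REGS 16
enum mode { SImode, DFmode };

static int
hard_regno_nregs (int regno, enum mode m)
{
  return (m == DFmode && regno < 8) ? 2 : 1;
}

static bool
hard_regno_mode_ok (int regno, enum mode m)
{
  if (m == DFmode && regno < 8)
    return regno % 2 == 0;  /* pairs must start at an even register */
  return true;
}

/* One def or use in a def-use chain.  */
struct use { enum mode mode; bool debug_insn; };

/* Fixed logic: size NREGS by NEW_REG over all non-debug modes in the
   chain, then check every hard reg the renamed value would occupy.  */
static bool
check_new_reg_p (int new_reg, const struct use *chain, int n,
                 const bool unavailable[N_HARD_REGS])
{
  assert (new_reg >= 0 && new_reg < N_HARD_REGS);
  int nregs = 1;
  for (int i = 0; i < n; i++)
    {
      if (chain[i].debug_insn)
        continue;  /* debug insns must not influence the decision */
      if (!hard_regno_mode_ok (new_reg, chain[i].mode))
        return false;
      int k = hard_regno_nregs (new_reg, chain[i].mode);
      if (k > nregs)
        nregs = k;
    }
  for (int i = nregs - 1; i >= 0; --i)
    if (new_reg + i >= N_HARD_REGS || unavailable[new_reg + i])
      return false;
  return true;
}

/* Old behavior for comparison: NREGS came from the old register.  */
static bool
check_new_reg_p_old (int old_reg, int new_reg, enum mode m,
                     const bool unavailable[N_HARD_REGS])
{
  assert (new_reg >= 0 && new_reg < N_HARD_REGS);
  int nregs = hard_regno_nregs (old_reg, m);
  if (!hard_regno_mode_ok (new_reg, m))
    return false;
  for (int i = nregs - 1; i >= 0; --i)
    if (new_reg + i >= N_HARD_REGS || unavailable[new_reg + i])
      return false;
  return true;
}
```

Renaming a DFmode chain from FP register 8 to data register 2 while register 3 is busy shows the difference. The old check sizes the value as one register, inspects only register 2, and accepts the rename, clobbering register 3. The fixed check sees that a pair is needed and rejects it.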
[gcc r15-8262] doc: regenerate rs6000/rs6000.opt.urls
https://gcc.gnu.org/g:8333f1c7e699419a4e428fa1d66156d7bad69c9f commit r15-8262-g8333f1c7e699419a4e428fa1d66156d7bad69c9f Author: Michael Matz Date: Tue Mar 18 17:21:23 2025 +0100 doc: regenerate rs6000/rs6000.opt.urls which I forgot, and the autobuilder complained. * config/rs6000/rs6000.opt.urls: Regenerate. Diff: --- gcc/config/rs6000/rs6000.opt.urls | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/config/rs6000/rs6000.opt.urls b/gcc/config/rs6000/rs6000.opt.urls index c7c1cefe22cd..0b418c09a083 100644 --- a/gcc/config/rs6000/rs6000.opt.urls +++ b/gcc/config/rs6000/rs6000.opt.urls @@ -98,6 +98,9 @@ UrlSuffix(gcc/RS_002f6000-and-PowerPC-Options.html#index-mminimal-toc) mfull-toc UrlSuffix(gcc/RS_002f6000-and-PowerPC-Options.html#index-mfull-toc) +msplit-patch-nops +UrlSuffix(gcc/RS_002f6000-and-PowerPC-Options.html#index-msplit-patch-nops) + mvrsave UrlSuffix(gcc/RS_002f6000-and-PowerPC-Options.html#index-mvrsave)
[gcc r15-8236] rs6000: Add -msplit-patch-nops (PR112980)
https://gcc.gnu.org/g:96698551b8e19fc33637908190f121e039301993 commit r15-8236-g96698551b8e19fc33637908190f121e039301993 Author: Michael Matz Date: Wed Nov 13 16:04:06 2024 +0100 rs6000: Add -msplit-patch-nops (PR112980) As the bug report details, some uses of -fpatchable-function-entry aren't happy with the "before" NOPs being inserted between global and local entry point on powerpc. We want the before NOPs to be in front of the global entry point. That means that the patching NOPs aren't consecutive for dual entry point functions, but for these use cases that's not a problem. But let us support both under the control of a new target option: -msplit-patch-nops. gcc/ PR target/112980 * config/rs6000/rs6000.opt (msplit-patch-nops): New option. * doc/invoke.texi (RS/6000 and PowerPC Options): Document it. * config/rs6000/rs6000.h (machine_function.stop_patch_area_print): New member. * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry): Emit split nops under control of that one. * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue): Add handling of split patch nops. Diff: --- gcc/config/rs6000/rs6000-logue.cc | 15 +-- gcc/config/rs6000/rs6000.cc | 27 +++ gcc/config/rs6000/rs6000.h| 6 ++ gcc/config/rs6000/rs6000.opt | 4 gcc/doc/invoke.texi | 17 +++-- 5 files changed, 57 insertions(+), 12 deletions(-) diff --git a/gcc/config/rs6000/rs6000-logue.cc b/gcc/config/rs6000/rs6000-logue.cc index aa07d79d9742..52f44b114b06 100644 --- a/gcc/config/rs6000/rs6000-logue.cc +++ b/gcc/config/rs6000/rs6000-logue.cc @@ -4005,8 +4005,8 @@ rs6000_output_function_prologue (FILE *file) unsigned short patch_area_size = crtl->patch_area_size; unsigned short patch_area_entry = crtl->patch_area_entry; - /* Need to emit the patching area. */ - if (patch_area_size > 0) + /* Emit non-split patching area now.
*/ + if (!TARGET_SPLIT_PATCH_NOPS && patch_area_size > 0) { cfun->machine->global_entry_emitted = true; /* As ELFv2 ABI shows, the allowable bytes between the global @@ -4027,7 +4027,6 @@ rs6000_output_function_prologue (FILE *file) patch_area_entry); rs6000_print_patchable_function_entry (file, patch_area_entry, true); - patch_area_size -= patch_area_entry; } } @@ -4037,9 +4036,13 @@ rs6000_output_function_prologue (FILE *file) assemble_name (file, name); fputs ("\n", file); /* Emit the nops after local entry. */ - if (patch_area_size > 0) - rs6000_print_patchable_function_entry (file, patch_area_size, - patch_area_entry == 0); + if (patch_area_size > patch_area_entry) + { + patch_area_size -= patch_area_entry; + cfun->machine->stop_patch_area_print = false; + rs6000_print_patchable_function_entry (file, patch_area_size, +patch_area_entry == 0); + } } else if (rs6000_pcrel_p ()) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 675b039c2b65..737c3d6f7c75 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -15245,11 +15245,25 @@ rs6000_print_patchable_function_entry (FILE *file, { bool global_entry_needed_p = rs6000_global_entry_point_prologue_needed_p (); /* For a function which needs global entry point, we will emit the - patchable area before and after local entry point under the control of - cfun->machine->global_entry_emitted, see the handling in function - rs6000_output_function_prologue. */ - if (!global_entry_needed_p || cfun->machine->global_entry_emitted) + patchable area when it isn't split before and after local entry point + under the control of cfun->machine->global_entry_emitted, see the + handling in function rs6000_output_function_prologue. 
*/ + if (!TARGET_SPLIT_PATCH_NOPS + && (!global_entry_needed_p || cfun->machine->global_entry_emitted)) default_print_patchable_function_entry (file, patch_area_size, record_p); + + /* For split patch nops we emit the before nops (from generic code) + in front of the global entry point and after the local entry point, + under the control of cfun->machine->stop_patch_area_print, see + rs6000_output_function_prologue and rs6000_elf_declare_function_name. */ + if (TARGET_SPLIT_PATCH_NOPS) +{ + if (!cfun->machine->stop_patch_area_print) + default_print_patchable_function_entry (file, patch_area_size, + record_p); + else + gcc_assert (global_entry_need
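The placement difference can be modeled in C, as sketched below. `patch_layout` is a hypothetical simplified model, not GCC code. It prints where the NOPs of `-fpatchable-function-entry=TOTAL,BEFORE` end up for a function with both a global and a local entry point. The tokens are `n` for one NOP, `G` for the global entry point (including its TOC setup) and `L` for the local entry point. The model follows the commit message and the prologue diff: without the option, the "before" NOPs sit between the global and local entry points; with it, they precede the global entry point. In both cases the remaining NOPs follow the local entry point.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of NOP placement for a dual-entry function.
   OUT receives a token string; see the legend above.  */
static void
patch_layout (int total, int before, bool split, char *out, size_t outsz)
{
  assert (0 <= before && before <= total);
  assert (outsz > (size_t) total + 2);  /* NOPs + 'G' + 'L' + NUL */
  size_t k = 0;
  if (split)
    for (int i = 0; i < before; i++)   /* before NOPs precede the global entry */
      out[k++] = 'n';
  out[k++] = 'G';
  if (!split)
    for (int i = 0; i < before; i++)   /* before NOPs sit between G and L */
      out[k++] = 'n';
  out[k++] = 'L';
  for (int i = 0; i < total - before; i++)  /* remaining NOPs after L */
    out[k++] = 'n';
  out[k] = '\0';
}
```

With `TOTAL=4, BEFORE=2`, the model yields `GnnLnn` by default and `nnGLnn` with `-msplit-patch-nops`. In the split form the NOPs are no longer consecutive, which matches the trade-off the commit message accepts.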