[gcc r15-3219] final: go down ASHIFT in walk_alter_subreg

2024-08-27 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:359209bdc7245f8768b5044acded8509545e4990

commit r15-3219-g359209bdc7245f8768b5044acded8509545e4990
Author: Michael Matz 
Date:   Thu Aug 22 17:03:56 2024 +0200

final: go down ASHIFT in walk_alter_subreg

When experimenting with m68k plus LRA, one of the
changes in the backend is to accept ASHIFTs (not only
MULT) as scale code for address indices.  When LRA is then
not turned on and reload is used instead, those addresses
are presented to reload, which chokes on them.  While reload is
going away, the change to make them work doesn't really hurt
(and generally seems useful, as MULT and ASHIFT really are
no different).  So just add it.
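
As an illustration only, the following toy sketch (not GCC code) shows why the walker needs to recurse into ASHIFT.  The names `Node`, `Code` and `walk_alter` are hypothetical stand-ins for rtx, rtx codes and walk_alter_subreg.  The assumption is simply that an index scaled by a shift has operands just like one scaled by a multiplication, so a walker that skips ASHIFT would leave a SUBREG below it untouched.

```cpp
#include <cassert>

// Toy stand-ins for RTL codes; hypothetical, not GCC's real rtx.
enum Code { REG, SUBREG, CONST_INT, PLUS, MULT, AND, ASHIFT };

struct Node {
  Code code;
  Node *op0;
  Node *op1;
};

// Walk an address expression and turn every SUBREG leaf into a REG,
// recording in *changed whether anything was rewritten.
Node *walk_alter (Node *x, bool *changed)
{
  switch (x->code)
    {
    case PLUS:
    case MULT:
    case AND:
    case ASHIFT: // without this case a SUBREG below a shift is missed
      x->op0 = walk_alter (x->op0, changed);
      x->op1 = walk_alter (x->op1, changed);
      break;
    case SUBREG:
      x->code = REG; // stands in for the real SUBREG rewriting
      *changed = true;
      break;
    default:
      break;
    }
  return x;
}
```

Building a PLUS of a base register and an ASHIFT of a SUBREG by a constant, then calling `walk_alter` on it, rewrites the SUBREG only because the ASHIFT case recurses into its operands.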

PR target/116413
* final.cc (walk_alter_subreg): Recurse on ASHIFT.

Diff:
---
 gcc/final.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/final.cc b/gcc/final.cc
index eb9e065d9f0a..5d911586de5b 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -3146,6 +3146,7 @@ walk_alter_subreg (rtx *xp, bool *changed)
 case PLUS:
 case MULT:
 case AND:
+case ASHIFT:
   XEXP (x, 0) = walk_alter_subreg (&XEXP (x, 0), changed);
   XEXP (x, 1) = walk_alter_subreg (&XEXP (x, 1), changed);
   break;


[gcc r15-3221] LRA: Fix setup_sp_offset

2024-08-27 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:e223ac9c225352e3aeea7180a3b56a96ecdbe2fd

commit r15-3221-ge223ac9c225352e3aeea7180a3b56a96ecdbe2fd
Author: Michael Matz 
Date:   Thu Aug 22 17:21:42 2024 +0200

LRA: Fix setup_sp_offset

This is part of making m68k work with LRA.  See PR116429.
In short: setup_sp_offset is internally inconsistent.  It wants to
set up the sp_offset for newly generated instructions.  sp_offset for
an instruction is always the state of the sp-offset right before that
instruction.  For that it starts at the (assumed correct) sp_offset
of the instruction right after the given (new) sequence, and then
iterates that sequence forward simulating its effects on sp_offset.

That can't ever be right: either it needs to start at the front
and simulate forward, or start at the end and simulate backward.
The former seems to be the more natural way.  Funnily the local
variable holding that instruction is also called 'before'.

This changes it to the first variant: start before the sequence,
do one simulation step to get the sp-offset state in front of the
sequence and then continue simulating.

More details: in the problematic testcase we start with this
situation (sp_off before 550 is 0):

  550: [--sp] = 0             sp_off = 0   {pushexthisi_const}
  551: [--sp] = 37            sp_off = -4  {pushexthisi_const}
  552: [--sp] = r37           sp_off = -8  {movsi_m68k2}
  554: [--sp] = r116 - r37    sp_off = -12 {subsi3}
  556: call                   sp_off = -16

insn 554 doesn't match its constraints and needs some reloads:

  Creating newreg=262, assigning class DATA_REGS to r262
  554: r262:SI=r262:SI-r37:SI
  REG_ARGS_SIZE 0x10
Inserting insn reload before:
  996: r262:SI=r116:SI
Inserting insn reload after:
  997: [--%sp:SI]=r262:SI

 Considering alt=0 of insn 997:   (0) =g  (1) damSKT
1 Non pseudo reload: reject++
  overall=1,losers=0,rld_nregs=0
  Choosing alt 0 in insn 997:  (0) =g  (1) damSKT {*movsi_m68k2} (sp_off=-16)

Note how insn 997 (the after-reload) now has sp_off=-16 already.  It all
goes downhill from there.  We end up with these insns:

  552: [--sp] = r37           sp_off = -8  {movsi_m68k2}
  996: r262 = r116            sp_off = -12
  554: r262 = r262 - r37      sp_off = -12
  997: [--sp] = r262          sp_off = -16  (!!! should be -12)
  556: call                   sp_off = -16

The call insn sp_off remains at the correct -16, but internally it's already
inconsistent here.  If the sp_off before an insn is -16, and that insn
pre_decs sp, then the after-insn sp_off should be -20.
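
The forward simulation described above can be sketched in isolation.  This is a simplified model, not the LRA code: `Insn`, `sp_delta` and this free-standing `setup_sp_offset` are hypothetical, and each insn's effect on sp is assumed to be a known constant.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of an insn: how it moves sp, and the recorded
// sp offset that holds right before it executes.
struct Insn {
  int sp_delta;  // e.g. -4 for a 4-byte push via [--sp]
  int sp_off;    // state before the insn
};

// Assign sp_off to every insn of a new sequence, starting from the insn
// that precedes it, and return the offset valid after the sequence.
int setup_sp_offset (const Insn *before, std::vector<Insn> &seq)
{
  // One simulation step over 'before' gives the state in front of seq.
  int offset = before ? before->sp_off + before->sp_delta : 0;
  for (Insn &insn : seq)
    {
      insn.sp_off = offset;
      offset += insn.sp_delta;
    }
  return offset;
}
```

Taking insn 552 (sp_off -8, a 4-byte push) as the predecessor and, for brevity, treating 996, 554 and 997 as one sequence, every insn in the sequence gets sp_off -12 and the returned value is -16, which agrees with the sp_off recorded on the call.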

PR target/116429
* lra.cc (setup_sp_offset): Start with sp_offset from
before the new sequence, not from after.

Diff:
---
 gcc/lra.cc | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/lra.cc b/gcc/lra.cc
index fb32e134004a..b84384b21454 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -1863,14 +1863,17 @@ push_insns (rtx_insn *from, rtx_insn *to)
 }
 
 /* Set up and return sp offset for insns in range [FROM, LAST].  The offset is
-   taken from the next BB insn after LAST or zero if there in such
-   insn.  */
+   taken from the BB insn before FROM after simulating its effects,
+   or zero if there is no such insn.  */
 static poly_int64
 setup_sp_offset (rtx_insn *from, rtx_insn *last)
 {
-  rtx_insn *before = next_nonnote_nondebug_insn_bb (last);
-  poly_int64 offset = (before == NULL_RTX || ! INSN_P (before)
-                       ? 0 : lra_get_insn_recog_data (before)->sp_offset);
+  rtx_insn *before = prev_nonnote_nondebug_insn_bb (from);
+  poly_int64 offset = 0;
+
+  if (before && INSN_P (before))
+    offset = lra_update_sp_offset (PATTERN (before),
+                                   lra_get_insn_recog_data (before)->sp_offset);
 
   for (rtx_insn *insn = from; insn != NEXT_INSN (last); insn = NEXT_INSN (insn))
 {


[gcc r15-3220] LRA: Don't use 0 as initialization for sp_offset

2024-08-27 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:542773888190ef67dca194f4861abab104fa9b5b

commit r15-3220-g542773888190ef67dca194f4861abab104fa9b5b
Author: Michael Matz 
Date:   Thu Aug 22 17:09:11 2024 +0200

LRA: Don't use 0 as initialization for sp_offset

This is part of making m68k work with LRA.  See PR116374.
m68k has the property that sometimes the elimination offset
between %sp and %argptr is zero.  While the elimination
infrastructure is being set up, it's changes between sp_offset and
previous_offset that feed into insns_with_changed_offsets, which
ultimately decides which instructions get looked at.

But the initial values for sp_offset and previous_offset are
also zero.  So if the target's INITIAL_ELIMINATION_OFFSET (called
in update_reg_eliminate) is zero then nothing changes, the
instructions in question don't get into the list to consider and
the sp_offset tracking goes wrong.

Solve this by initializing those members with -1 instead of zero.
An initial offset of that value seems very unlikely, as it's
in word-sized increments.  This then also reveals a problem in
eliminate_regs_in_insn where it always uses sp_offset-previous_offset
as offset adjustment, even in the first_p pass.  That was harmless
when previous_offset was initialized to zero.  But all the other
code uses a different idiom of checking for first_p (or rather
update_p which is !replace_p&&!first_p), and using sp_offset directly.
So use that as well in eliminate_regs_in_insn.
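
A minimal sketch of the sentinel idea, under simplifying assumptions: `Elim`, `update_offset` and `adjustment` are hypothetical names, and the real table entries and change tracking in lra-eliminations.cc are more involved than this.

```cpp
#include <cassert>

// Hypothetical miniature of one elimination table entry.
struct Elim {
  long offset;
  long previous_offset;
};

// Record the target's current offset; return true when it differs from
// the stored one, i.e. when affected insns must be reprocessed.
bool update_offset (Elim &ep, long target_offset)
{
  ep.previous_offset = ep.offset;
  ep.offset = target_offset;
  return ep.offset != ep.previous_offset;
}

// Offset adjustment for an insn: the full offset on the first pass,
// only the difference to the previous offset on later passes.
long adjustment (const Elim &ep, bool first_p)
{
  return first_p ? ep.offset : ep.offset - ep.previous_offset;
}
```

With both fields starting at 0, a target offset of 0 is indistinguishable from "nothing happened yet"; starting at -1 makes the first update register as a change.  The same sketch shows why the first pass must not subtract previous_offset once it holds the sentinel.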

PR target/116374
* lra-eliminations.cc (init_elim_table): Use -1 as initializer.
(update_reg_eliminate): Accept -1 as not-yet-used marker.
(eliminate_regs_in_insn): Use previous_sp_offset only when
not first_p.

Diff:
---
 gcc/lra-eliminations.cc | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 5bed259cffeb..96772f2904a6 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -969,7 +969,8 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p,
  if (! replace_p)
{
  if (known_eq (update_sp_offset, 0))
-   offset += (ep->offset - ep->previous_offset);
+   offset += (!first_p
+  ? ep->offset - ep->previous_offset : ep->offset);
  if (ep->to_rtx == stack_pointer_rtx)
{
  if (first_p)
@@ -1212,7 +1213,7 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
  if (lra_dump_file != NULL)
fprintf (lra_dump_file, "Using elimination %d to %d now\n",
 ep1->from, ep1->to);
- lra_assert (known_eq (ep1->previous_offset, 0));
+ lra_assert (known_eq (ep1->previous_offset, -1));
  ep1->previous_offset = ep->offset;
}
  else
@@ -1283,7 +1284,7 @@ init_elim_table (void)
   for (ep = reg_eliminate, ep1 = reg_eliminate_1;
        ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++, ep1++)
 {
-  ep->offset = ep->previous_offset = 0;
+  ep->offset = ep->previous_offset = -1;
   ep->from = ep1->from;
   ep->to = ep1->to;
   value_p = (targetm.can_eliminate (ep->from, ep->to)


[gcc] Created branch 'matz/heads/x86-ssw' in namespace 'refs/users'

2024-07-09 Thread Michael Matz via Gcc-cvs
The branch 'matz/heads/x86-ssw' was created in namespace 'refs/users' pointing to:

 c27b30552e6c... gomp: testsuite: improve compatibility of bad-array-section


[gcc(refs/users/matz/heads/x86-ssw)] x86: implement separate shrink wrapping

2024-07-09 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:eb94eb73cf3993c1d544e6eb8c4dcb671f215b25

commit eb94eb73cf3993c1d544e6eb8c4dcb671f215b25
Author: Michael Matz 
Date:   Sun Jun 30 03:52:39 2024 +0200

x86: implement separate shrink wrapping

Diff:
---
 gcc/config/i386/i386.cc | 581 +++-
 gcc/config/i386/i386.h  |   2 +
 2 files changed, 533 insertions(+), 50 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 4b6b665e5997..33e69e96008d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -6970,7 +6970,7 @@ ix86_compute_frame_layout (void)
 }
 
   frame->save_regs_using_mov
-    = TARGET_PROLOGUE_USING_MOVE && m->use_fast_prologue_epilogue;
+    = (TARGET_PROLOGUE_USING_MOVE || flag_shrink_wrap_separate) && m->use_fast_prologue_epilogue;
 
   /* Skip return address and error code in exception handler.  */
   offset = INCOMING_FRAME_SP_OFFSET;
@@ -7120,7 +7120,8 @@ ix86_compute_frame_layout (void)
   /* Size prologue needs to allocate.  */
   to_allocate = offset - frame->sse_reg_save_offset;
 
-  if ((!to_allocate && frame->nregs <= 1)
+  if ((!to_allocate && frame->nregs <= 1
+   && !flag_shrink_wrap_separate)
   || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000))
/* If static stack checking is enabled and done with probes,
  the registers need to be saved before allocating the frame.  */
@@ -7417,6 +7418,8 @@ ix86_emit_save_regs (void)
   int regno;
   rtx_insn *insn;
 
+  gcc_assert (!crtl->shrink_wrapped_separate);
+
   if (!TARGET_APX_PUSH2POP2
   || !ix86_can_use_push2pop2 ()
   || cfun->machine->func_type != TYPE_NORMAL)
@@ -7589,7 +7592,8 @@ ix86_emit_save_regs_using_mov (HOST_WIDE_INT cfa_offset)
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
-ix86_emit_save_reg_using_mov (word_mode, regno, cfa_offset);
+   if (!cfun->machine->reg_wrapped_separately[regno])
+ ix86_emit_save_reg_using_mov (word_mode, regno, cfa_offset);
cfa_offset -= UNITS_PER_WORD;
   }
 }
@@ -7604,7 +7608,8 @@ ix86_emit_save_sse_regs_using_mov (HOST_WIDE_INT cfa_offset)
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
-   ix86_emit_save_reg_using_mov (V4SFmode, regno, cfa_offset);
+   if (!cfun->machine->reg_wrapped_separately[regno])
+ ix86_emit_save_reg_using_mov (V4SFmode, regno, cfa_offset);
cfa_offset -= GET_MODE_SIZE (V4SFmode);
   }
 }
@@ -9089,6 +9094,7 @@ ix86_expand_prologue (void)
= frame.sse_reg_save_offset - frame.reg_save_offset;
 
   gcc_assert (int_registers_saved);
+  gcc_assert (!m->frame_alloc_separately);
 
   /* No need to do stack checking as the area will be immediately
 written.  */
@@ -9106,6 +9112,7 @@ ix86_expand_prologue (void)
   && flag_stack_clash_protection
   && !ix86_target_stack_probe ())
 {
+  gcc_assert (!m->frame_alloc_separately);
   ix86_adjust_stack_and_probe (allocate, int_registers_saved, false);
   allocate = 0;
 }
@@ -9116,6 +9123,7 @@ ix86_expand_prologue (void)
 {
   const HOST_WIDE_INT probe_interval = get_probe_interval ();
 
+  gcc_assert (!m->frame_alloc_separately);
   if (STACK_CHECK_MOVING_SP)
{
  if (crtl->is_leaf
@@ -9172,9 +9180,16 @@ ix86_expand_prologue (void)
   else if (!ix86_target_stack_probe ()
   || frame.stack_pointer_offset < CHECK_STACK_LIMIT)
 {
-  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-GEN_INT (-allocate), -1,
-m->fs.cfa_reg == stack_pointer_rtx);
+  if (!m->frame_alloc_separately)
+   pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+  GEN_INT (-allocate), -1,
+  m->fs.cfa_reg == stack_pointer_rtx);
+  else
+   {
+ if (m->fs.cfa_reg == stack_pointer_rtx)
+   m->fs.cfa_offset -= allocate;
+ m->fs.sp_offset += allocate;
+   }
 }
   else
 {
@@ -9184,6 +9199,8 @@ ix86_expand_prologue (void)
   bool eax_live = ix86_eax_live_at_start_p ();
   bool r10_live = false;
 
+  gcc_assert (!m->frame_alloc_separately);
+
   if (TARGET_64BIT)
 r10_live = (DECL_STATIC_CHAIN (current_function_decl) != 0);
 
@@ -9338,6 +9355,7 @@ ix86_emit_restore_reg_using_pop (rtx reg, bool ppx_p)
   struct machine_function *m = cfun->machine;
   rtx_insn *insn = emit_insn (gen_pop (reg, ppx_p));
 
+  gcc_assert (!m->reg_wrapped_separately[REGNO (reg)]);
   ix86_add_cfa_restore_note (insn, reg, m->fs.sp_offset);
   m->fs.sp_offset -= UNITS_PER_WORD;
 
@@ -9396,6 +9414,9 @@ ix86_emit_restore_reg_using_pop2 (rtx reg1, rtx reg2, bool ppx_p = false)
   const int offset = UNITS_PER_WORD * 2;
   rtx_insn

[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: don't clobber flags

2024-07-09 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:5a9a70a5837aba373e3f36a89943c52e37a19809

commit 5a9a70a5837aba373e3f36a89943c52e37a19809
Author: Michael Matz 
Date:   Tue Jul 9 02:20:10 2024 +0200

x86-ssw: don't clobber flags

Diff:
---
 gcc/config/i386/i386.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 33e69e96008d..734802dbed4f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -10878,8 +10878,12 @@ ix86_components_for_bb (basic_block bb)
 }
 
 static void
-ix86_disqualify_components (sbitmap, edge, sbitmap, bool)
+ix86_disqualify_components (sbitmap components, edge e, sbitmap, bool)
 {
+  /* If the flags are needed at the start of e->dest then we can't insert
+ our stack adjustment insns (they default to flag-clobbering add/sub).  */
+  if (bitmap_bit_p (DF_LIVE_IN (e->dest), FLAGS_REG))
+bitmap_clear_bit (components, SW_FRAME);
 }
 
 static void


[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: disable if DRAP reg is needed

2024-07-09 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:f917195f8a4e1767e89ebb0c875abcbe4dcf97ff

commit f917195f8a4e1767e89ebb0c875abcbe4dcf97ff
Author: Michael Matz 
Date:   Tue Jul 9 02:37:55 2024 +0200

x86-ssw: disable if DRAP reg is needed

Diff:
---
 gcc/config/i386/i386.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 734802dbed4f..4aa37c2ffeaa 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -10805,7 +10805,8 @@ ix86_get_separate_components (void)
   sbitmap components;
 
   ix86_finalize_stack_frame_flags ();
-  if (!frame->save_regs_using_mov)
+  if (!frame->save_regs_using_mov
+  || crtl->drap_reg)
 return NULL;
 
   components = sbitmap_alloc (NCOMPONENTS);


[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: fix testcases

2024-07-09 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:c5a72cc80939e42518f4021e0640d29c8b8495a7

commit c5a72cc80939e42518f4021e0640d29c8b8495a7
Author: Michael Matz 
Date:   Tue Jul 9 04:27:46 2024 +0200

x86-ssw: fix testcases

The separate-shrink-wrap infrastructure sometimes
considers components as handled when they aren't in fact
handled (e.g. never calling any emit_prologue_components or
emit_epilogue_components hooks for the component in question).

So track stuff ourselves.

Diff:
---
 gcc/config/i386/i386.cc | 34 ++++++++++++++++++++--------------
 gcc/config/i386/i386.h  |  1 +
 2 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 4aa37c2ffeaa..23226d204a09 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -6970,7 +6970,7 @@ ix86_compute_frame_layout (void)
 }
 
   frame->save_regs_using_mov
-    = (TARGET_PROLOGUE_USING_MOVE || flag_shrink_wrap_separate) && m->use_fast_prologue_epilogue;
+    = (TARGET_PROLOGUE_USING_MOVE /*|| flag_shrink_wrap_separate*/) && m->use_fast_prologue_epilogue;
 
   /* Skip return address and error code in exception handler.  */
   offset = INCOMING_FRAME_SP_OFFSET;
@@ -7121,7 +7121,7 @@ ix86_compute_frame_layout (void)
   to_allocate = offset - frame->sse_reg_save_offset;
 
   if ((!to_allocate && frame->nregs <= 1
-   && !flag_shrink_wrap_separate)
+   /*&& !flag_shrink_wrap_separate*/)
   || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000))
/* If static stack checking is enabled and done with probes,
  the registers need to be saved before allocating the frame.  */
@@ -7418,7 +7418,7 @@ ix86_emit_save_regs (void)
   int regno;
   rtx_insn *insn;
 
-  gcc_assert (!crtl->shrink_wrapped_separate);
+  gcc_assert (!cfun->machine->anything_separately);
 
   if (!TARGET_APX_PUSH2POP2
   || !ix86_can_use_push2pop2 ()
@@ -8974,7 +8974,7 @@ ix86_expand_prologue (void)
   if (!int_registers_saved)
 {
   /* If saving registers via PUSH, do so now.  */
-  if (!frame.save_regs_using_mov)
+  if (!frame.save_regs_using_mov && !m->anything_separately)
{
  ix86_emit_save_regs ();
  int_registers_saved = true;
@@ -9489,7 +9489,7 @@ ix86_emit_restore_regs_using_pop (bool ppx_p)
 {
   unsigned int regno;
 
-  gcc_assert (!crtl->shrink_wrapped_separate);
+  gcc_assert (!cfun->machine->anything_separately);
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, false, true))
   ix86_emit_restore_reg_using_pop (gen_rtx_REG (word_mode, regno), ppx_p);
@@ -9506,7 +9506,7 @@ ix86_emit_restore_regs_using_pop2 (void)
   int loaded_regnum = 0;
   bool aligned = cfun->machine->fs.sp_offset % 16 == 0;
 
-  gcc_assert (!crtl->shrink_wrapped_separate);
+  gcc_assert (!cfun->machine->anything_separately);
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, false, true))
   {
@@ -9894,7 +9894,7 @@ ix86_expand_epilogue (int style)
   /* EH_RETURN requires the use of moves to function properly.  */
   if (crtl->calls_eh_return)
 restore_regs_via_mov = true;
-  else if (crtl->shrink_wrapped_separate)
+  else if (m->anything_separately)
 {
   gcc_assert (!TARGET_SEH);
   restore_regs_via_mov = true;
@@ -10800,13 +10800,14 @@ separate_frame_alloc_p (void)
 static sbitmap
 ix86_get_separate_components (void)
 {
-  struct machine_function *m = cfun->machine;
-  struct ix86_frame *frame = &m->frame;
+  //struct machine_function *m = cfun->machine;
+  //struct ix86_frame *frame = &m->frame;
   sbitmap components;
 
   ix86_finalize_stack_frame_flags ();
-  if (!frame->save_regs_using_mov
-  || crtl->drap_reg)
+  if (/*!frame->save_regs_using_mov
+  ||*/ crtl->drap_reg
+  || cfun->machine->func_type != TYPE_NORMAL)
 return NULL;
 
   components = sbitmap_alloc (NCOMPONENTS);
@@ -11150,6 +11151,8 @@ ix86_process_components (sbitmap components, bool prologue_p)
   {
if (bitmap_bit_p (components, regno))
  {
+   m->reg_wrapped_separately[regno] = true;
+   m->anything_separately = true;
if (prologue_p)
  ix86_emit_save_reg_using_mov (word_mode, regno, cfa_offset);
else
@@ -11161,6 +11164,8 @@ ix86_process_components (sbitmap components, bool prologue_p)
   {
if (bitmap_bit_p (components, regno))
  {
+   m->reg_wrapped_separately[regno] = true;
+   m->anything_separately = true;
if (prologue_p)
  ix86_emit_save_reg_using_mov (V4SFmode, regno, sse_cfa_offset);
else
@@ -11181,6 +11186,7 @@ ix86_emit_prologue_components (sbitmap components)
   if (bitmap_bit_p (components, SW_FRAME))
 {
   cfun->machine->frame_alloc_separately = true;
+  cfun->machine->anything_separately = true;
   ix86_alloc_frame ();
 }
 }
@@ -111

[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: adjust testcase

2024-07-09 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:cf6d794219dd0cf2ca3601e2d6e6b9e5f497a47a

commit cf6d794219dd0cf2ca3601e2d6e6b9e5f497a47a
Author: Michael Matz 
Date:   Tue Jul 9 06:01:22 2024 +0200

x86-ssw: adjust testcase

Diff:
---
 gcc/testsuite/gcc.target/x86_64/abi/callabi/leaf-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/callabi/leaf-2.c b/gcc/testsuite/gcc.target/x86_64/abi/callabi/leaf-2.c
index 2a54bc89cfc2..140389626659 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/callabi/leaf-2.c
+++ b/gcc/testsuite/gcc.target/x86_64/abi/callabi/leaf-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mabi=sysv" } */
+/* { dg-options "-O2 -mabi=sysv -fno-shrink-wrap-separate" } */
 
 extern int glb1, gbl2, gbl3;


[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: precise using of moves

2024-07-09 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:d213bc5e67d903143608e0a7879c2577c33ca47e

commit d213bc5e67d903143608e0a7879c2577c33ca47e
Author: Michael Matz 
Date:   Tue Jul 9 06:01:47 2024 +0200

x86-ssw: precise using of moves

We need to distinguish between merely not wanting to use moves
and not being able to.  When the allocated frame is too
large we can't use moves freely and hence need to disable
separate shrink wrapping.  If we don't want to use moves
by default, for speed or the like, but nothing else prevents
them, then this is no reason to disable separate shrink wrapping.
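
The split between preference and hard restriction might be modelled as below.  This is a sketch under stated assumptions: `Frame` is a trimmed-down stand-in for ix86_frame, `want_moves` and `frame_too_large` are hypothetical inputs that collapse the real tuning and frame-size tests, and the other conditions that disable separate shrink wrapping (DRAP register, function type) are left out.

```cpp
#include <cassert>

// Hypothetical, trimmed-down frame description.
struct Frame {
  bool save_regs_using_mov;  // preference: emit saves as moves
  bool cannot_use_moves;     // hard restriction: moves are impossible
};

// WANT_MOVES models the tuning preference, FRAME_TOO_LARGE any of the
// hard conditions; both are assumptions standing in for the real tests.
Frame compute_frame (bool want_moves, bool frame_too_large,
                     long to_allocate, int nregs)
{
  Frame f = {want_moves, false};
  if (frame_too_large)
    f.cannot_use_moves = true;
  if ((to_allocate == 0 && nregs <= 1) || f.cannot_use_moves)
    f.save_regs_using_mov = false;
  return f;
}

// Separate shrink wrapping only depends on the hard restriction.
bool may_shrink_wrap_separately (const Frame &f)
{
  return !f.cannot_use_moves;
}
```

A function that merely prefers pushes still qualifies for separate shrink wrapping in this model, while one whose frame is too large does not, regardless of preference.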

Diff:
---
 gcc/config/i386/i386.cc | 20 +++++++++++---------
 gcc/config/i386/i386.h  |  1 +
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 23226d204a09..20f4dcd61870 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -7120,9 +7120,7 @@ ix86_compute_frame_layout (void)
   /* Size prologue needs to allocate.  */
   to_allocate = offset - frame->sse_reg_save_offset;
 
-  if ((!to_allocate && frame->nregs <= 1
-   /*&& !flag_shrink_wrap_separate*/)
-  || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000))
+  if ((TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000))
/* If static stack checking is enabled and done with probes,
  the registers need to be saved before allocating the frame.  */
   || flag_stack_check == STATIC_BUILTIN_STACK_CHECK
@@ -7135,6 +7133,12 @@ ix86_compute_frame_layout (void)
   || (flag_stack_clash_protection
  && !ix86_target_stack_probe ()
  && to_allocate > get_probe_interval ()))
+{
+  frame->cannot_use_moves = true;
+}
+
+  if ((!to_allocate && frame->nregs <= 1)
+  || frame->cannot_use_moves)
 frame->save_regs_using_mov = false;
 
   if (ix86_using_red_zone ()
@@ -10800,13 +10804,13 @@ separate_frame_alloc_p (void)
 static sbitmap
 ix86_get_separate_components (void)
 {
-  //struct machine_function *m = cfun->machine;
-  //struct ix86_frame *frame = &m->frame;
+  struct machine_function *m = cfun->machine;
+  struct ix86_frame *frame = &m->frame;
   sbitmap components;
 
   ix86_finalize_stack_frame_flags ();
-  if (/*!frame->save_regs_using_mov
-  ||*/ crtl->drap_reg
+  if (frame->cannot_use_moves
+  || crtl->drap_reg
   || cfun->machine->func_type != TYPE_NORMAL)
 return NULL;
 
@@ -10868,9 +10872,7 @@ ix86_components_for_bb (basic_block bb)
{
  need_frame = true;
  break;
-
}
-
}
 }
   if (need_frame)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index dd73687a8e2c..bda3d97ab4cf 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2645,6 +2645,7 @@ struct GTY(()) ix86_frame
   /* When save_regs_using_mov is set, emit prologue using
  move instead of push instructions.  */
   bool save_regs_using_mov;
+  bool cannot_use_moves;
 
   /* Assume without checking that:
EXPENSIVE_P = expensive_function_p (EXPENSIVE_COUNT).  */


[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: Adjust testcase

2024-07-09 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:495a687dc93a58110076700f48fb57fa79026bef

commit 495a687dc93a58110076700f48fb57fa79026bef
Author: Michael Matz 
Date:   Tue Jul 9 14:26:31 2024 +0200

x86-ssw: Adjust testcase

This testcase tries to (uselessly) shrink wrap frame allocation
in f0(), and then calls the prologue expander twice emitting the
messages looked for with the dejagnu directives more times than
expected.  Just disable separate shrink wrapping here.

Diff:
---
 gcc/testsuite/gcc.dg/stack-check-5.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/stack-check-5.c b/gcc/testsuite/gcc.dg/stack-check-5.c
index 0243147939c1..b93dabdaea1d 100644
--- a/gcc/testsuite/gcc.dg/stack-check-5.c
+++ b/gcc/testsuite/gcc.dg/stack-check-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fstack-clash-protection -fdump-rtl-pro_and_epilogue -fno-optimize-sibling-calls --param stack-clash-protection-probe-interval=12 --param stack-clash-protection-guard-size=12" } */
+/* { dg-options "-O2 -fstack-clash-protection -fno-shrink-wrap-separate -fdump-rtl-pro_and_epilogue -fno-optimize-sibling-calls --param stack-clash-protection-probe-interval=12 --param stack-clash-protection-guard-size=12" } */
 /* { dg-require-effective-target supports_stack_clash_protection } */
 /* { dg-skip-if "" { *-*-* } { "-fstack-protector*" } { "" } } */


[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: tidy and commentary

2024-07-10 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:4e6291b6aa5c2033a36e0ac92077a55471e64f92

commit 4e6291b6aa5c2033a36e0ac92077a55471e64f92
Author: Michael Matz 
Date:   Tue Jul 9 17:27:37 2024 +0200

x86-ssw: tidy and commentary

Diff:
---
 gcc/config/i386/i386.cc | 310 
 gcc/config/i386/i386.h  |   1 +
 2 files changed, 101 insertions(+), 210 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 20f4dcd61870..8c9505d53a75 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -6970,7 +6970,7 @@ ix86_compute_frame_layout (void)
 }
 
   frame->save_regs_using_mov
-    = (TARGET_PROLOGUE_USING_MOVE /*|| flag_shrink_wrap_separate*/) && m->use_fast_prologue_epilogue;
+    = TARGET_PROLOGUE_USING_MOVE && m->use_fast_prologue_epilogue;
 
   /* Skip return address and error code in exception handler.  */
   offset = INCOMING_FRAME_SP_OFFSET;
@@ -7133,9 +7133,7 @@ ix86_compute_frame_layout (void)
   || (flag_stack_clash_protection
  && !ix86_target_stack_probe ()
  && to_allocate > get_probe_interval ()))
-{
-  frame->cannot_use_moves = true;
-}
+frame->cannot_use_moves = true;
 
   if ((!to_allocate && frame->nregs <= 1)
   || frame->cannot_use_moves)
@@ -9190,6 +9188,11 @@ ix86_expand_prologue (void)
   m->fs.cfa_reg == stack_pointer_rtx);
   else
{
+ /* Even when shrink-wrapping separately we call emit_prologue
+which will reset the frame-state with the expectation that
+we leave this routine with the state valid for the normal
+body of the function, i.e. reflecting allocated frame.
+So track this by hand.  */
  if (m->fs.cfa_reg == stack_pointer_rtx)
m->fs.cfa_offset -= allocate;
  m->fs.sp_offset += allocate;
@@ -10786,9 +10789,17 @@ ix86_live_on_entry (bitmap regs)
 }
 
 /* Separate shrink-wrapping.  */
+
+/* On x86 we have one component for each hardreg (a component is handled
+   if it's a callee saved register), and one additional component for
+   the frame allocation.  */
+
 #define NCOMPONENTS (FIRST_PSEUDO_REGISTER + 1)
 #define SW_FRAME FIRST_PSEUDO_REGISTER
 
+/* Returns false when we can't allocate the frame as a separate
+   component.  Otherwise return true.  */
+
 static bool
 separate_frame_alloc_p (void)
 {
@@ -10801,12 +10812,17 @@ separate_frame_alloc_p (void)
   return true;
 }
 
+/* Implements TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.
+   Returns an sbitmap with all components that we intend to possibly
+   handle for the current function.  */
+
 static sbitmap
 ix86_get_separate_components (void)
 {
   struct machine_function *m = cfun->machine;
   struct ix86_frame *frame = &m->frame;
   sbitmap components;
+  unsigned min, max;
 
   ix86_finalize_stack_frame_flags ();
   if (frame->cannot_use_moves
@@ -10814,24 +10830,42 @@ ix86_get_separate_components (void)
   || cfun->machine->func_type != TYPE_NORMAL)
 return NULL;
 
+  min = max = INVALID_REGNUM;
+
   components = sbitmap_alloc (NCOMPONENTS);
   bitmap_clear (components);
 
   for (unsigned regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 if (ix86_save_reg (regno, true, true))
   {
+   if (min == INVALID_REGNUM)
+ min = regno;
+   max = regno;
bitmap_set_bit (components, regno);
   }
 
+  if (max >= FIRST_PSEUDO_REGISTER)
+{
+  sbitmap_free (components);
+  return NULL;
+}
+
+  m->ssw_min_reg = min;
+  m->ssw_max_reg = max;
+
   if (separate_frame_alloc_p ())
 bitmap_set_bit (components, SW_FRAME);
 
   return components;
 }
 
+/* Implements TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB.  Given a BB
+   return all components that are necessary for it.  */
+
 static sbitmap
 ix86_components_for_bb (basic_block bb)
 {
+  struct machine_function *m = cfun->machine;
   bool need_frame = false;
   sbitmap components = sbitmap_alloc (NCOMPONENTS);
   bitmap_clear (components);
@@ -10840,7 +10874,7 @@ ix86_components_for_bb (basic_block bb)
   bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
   bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
 
-  for (unsigned regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+  for (unsigned regno = m->ssw_min_reg; regno <= m->ssw_max_reg; regno++)
 if (ix86_save_reg (regno, true, true)
&& (bitmap_bit_p (in, regno)
|| bitmap_bit_p (gen, regno)
@@ -10881,6 +10915,9 @@ ix86_components_for_bb (basic_block bb)
   return components;
 }
 
+/* Implements TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS.  Filter out
+   from COMPONENTS those that we can't handle on edge E.  */
+
 static void
 ix86_disqualify_components (sbitmap components, edge e, sbitmap, bool)
 {
@@ -10890,6 +10927,10 @@ ix86_disqualify_components (sbitmap components, edge e, sbitmap, bool)
 bitmap_clear_bit (components, SW_FRAME);
 }
 
+/* Helper for frame allocation.  This resets cfun->machine->fs to
+   reflect the state at the first 

[gcc(refs/users/matz/heads/x86-ssw)] Add target hook shrink_wrap.cleanup_components

2024-07-10 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:826dd85cb9f368608a9890046cd701f7530d7315

commit 826dd85cb9f368608a9890046cd701f7530d7315
Author: Michael Matz 
Date:   Wed Jul 10 17:10:18 2024 +0200

Add target hook shrink_wrap.cleanup_components

When the shrink wrapping infrastructure removes components,
the target might need to remove even more for dependency reasons.
x86, for instance, needs to remove the frame-allocation component
when some register components are removed.
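
The dependency rule can be sketched with plain bitsets.  This is an illustration, not the hook itself: `Components`, `NREGS` and `after_disqualify` are hypothetical, and the way the caller derives the removed set (old set minus current set) is an assumption about how the infrastructure side would use the hook.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical sizes: a few register components plus one frame component.
const std::size_t NREGS = 8;
const std::size_t NCOMPONENTS = NREGS + 1;
const std::size_t SW_FRAME = NREGS;

typedef std::bitset<NCOMPONENTS> Components;

// After the generic code dropped the components in REMOVED, also drop
// the frame component if any register component was among them.
void cleanup_components (Components &components, Components removed)
{
  removed.reset (SW_FRAME);
  if (removed.any ())
    components.reset (SW_FRAME);
}

// What the caller might do: diff old and new sets, then ask the target.
void after_disqualify (const Components &old, Components &components)
{
  Components removed = old & ~components;
  cleanup_components (components, removed);
}
```

If only the frame component itself was removed, or nothing was removed at all, the remaining components are left as they are; dropping any register component also drops SW_FRAME.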

Diff:
---
 gcc/config/i386/i386.cc | 17 +++++++++++++++++
 gcc/doc/tm.texi         |  8 ++++++++
 gcc/doc/tm.texi.in      |  2 ++
 gcc/shrink-wrap.cc      | 10 ++++++++++
 gcc/target.def          | 10 ++++++++++
 5 files changed, 47 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8c9505d53a75..36202b7dcb07 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -10927,6 +10927,21 @@ ix86_disqualify_components (sbitmap components, edge e, sbitmap, bool)
 bitmap_clear_bit (components, SW_FRAME);
 }
 
+/* Implements TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS.  The infrastructure
+   has removed some components (noted in REMOVED), this cleans out any
+   further components that can't be shrink wrapped separately
+   anymore.  */
+
+static void
+ix86_cleanup_components (sbitmap components, sbitmap removed)
+{
+  /* If separate shrink wrapping removed any register components
+     then we must also remove SW_FRAME.  */
+  bitmap_clear_bit (removed, SW_FRAME);
+  if (!bitmap_empty_p (removed))
+bitmap_clear_bit (components, SW_FRAME);
+}
+
 /* Helper for frame allocation.  This resets cfun->machine->fs to
reflect the state at the first instruction before prologue (i.e.
the call just happened).  */
@@ -11107,6 +11122,8 @@ ix86_set_handled_components (sbitmap)
 #define TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB ix86_components_for_bb
 #undef TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS
 #define TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS ix86_disqualify_components
+#undef TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS
+#define TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS ix86_cleanup_components
 #undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS
 #define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS ix86_emit_prologue_components
 #undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c8b8b126b242..201c8b9f94da 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5352,6 +5352,14 @@ components in @var{edge_components} that the target cannot handle on edge
 epilogue instead.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS (sbitmap @var{components}, sbitmap @var{removed})
+This hook is called after the shrink wrapping infrastructure disqualified
+components for various reasons (e.g. because an unsplittable edge would
+have to be split).  If there are interdependencies between components the
+target can remove those from @var{components} whose dependencies are in
+@var{removed}.  If this hook would do nothing it doesn't need to be defined.
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS (sbitmap)
 Emit prologue insns for the components indicated by the parameter.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 658e1e63371e..f23e6ff3e455 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3787,6 +3787,8 @@ generic code.
 
 @hook TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS
 
+@hook TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS
+
 @hook TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS
 
 @hook TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS
diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc
index 2bec492c2a57..db5c1f24d11c 100644
--- a/gcc/shrink-wrap.cc
+++ b/gcc/shrink-wrap.cc
@@ -1432,6 +1432,9 @@ disqualify_problematic_components (sbitmap components)
 {
   auto_sbitmap pro (SBITMAP_SIZE (components));
   auto_sbitmap epi (SBITMAP_SIZE (components));
+  auto_sbitmap old (SBITMAP_SIZE (components));
+
+  bitmap_copy (old, components);
 
   basic_block bb;
   FOR_EACH_BB_FN (bb, cfun)
@@ -1496,6 +1499,13 @@ disqualify_problematic_components (sbitmap components)
}
}
 }
+
+  /* If the target needs to know that we removed some components,
+     tell it.  */
+  bitmap_and_compl (old, old, components);
+  if (targetm.shrink_wrap.cleanup_components
+      && !bitmap_empty_p (old))
+    targetm.shrink_wrap.cleanup_components (components, old);
 }
 
 /* Place code for prologues and epilogues for COMPONENTS where we can put
diff --git a/gcc/target.def b/gcc/target.def
index fdad7bbc93e2..ac26e8ed38d7 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6872,6 +6872,16 @@ epilogue instead.",
  void, (sbitmap components, edge e, sbitmap edge_components, bool is_prologue),
  NULL)
 
+DEFHOOK
+(cleanup_components,
+ "This hook is called after the shrink wrapping infrastructure disqualified\n\
+components for various reasons (e.g. because an unsplittable ed

[gcc(refs/users/matz/heads/x86-ssw)] Revert "Add target hook shrink_wrap.cleanup_components"

2024-07-11 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:3b04b651551abc541c6ec21835d2e85a407bb1c4

commit 3b04b651551abc541c6ec21835d2e85a407bb1c4
Author: Michael Matz 
Date:   Thu Jul 11 15:16:57 2024 +0200

Revert "Add target hook shrink_wrap.cleanup_components"

This reverts commit 826dd85cb9f368608a9890046cd701f7530d7315.

I found a better way to solve the problem.

Diff:
---
 gcc/config/i386/i386.cc | 17 -----------------
 gcc/doc/tm.texi         |  8 --------
 gcc/doc/tm.texi.in      |  2 --
 gcc/shrink-wrap.cc      | 10 ----------
 gcc/target.def          | 10 ----------
 5 files changed, 47 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 36202b7dcb07..8c9505d53a75 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -10927,21 +10927,6 @@ ix86_disqualify_components (sbitmap components, edge e, sbitmap, bool)
 bitmap_clear_bit (components, SW_FRAME);
 }
 
-/* Implements TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS.  The infrastructure
-   has removed some components (noted in REMOVED), this cleans out any
-   further components that can't be shrink wrapped separately
-   anymore.  */
-
-static void
-ix86_cleanup_components (sbitmap components, sbitmap removed)
-{
-  /* If separate shrink wrapping removed any register components
-     then we must also remove SW_FRAME.  */
-  bitmap_clear_bit (removed, SW_FRAME);
-  if (!bitmap_empty_p (removed))
-    bitmap_clear_bit (components, SW_FRAME);
-}
-
 /* Helper for frame allocation.  This resets cfun->machine->fs to
reflect the state at the first instruction before prologue (i.e.
the call just happened).  */
@@ -11122,8 +11107,6 @@ ix86_set_handled_components (sbitmap)
 #define TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB ix86_components_for_bb
 #undef TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS
 #define TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS ix86_disqualify_components
-#undef TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS
-#define TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS ix86_cleanup_components
 #undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS
 #define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS ix86_emit_prologue_components
 #undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 201c8b9f94da..c8b8b126b242 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5352,14 +5352,6 @@ components in @var{edge_components} that the target cannot handle on edge
 epilogue instead.
 @end deftypefn
 
-@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS (sbitmap @var{components}, sbitmap @var{removed})
-This hook is called after the shrink wrapping infrastructure disqualified
-components for various reasons (e.g. because an unsplittable edge would
-have to be split).  If there are interdependencies between components the
-target can remove those from @var{components} whose dependencies are in
-@var{removed}.  If this hook would do nothing it doesn't need to be defined.
-@end deftypefn
-
 @deftypefn {Target Hook} void TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS (sbitmap)
 Emit prologue insns for the components indicated by the parameter.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f23e6ff3e455..658e1e63371e 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3787,8 +3787,6 @@ generic code.
 
 @hook TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS
 
-@hook TARGET_SHRINK_WRAP_CLEANUP_COMPONENTS
-
 @hook TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS
 
 @hook TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS
diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc
index db5c1f24d11c..2bec492c2a57 100644
--- a/gcc/shrink-wrap.cc
+++ b/gcc/shrink-wrap.cc
@@ -1432,9 +1432,6 @@ disqualify_problematic_components (sbitmap components)
 {
   auto_sbitmap pro (SBITMAP_SIZE (components));
   auto_sbitmap epi (SBITMAP_SIZE (components));
-  auto_sbitmap old (SBITMAP_SIZE (components));
-
-  bitmap_copy (old, components);
 
   basic_block bb;
   FOR_EACH_BB_FN (bb, cfun)
@@ -1499,13 +1496,6 @@ disqualify_problematic_components (sbitmap components)
}
}
 }
-
-  /* If the target needs to know that we removed some components,
-     tell it.  */
-  bitmap_and_compl (old, old, components);
-  if (targetm.shrink_wrap.cleanup_components
-      && !bitmap_empty_p (old))
-    targetm.shrink_wrap.cleanup_components (components, old);
 }
 
 /* Place code for prologues and epilogues for COMPONENTS where we can put
diff --git a/gcc/target.def b/gcc/target.def
index ac26e8ed38d7..fdad7bbc93e2 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6872,16 +6872,6 @@ epilogue instead.",
  void, (sbitmap components, edge e, sbitmap edge_components, bool is_prologue),
  NULL)
 
-DEFHOOK
-(cleanup_components,
- "This hook is called after the shrink wrapping infrastructure disqualified\n\
-components for various reasons (e.g. because an unsplittable edge would\n\
-have to be split).  If there are interdependencies between components the\n\
-target can remove those from @v

[gcc(refs/users/matz/heads/x86-ssw)] x86-ssw: Deal with deallocated frame in epilogue

2024-07-11 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:fbf3ff6bc169639a2d55ab4ed5f962201ad6416e

commit fbf3ff6bc169639a2d55ab4ed5f962201ad6416e
Author: Michael Matz 
Date:   Thu Jul 11 15:21:05 2024 +0200

x86-ssw: Deal with deallocated frame in epilogue

When the frame is deallocated separately we need to adjust
frame_state.sp_offset to be correct before emitting the rest
of the standard epilogue.

Diff:
---
 gcc/config/i386/i386.cc | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8c9505d53a75..847c6116884b 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -9931,6 +9931,11 @@ ix86_expand_epilogue (int style)
   else
 restore_regs_via_mov = false;
 
+  /* If we've (de)allocated the frame separately, then that's done already,
+     and SP is in fact at a word offset.  */
+  if (m->frame_alloc_separately)
+    m->fs.sp_offset = UNITS_PER_WORD;
+
   if (restore_regs_via_mov || frame.nsseregs)
 {
   /* Ensure that the entire register save area is addressable via


[gcc/matz/heads/x86-ssw] x86: implement separate shrink wrapping

2024-07-16 Thread Michael Matz via Gcc-cvs
The branch 'matz/heads/x86-ssw' was updated to point to:

 298b1dd7fb81... x86: implement separate shrink wrapping

It previously pointed to:

 fbf3ff6bc169... x86-ssw: Deal with deallocated frame in epilogue

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  fbf3ff6... x86-ssw: Deal with deallocated frame in epilogue
  3b04b65... Revert "Add target hook shrink_wrap.cleanup_components"
  826dd85... Add target hook shrink_wrap.cleanup_components
  4e6291b... x86-ssw: tidy and commentary
  495a687... x86-ssw: Adjust testcase
  d213bc5... x86-ssw: precise using of moves
  cf6d794... x86-ssw: adjust testcase
  c5a72cc... x86-ssw: fix testcases
  f917195... x86-ssw: disable if DRAP reg is needed
  5a9a70a... x86-ssw: don't clobber flags
  eb94eb7... x86: implement separate shrink wrapping


Summary of changes (added commits):
---

  298b1dd... x86: implement separate shrink wrapping


[gcc(refs/users/matz/heads/x86-ssw)] x86: implement separate shrink wrapping

2024-07-16 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:298b1dd7fb8189eb22ae604973083ae80b135ae7

commit 298b1dd7fb8189eb22ae604973083ae80b135ae7
Author: Michael Matz 
Date:   Sun Jun 30 03:52:39 2024 +0200

x86: implement separate shrink wrapping

this adds support for the infrastructure for shrink wrapping
separate components to the x86 target.  The components we track
are individual registers to save/restore and the frame allocation
itself.

There are various limitations where we give up:
* when the frame becomes too large
* when any complicated realignment is needed (DRAP or not)
* when the calling convention requires certain forms of
  pro- or epilogues (e.g. SEH on win64)
* when the function is "special" (uses eh_return and the like);
  most of that is already avoided by the generic infrastructure
  in shrink-wrap.cc
* when we must not use moves to save/restore registers for any reasons
  (stack checking being one notable one)
and so on.

For the last point we now distinguish between not being able to use moves
(then we disable separate shrink wrapping) and merely not wanting to use
moves (e.g. because push/pop is equally fast).  In the latter case we
don't disable separate shrink wrapping, but do use moves for those
functions where it does something.
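
The distinction can be sketched as follows.  This is only an
illustration under assumptions: cannot_use_moves is the member named
in the ChangeLog below, while struct toy_frame, prefer_push_pop and
both function names are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_frame
{
  bool cannot_use_moves;  /* Moves are impossible (e.g. stack checking).  */
  bool prefer_push_pop;   /* Moves are possible, merely not preferred.  */
};

/* Separate shrink wrapping is only given up when moves are impossible.  */
bool
toy_can_wrap_separately (const struct toy_frame *f)
{
  return !f->cannot_use_moves;
}

/* Decide whether registers are saved with moves.  ANYTHING_SEPARATELY
   says that separate shrink wrapping did something for this function.  */
bool
toy_save_regs_using_mov (const struct toy_frame *f, bool anything_separately)
{
  /* Wrapping something separately implies that moves were allowed.  */
  assert (!(f->cannot_use_moves && anything_separately));

  if (f->cannot_use_moves)
    return false;
  if (anything_separately)
    return true;		/* Moves override the push/pop preference.  */
  return !f->prefer_push_pop;
}
```

A function that merely prefers push/pop keeps using push/pop as long
as nothing is wrapped separately, and switches to moves once it is.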

Apart from that it's fairly straightforward: for components selected
by the infrastructure to be separately shrink-wrapped emit code to
save/restore them in the appropriate hook (for the frame-alloc
component to adjust the stack pointer), remember them, and don't emit
any code for those in the normal expand_prologue and expand_epilogue
expanders.  But as the x86 prologue and epilogue generators are quite
a twisty maze with many cases to deal with, this also adds some aborts
and asserts for things that are unexpected.

The static instruction count of functions can increase (when
separate shrink wrapping emits some component sequences into multiple
blocks) and the instructions themselves can become larger (moves vs.
push/pop), so there's a code size increase for functions where this
does something.  The dynamic insn count decreases for at least one
path through the function (and doesn't increase for others).

Two testcases need separate shrink wrapping disabled because they
check for specific generated assembly instruction counts and sequences
or specific messages in the pro_and_epilogue dump file, which turn out
different with separate shrink wrapping.

gcc/
* config/i386/i386.h (struct ix86_frame.cannot_use_moves):
Add member.
(struct machine_function.ssw_min_reg,
ssw_max_reg, reg_wrapped_separately, frame_alloc_separately,
anything_separately): Add members.
* config/i386/i386.cc (ix86_compute_frame_layout): Split out
cannot_use_moves from save_regs_using_move computation.
(ix86_emit_save_regs): Ensure not using this under separate
shrink wrapping.
(ix86_emit_save_regs_using_mov, ix86_emit_save_sse_regs_using_mov,
ix86_emit_restore_reg_using_pop, ix86_emit_restore_reg_using_pop2,
ix86_emit_restore_regs_using_pop): Don't handle separately shrink
wrapped components.
(ix86_expand_prologue): Handle separate shrink wrapping.
(ix86_emit_restore_reg_using_mov): New function, split out
from ...
(ix86_emit_restore_regs_using_mov): ... here and ...
(ix86_emit_restore_sse_regs_using_mov): ... here.
(ix86_expand_epilogue): Handle separate shrink wrapping.
(NCOMPONENTS, SW_FRAME): Add new defines.
(separate_frame_alloc_p, ix86_get_separate_components,
ix86_components_for_bb, ix86_disqualify_components,
ix86_init_frame_state, ix86_alloc_frame, ix86_dealloc_frame,
ix86_process_reg_components, ix86_emit_prologue_components,
ix86_emit_epilogue_components, ix86_set_handled_components):
Add new functions.
(TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS,
TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB,
TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS,
TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS,
TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS,
TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define target hook
macros.

gcc/testsuite/
* gcc.dg/stack-check-5.c: Disable separate shrink wrapping.
* gcc.target/x86_64/abi/callabi/leaf-2.c: Ditto.

Diff:
---
 gcc/config/i386/i386.cc| 491 ++---
 gcc/config/i386/i386.h |   5 +
 gcc/testsuite/gcc.dg/stack-check-5.c   |   2 +-
 .../gcc.target/x86_64/abi/callabi/leaf-2.c |   2 +-
 4 files changed, 447 insertions(+)

[gcc/matz/heads/x86-ssw] x86: Implement separate shrink wrapping

2024-07-16 Thread Michael Matz via Gcc-cvs
The branch 'matz/heads/x86-ssw' was updated to point to:

 f0d9a4c9d44c... x86: Implement separate shrink wrapping

It previously pointed to:

 298b1dd7fb81... x86: implement separate shrink wrapping

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  298b1dd... x86: implement separate shrink wrapping


Summary of changes (added commits):
---

  f0d9a4c... x86: Implement separate shrink wrapping


[gcc(refs/users/matz/heads/x86-ssw)] x86: Implement separate shrink wrapping

2024-07-16 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:f0d9a4c9d44c463f86699d7f054722d5d0a20d09

commit f0d9a4c9d44c463f86699d7f054722d5d0a20d09
Author: Michael Matz 
Date:   Sun Jun 30 03:52:39 2024 +0200

x86: Implement separate shrink wrapping

this adds support for the infrastructure for shrink wrapping
separate components to the x86 target.  The components we track
are individual registers to save/restore and the frame allocation
itself.

There are various limitations where we give up:
* when the frame becomes too large
* when any complicated realignment is needed (DRAP or not)
* when the calling convention requires certain forms of
  pro- or epilogues (e.g. SEH on win64)
* when the function is "special" (uses eh_return and the like);
  most of that is already avoided by the generic infrastructure
  in shrink-wrap.cc
* when we must not use moves to save/restore registers for any reasons
  (stack checking being one notable one)
and so on.

For the last point we now distinguish between not being able to use moves
(then we disable separate shrink wrapping) and merely not wanting to use
moves (e.g. because push/pop is equally fast).  In the latter case we
don't disable separate shrink wrapping, but do use moves for those
functions where it does something.

Apart from that it's fairly straightforward: for components selected
by the infrastructure to be separately shrink-wrapped emit code to
save/restore them in the appropriate hook (for the frame-alloc
component to adjust the stack pointer), remember them, and don't emit
any code for those in the normal expand_prologue and expand_epilogue
expanders.  But as the x86 prologue and epilogue generators are quite
a twisty maze with many cases to deal with, this also adds some aborts
and asserts for things that are unexpected.

The static instruction count of functions can increase (when
separate shrink wrapping emits some component sequences into multiple
blocks) and the instructions themselves can become larger (moves vs.
push/pop), so there's a code size increase for functions where this
does something.  The dynamic insn count decreases for at least one
path through the function (and doesn't increase for others).

Two testcases need separate shrink wrapping disabled because they
check for specific generated assembly instruction counts and sequences
or specific messages in the pro_and_epilogue dump file, which turn out
different with separate shrink wrapping.

gcc/
* config/i386/i386.h (struct ix86_frame.cannot_use_moves):
Add member.
(struct machine_function.ssw_min_reg,
ssw_max_reg, reg_wrapped_separately, frame_alloc_separately,
anything_separately): Add members.
* config/i386/i386.cc (ix86_compute_frame_layout): Split out
cannot_use_moves from save_regs_using_move computation.
(ix86_emit_save_regs): Ensure not using this under separate
shrink wrapping.
(ix86_emit_save_regs_using_mov, ix86_emit_save_sse_regs_using_mov,
ix86_emit_restore_reg_using_pop, ix86_emit_restore_reg_using_pop2,
ix86_emit_restore_regs_using_pop): Don't handle separately shrink
wrapped components.
(ix86_expand_prologue): Handle separate shrink wrapping.
(ix86_emit_restore_reg_using_mov): New function, split out
from ...
(ix86_emit_restore_regs_using_mov): ... here and ...
(ix86_emit_restore_sse_regs_using_mov): ... here.
(ix86_expand_epilogue): Handle separate shrink wrapping.
(NCOMPONENTS, SW_FRAME): Add new defines.
(separate_frame_alloc_p, ix86_get_separate_components,
ix86_components_for_bb, ix86_disqualify_components,
ix86_init_frame_state, ix86_alloc_frame, ix86_dealloc_frame,
ix86_process_reg_components, ix86_emit_prologue_components,
ix86_emit_epilogue_components, ix86_set_handled_components):
Add new functions.
(TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS,
TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB,
TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS,
TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS,
TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS,
TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define target hook
macros.

gcc/testsuite/
* gcc.dg/stack-check-5.c: Disable separate shrink wrapping.
* gcc.target/x86_64/abi/callabi/leaf-2.c: Ditto.

Diff:
---
 gcc/config/i386/i386.cc| 491 ++---
 gcc/config/i386/i386.h |   5 +
 gcc/testsuite/gcc.dg/stack-check-5.c   |   2 +-
 .../gcc.target/x86_64/abi/callabi/leaf-2.c |   2 +-
 4 files changed, 447 insertions(+

[gcc(refs/users/matz/heads/x86-ssw)] x86: Implement separate shrink wrapping

2024-07-16 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:86676836d6cb8289c53ff3dffcf8583505a7e0f5

commit 86676836d6cb8289c53ff3dffcf8583505a7e0f5
Author: Michael Matz 
Date:   Sun Jun 30 03:52:39 2024 +0200

x86: Implement separate shrink wrapping

this adds support for the infrastructure for shrink wrapping
separate components to the x86 target.  The components we track
are individual registers to save/restore and the frame allocation
itself.

There are various limitations where we give up:
* when the frame becomes too large
* when any complicated realignment is needed (DRAP or not)
* when the calling convention requires certain forms of
  pro- or epilogues (e.g. SEH on win64)
* when the function is "special" (uses eh_return and the like);
  most of that is already avoided by the generic infrastructure
  in shrink-wrap.cc
* when we must not use moves to save/restore registers for any reasons
  (stack checking being one notable one)
and so on.

For the last point we now distinguish between not being able to use moves
(then we disable separate shrink wrapping) and merely not wanting to use
moves (e.g. because push/pop is equally fast).  In the latter case we
don't disable separate shrink wrapping, but do use moves for those
functions where it does something.

Apart from that it's fairly straightforward: for components selected
by the infrastructure to be separately shrink-wrapped emit code to
save/restore them in the appropriate hook (for the frame-alloc
component to adjust the stack pointer), remember them, and don't emit
any code for those in the normal expand_prologue and expand_epilogue
expanders.  But as the x86 prologue and epilogue generators are quite
a twisty maze with many cases to deal with, this also adds some aborts
and asserts for things that are unexpected.

The static instruction count of functions can increase (when
separate shrink wrapping emits some component sequences into multiple
blocks) and the instructions themselves can become larger (moves vs.
push/pop), so there's a code size increase for functions where this
does something.  The dynamic insn count decreases for at least one
path through the function (and doesn't increase for others).

Two testcases need separate shrink wrapping disabled because they
check for specific generated assembly instruction counts and sequences
or specific messages in the pro_and_epilogue dump file, which turn out
different with separate shrink wrapping.

gcc/
* config/i386/i386.h (struct ix86_frame.cannot_use_moves):
Add member.
(struct machine_function.ssw_min_reg,
ssw_max_reg, reg_wrapped_separately, frame_alloc_separately,
anything_separately): Add members.
* config/i386/i386.cc (ix86_compute_frame_layout): Split out
cannot_use_moves from save_regs_using_move computation.
(ix86_emit_save_regs): Ensure not using this under separate
shrink wrapping.
(ix86_emit_save_regs_using_mov, ix86_emit_save_sse_regs_using_mov,
ix86_emit_restore_reg_using_pop, ix86_emit_restore_reg_using_pop2,
ix86_emit_restore_regs_using_pop): Don't handle separately shrink
wrapped components.
(ix86_expand_prologue): Handle separate shrink wrapping.
(ix86_emit_restore_reg_using_mov): New function, split out
from ...
(ix86_emit_restore_regs_using_mov): ... here and ...
(ix86_emit_restore_sse_regs_using_mov): ... here.
(ix86_expand_epilogue): Handle separate shrink wrapping.
(NCOMPONENTS, SW_FRAME): Add new defines.
(separate_frame_alloc_p, ix86_get_separate_components,
ix86_components_for_bb, ix86_disqualify_components,
ix86_init_frame_state, ix86_alloc_frame, ix86_dealloc_frame,
ix86_process_reg_components, ix86_emit_prologue_components,
ix86_emit_epilogue_components, ix86_set_handled_components):
Add new functions.
(TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS,
TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB,
TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS,
TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS,
TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS,
TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define target hook
macros.

gcc/testsuite/
* gcc.dg/stack-check-5.c: Disable separate shrink wrapping.
* gcc.target/x86_64/abi/callabi/leaf-2.c: Ditto.

Diff:
---
 gcc/config/i386/i386.cc| 491 ++---
 gcc/config/i386/i386.h |   5 +
 gcc/testsuite/gcc.dg/stack-check-5.c   |   2 +-
 .../gcc.target/x86_64/abi/callabi/leaf-2.c |   2 +-
 4 files changed, 447 insertions(+

[gcc r15-4242] Fix PR116650: check all regs in regrename targets

2024-10-10 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:85bee4f77b1b0ebe68b3efe0c356b7d5fb006c4d

commit r15-4242-g85bee4f77b1b0ebe68b3efe0c356b7d5fb006c4d
Author: Michael Matz 
Date:   Thu Oct 10 16:36:51 2024 +0200

Fix PR116650: check all regs in regrename targets

(this came up for m68k vs. LRA, but is a generic problem)

Regrename wants to use new registers for certain def-use chains.
For validity of replacements it needs to check that the selected
candidates are unused up to then.  That's done in check_new_reg_p.
But if it so happens that the new register needs more hardregs
than the old register (which happens if the target allows inter-bank
moves and the mode is something like a DFmode that needs to be placed
into a SImode reg-pair), then check_new_reg_p only checks the
first of those registers for free-ness.

This is caused by that function looking up the number of necessary
hardregs only in terms of the old hardreg number.  It of course needs
to do that in terms of the new candidate regnumber.  The symptom is that
regrename sometimes clobbers the higher numbered registers of such a
regrename target pair.  This patch fixes that problem.
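
To make the failure mode concrete, here is a small stand-alone model.
It is a sketch under assumptions, not GCC code: the register file
(eight 4-byte integer registers followed by eight 8-byte FP registers),
the toy_ names and the plain bit mask for unavailable registers are all
invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_NUM_REGS = 16, TOY_FIRST_FP_REG = 8 };

/* Number of hard registers needed to hold SIZE bytes starting at REGNO.
   Registers 0..7 are 4 bytes wide, registers 8..15 are 8 bytes wide.  */
int
toy_hard_regno_nregs (int regno, int size)
{
  int reg_bytes = regno >= TOY_FIRST_FP_REG ? 8 : 4;
  return (size + reg_bytes - 1) / reg_bytes;
}

/* Return true if a SIZE-byte value can be renamed to NEW_REG without
   touching any register whose bit is set in UNAVAILABLE.  */
bool
toy_check_new_reg (int new_reg, int size, unsigned unavailable)
{
  assert (new_reg >= 0 && new_reg < TOY_NUM_REGS);

  /* The point of the fix: count registers for the candidate, not for
     the register being replaced.  */
  int nregs = toy_hard_regno_nregs (new_reg, size);

  if (new_reg + nregs > TOY_NUM_REGS)
    return false;

  for (int i = nregs - 1; i >= 0; --i)
    if (unavailable & (1u << (new_reg + i)))
      return false;
  return true;
}
```

An 8-byte value living in FP register 8 occupies one register there,
but needs the pair 2/3 when renamed to integer register 2.  If the
count were taken from the old register, only register 2 would be
checked and a live register 3 would be clobbered.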

(In the particular case of the bug report it was LRA that left over an
inter-bank move instruction that triggers regrename, ultimately causing
the mis-compile.  Reload didn't do that, but in general we of course
can't rely on such moves not happening if the target allows them.)

This also shows a general confusion in that function and the target hook
interface here:

  for (i = nregs - 1; i >= 0; --i)
    ...
        || ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i))

it uses nregs in a way that requires it to be the same between old and
new register.  The problem is that the target hook only gets register
numbers, when it instead should get a mode and register numbers and
would be called only for the first but not for subsequent registers.
I've looked at a number of definitions of that target hook and I think
that this is currently harmless in the sense that it would merely rule
out some potential reg-renames that would in fact be okay to do.  So I'm
not changing the target hook interface here and hence that problem
remains unfixed.

PR rtl-optimization/116650
* regrename.cc (check_new_reg_p): Calculate nregs in terms of
the new candidate register.

Diff:
---
 gcc/regrename.cc | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/regrename.cc b/gcc/regrename.cc
index 054e601740b1..22668d7bf57d 100644
--- a/gcc/regrename.cc
+++ b/gcc/regrename.cc
@@ -324,10 +324,27 @@ static bool
 check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
 class du_head *this_head, HARD_REG_SET this_unavailable)
 {
-  int nregs = this_head->nregs;
+  int nregs = 1;
   int i;
   struct du_chain *tmp;
 
+  /* See whether new_reg accepts all modes that occur in
+     definition and uses and record the number of regs it would take.  */
+  for (tmp = this_head->first; tmp; tmp = tmp->next_use)
+    {
+      int n;
+      /* Completely ignore DEBUG_INSNs, otherwise we can get
+         -fcompare-debug failures.  */
+      if (DEBUG_INSN_P (tmp->insn))
+        continue;
+
+      if (!targetm.hard_regno_mode_ok (new_reg, GET_MODE (*tmp->loc)))
+        return false;
+      n = hard_regno_nregs (new_reg, GET_MODE (*tmp->loc));
+      if (n > nregs)
+        nregs = n;
+    }
+
   for (i = nregs - 1; i >= 0; --i)
     if (TEST_HARD_REG_BIT (this_unavailable, new_reg + i)
         || fixed_regs[new_reg + i]
@@ -348,14 +365,10 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
      definition and uses.  */
   for (tmp = this_head->first; tmp; tmp = tmp->next_use)
     {
-      /* Completely ignore DEBUG_INSNs, otherwise we can get
-         -fcompare-debug failures.  */
       if (DEBUG_INSN_P (tmp->insn))
         continue;
 
-      if (!targetm.hard_regno_mode_ok (new_reg, GET_MODE (*tmp->loc))
-          || call_clobbered_in_chain_p (this_head, GET_MODE (*tmp->loc),
-                                        new_reg))
+      if (call_clobbered_in_chain_p (this_head, GET_MODE (*tmp->loc), new_reg))
         return false;
     }


[gcc r15-8262] doc: regenerate rs6000/rs6000.opt.urls

2025-03-18 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:8333f1c7e699419a4e428fa1d66156d7bad69c9f

commit r15-8262-g8333f1c7e699419a4e428fa1d66156d7bad69c9f
Author: Michael Matz 
Date:   Tue Mar 18 17:21:23 2025 +0100

doc: regenerate rs6000/rs6000.opt.urls

which I forgot, and the autobuilder complained about it.

* config/rs6000/rs6000.opt.urls: Regenerate.

Diff:
---
 gcc/config/rs6000/rs6000.opt.urls | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.opt.urls b/gcc/config/rs6000/rs6000.opt.urls
index c7c1cefe22cd..0b418c09a083 100644
--- a/gcc/config/rs6000/rs6000.opt.urls
+++ b/gcc/config/rs6000/rs6000.opt.urls
@@ -98,6 +98,9 @@ UrlSuffix(gcc/RS_002f6000-and-PowerPC-Options.html#index-mminimal-toc)
 mfull-toc
 UrlSuffix(gcc/RS_002f6000-and-PowerPC-Options.html#index-mfull-toc)
 
+msplit-patch-nops
+UrlSuffix(gcc/RS_002f6000-and-PowerPC-Options.html#index-msplit-patch-nops)
+
 mvrsave
 UrlSuffix(gcc/RS_002f6000-and-PowerPC-Options.html#index-mvrsave)


[gcc r15-8236] rs6000: Add -msplit-patch-nops (PR112980)

2025-03-17 Thread Michael Matz via Gcc-cvs
https://gcc.gnu.org/g:96698551b8e19fc33637908190f121e039301993

commit r15-8236-g96698551b8e19fc33637908190f121e039301993
Author: Michael Matz 
Date:   Wed Nov 13 16:04:06 2024 +0100

rs6000: Add -msplit-patch-nops (PR112980)

As the bug report details, some uses of -fpatchable-function-entry
aren't happy with the "before" NOPs being inserted between global and
local entry point on powerpc.  We want the before NOPs to be in front
of the global entry point.  That means that the patching NOPs aren't
consecutive for dual entry point functions, but for these use cases
that's not a problem.  But let us support both under the control
of a new target option: -msplit-patch-nops.
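
The two placements can be summarised with a small model for a dual
entry point function.  This is a sketch only: struct toy_nop_layout and
toy_place_patch_nops are hypothetical names, and TOTAL and BEFORE stand
for the two counts given to -fpatchable-function-entry=TOTAL,BEFORE.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the patch NOPs end up relative to the two entry points.  */
struct toy_nop_layout
{
  unsigned before_global;    /* In front of the global entry point.  */
  unsigned between_entries;  /* Between global and local entry point.  */
  unsigned after_local;      /* After the local entry point.  */
};

/* Distribute TOTAL patch NOPs, BEFORE of which are "before" NOPs.
   SPLIT models -msplit-patch-nops.  */
struct toy_nop_layout
toy_place_patch_nops (unsigned total, unsigned before, bool split)
{
  struct toy_nop_layout l = { 0, 0, 0 };

  assert (before <= total);

  if (split)
    l.before_global = before;	/* New behaviour: not consecutive.  */
  else
    l.between_entries = before;	/* Old behaviour: consecutive.  */

  /* The remaining NOPs follow the local entry point in both modes.  */
  l.after_local = total - before;
  return l;
}
```

For TOTAL = 5 and BEFORE = 2 the non-split layout puts two NOPs between
the entry points and three after the local one.  With the new option
the two "before" NOPs move in front of the global entry point instead.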

gcc/

PR target/112980
* config/rs6000/rs6000.opt (msplit-patch-nops): New option.
* doc/invoke.texi (RS/6000 and PowerPC Options): Document it.
* config/rs6000/rs6000.h (machine_function.stop_patch_area_print):
New member.
* config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
Emit split nops under control of that one.
* config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
Add handling of split patch nops.

Diff:
---
 gcc/config/rs6000/rs6000-logue.cc | 15 +--
 gcc/config/rs6000/rs6000.cc   | 27 +++
 gcc/config/rs6000/rs6000.h|  6 ++
 gcc/config/rs6000/rs6000.opt  |  4 
 gcc/doc/invoke.texi   | 17 +++--
 5 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-logue.cc b/gcc/config/rs6000/rs6000-logue.cc
index aa07d79d9742..52f44b114b06 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -4005,8 +4005,8 @@ rs6000_output_function_prologue (FILE *file)
 
   unsigned short patch_area_size = crtl->patch_area_size;
   unsigned short patch_area_entry = crtl->patch_area_entry;
-  /* Need to emit the patching area.  */
-  if (patch_area_size > 0)
+  /* Emit non-split patching area now.  */
+  if (!TARGET_SPLIT_PATCH_NOPS && patch_area_size > 0)
{
  cfun->machine->global_entry_emitted = true;
  /* As ELFv2 ABI shows, the allowable bytes between the global
@@ -4027,7 +4027,6 @@ rs6000_output_function_prologue (FILE *file)
   patch_area_entry);
  rs6000_print_patchable_function_entry (file, patch_area_entry,
 true);
- patch_area_size -= patch_area_entry;
}
}
 
@@ -4037,9 +4036,13 @@ rs6000_output_function_prologue (FILE *file)
   assemble_name (file, name);
   fputs ("\n", file);
   /* Emit the nops after local entry.  */
-      if (patch_area_size > 0)
-        rs6000_print_patchable_function_entry (file, patch_area_size,
-                                               patch_area_entry == 0);
+      if (patch_area_size > patch_area_entry)
+        {
+          patch_area_size -= patch_area_entry;
+          cfun->machine->stop_patch_area_print = false;
+          rs6000_print_patchable_function_entry (file, patch_area_size,
+                                                 patch_area_entry == 0);
+        }
 }
 
   else if (rs6000_pcrel_p ())
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 675b039c2b65..737c3d6f7c75 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15245,11 +15245,25 @@ rs6000_print_patchable_function_entry (FILE *file,
 {
   bool global_entry_needed_p = rs6000_global_entry_point_prologue_needed_p ();
   /* For a function which needs global entry point, we will emit the
-     patchable area before and after local entry point under the control of
-     cfun->machine->global_entry_emitted, see the handling in function
-     rs6000_output_function_prologue.  */
-  if (!global_entry_needed_p || cfun->machine->global_entry_emitted)
+     patchable area when it isn't split before and after local entry point
+     under the control of cfun->machine->global_entry_emitted, see the
+     handling in function rs6000_output_function_prologue.  */
+  if (!TARGET_SPLIT_PATCH_NOPS
+      && (!global_entry_needed_p || cfun->machine->global_entry_emitted))
     default_print_patchable_function_entry (file, patch_area_size, record_p);
+
+  /* For split patch nops we emit the before nops (from generic code)
+     in front of the global entry point and after the local entry point,
+     under the control of cfun->machine->stop_patch_area_print, see
+     rs6000_output_function_prologue and rs6000_elf_declare_function_name.  */
+  if (TARGET_SPLIT_PATCH_NOPS)
+    {
+      if (!cfun->machine->stop_patch_area_print)
+        default_print_patchable_function_entry (file, patch_area_size,
+                                                record_p);
+      else
+   gcc_assert (global_entry_need