gcc-7-20170205 is now available

2017-02-05 Thread gccadmin
Snapshot gcc-7-20170205 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20170205/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 245199

You'll find:

 gcc-7-20170205.tar.bz2   Complete GCC

  MD5=a84e33588621ce7ccfd7f6882284c9be
  SHA1=c6a0cc614ad6f6cec5de6dae9c3567e28861aefa

Diffs from 7-20170129 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[RFC PATCH v3 0/8][i386] Use out-of-line stubs for ms_abi pro/epilogues

2017-02-05 Thread Daniel Santos

Uros or Jan,
Please take this as a ping, as I never bothered pinging after submitting 
v2 since I found a few more issues with it. :)  Although I realize this 
would be a GCC 8 stage 1 item, I would like to try to get it finished up 
and tentatively approved asap.  I have tried to summarize this patch set 
below as clearly and succinctly as possible.  Thanks!


 * This patch set depends upon the "Use aligned SSE movs for re-aligned
   MS ABI pro/epilogues" patch set:
   https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01859.html
 * I have submitted a test program separately:
   https://gcc.gnu.org/ml/gcc/2017-02/msg9.html


Summary
=======

When a 64-bit Microsoft ABI function calls a System V function, ABI 
differences require RSI, RDI and XMM6-15 to be considered 
clobbered.  Saving these registers inline can cost as much as 109 bytes, 
with a similar amount for restoring.  This patch set targets 64-bit Wine 
and aims to mitigate some of these costs by adding ms/sysv save & 
restore stubs to libgcc, which are called from pro/epilogues rather than 
emitting the code inline.  And since we're already tinkering with stubs, 
they will also manage the save/restore of all remaining registers if 
possible.  Analysis of building Wine 64 demonstrates a reduction of 
.text by around 20%, which also translates into a reduction of Wine's 
install size by 34MiB.
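
To make the scenario concrete, here is a minimal sketch of the kind of call 
this patch set targets.  The function names are made up for illustration; 
ms_abi and sysv_abi are GCC's existing x86-64 function attributes, and the 
macros expand to nothing on other targets so the snippet still builds there.

```c
#include <assert.h>

/* GCC's ms_abi/sysv_abi attributes only exist on x86-64; elsewhere the
   macros expand to nothing so the sketch still compiles.  */
#if defined (__GNUC__) && defined (__x86_64__)
# define MS_ABI   __attribute__ ((ms_abi, noinline))
# define SYSV_ABI __attribute__ ((sysv_abi, noinline))
#else
# define MS_ABI
# define SYSV_ABI
#endif

/* Hypothetical System V callee.  Under the System V ABI it is free to
   clobber RSI, RDI and XMM6-15.  */
SYSV_ABI long
sysv_scale (long value, long factor)
{
  return value * factor;
}

/* Hypothetical Microsoft ABI caller.  Its own caller expects RSI, RDI and
   XMM6-15 to be preserved, so the compiler must treat them as clobbered
   by the call to sysv_scale and save/restore them in the pro/epilogue.  */
MS_ABI long
ms_entry (long value)
{
  assert (value >= 0);
  return sysv_scale (value, 3) + 1;
}
```

Compiling a function like ms_entry for x86-64 and inspecting its 
prologue should show the register saves that the stubs are meant to replace.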


As there will usually only be 3 stubs in memory at any time, I'm using 
the larger mov instructions instead of push/pop to facilitate better 
parallelization. The basic theory is that the combination of better 
parallelization and reduced I-cache misses will offset the extra 
instructions required for implementation, although I have not produced 
actual performance data yet.


For now, I have called this feature -moutline-msabi-xlogues, but Sandra 
Loosemore has this suggestion: 
(https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02670.html)



Just as a suggestion (I'm not an i386 maintainer), I'd recommend
spelling the name of this option -mno-inline-msabi-xlogues instead of
-moutline-msabi-xlogues, and making the default -minline-msabi-xlogues.


When enabled, the feature is activated when an ms_abi function calls a 
sysv_abi function, provided all of the following conditions hold 
(evaluated in ix86_compute_frame_layout):


TARGET_SSE
&& !ix86_function_ms_hook_prologue (current_function_decl)
&& !SEH
&& !crtl->calls_eh_return
&& !ix86_static_chain_on_stack
&& !ix86_using_red_zone ()
&& !flag_split_stack

Some of these, like __builtin_eh_return, might be easy to add but I 
don't have a test for them.
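
The condition list above can be read as a single predicate.  The sketch 
below restates it in plain C over a hypothetical struct of flags; the struct 
and function names are illustrative only and are not part of the patch, 
which tests the real GCC state directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the compiler state consulted by the check.
   Each field mirrors one term of the condition listed above.  */
struct xlogue_conditions
{
  bool target_sse;            /* TARGET_SSE */
  bool ms_hook_prologue;      /* ix86_function_ms_hook_prologue (...) */
  bool seh;                   /* SEH */
  bool calls_eh_return;       /* crtl->calls_eh_return */
  bool static_chain_on_stack; /* ix86_static_chain_on_stack */
  bool using_red_zone;        /* ix86_using_red_zone () */
  bool split_stack;           /* flag_split_stack */
};

/* Return true if the out-of-line stubs may be used: SSE must be
   available and none of the unsupported features may be in play.  */
bool
stubs_usable_p (const struct xlogue_conditions *c)
{
  assert (c != NULL);
  return c->target_sse
	 && !c->ms_hook_prologue
	 && !c->seh
	 && !c->calls_eh_return
	 && !c->static_chain_on_stack
	 && !c->using_red_zone
	 && !c->split_stack;
}
```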



Stack Layout
============


When active, registers are saved on the stack differently. Note that 
when not active, stack layout is *unchanged*.


[arguments]
                        <- ARG_POINTER
saved pc

saved frame pointer     if frame_pointer_needed
                        <- HARD_FRAME_POINTER
[saved regs]            if not managed by stub (e.g. explicitly clobbered)

                        <- reg_save_offset
[padding0]
                        <- stack_realign_offset
                        <- Start of out-of-line, stub-managed regs
XMM6-15
RSI
RDI
[RBX]                   if RBX is clobbered
[RBP]                   if RBP and RBX are clobbered and HFP not used.
[R12]                   if R12 and all previous regs are clobbered
[R13]                   if R13 and all previous regs are clobbered
[R14]                   if R14 and all previous regs are clobbered
[R15]                   if R15 and all previous regs are clobbered
                        <- end of stub-saved/restored regs
[padding1]
                        <- outlined_save_offset
                        <- sse_regs_save_offset
[padding2]
                        <- FRAME_POINTER
[va_arg registers]

[frame]
... etc.
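
The "all previous regs are clobbered" rule in the layout above amounts to 
taking the longest clobbered prefix of a fixed register order.  The sketch 
below models that rule with a hypothetical bitmask; the enum, the function 
name and the treatment of RBP under a hard frame pointer (simply skipped, so 
that the maximum is 17) are my assumptions, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Optional registers in the order the stubs save them.  Bit N of the
   CLOBBERED mask corresponds to enumerator N (hypothetical encoding).  */
enum optional_reg
{
  OPT_RBX, OPT_RBP, OPT_R12, OPT_R13, OPT_R14, OPT_R15,
  OPT_COUNT
};

/* XMM6-15, RSI and RDI are always managed by the stub.  */
#define MIN_STUB_REGS 12u

/* Return how many registers the stub would manage: the 12 mandatory ones
   plus the longest run of clobbered optional registers, in order.  When
   a hard frame pointer is used, RBP is assumed to be handled by the
   normal frame setup and is skipped rather than ending the run.  */
unsigned
count_stub_managed_regs (unsigned clobbered, bool hfp_used)
{
  unsigned count = MIN_STUB_REGS;

  assert ((clobbered >> OPT_COUNT) == 0);
  for (int r = OPT_RBX; r < OPT_COUNT; r++)
    {
      if (r == OPT_RBP && hfp_used)
	continue;
      if (!(clobbered & (1u << r)))
	break;
      count++;
    }
  return count;
}
```

Under this model a function that clobbers RBX and R12 but not RBP (and has 
no hard frame pointer) gets only RBX from the stub; R12 is then saved the 
normal way.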


Stubs
=====

There are two sets of stubs for use with and without hard frame 
pointers.  Each set has a save, a restore and a restore-as-tail-call 
that performs the function's return.  Each stub has entry points for the 
number of registers it's saving. The non-tail-call restore is used when 
a sibling call is the tail.  If a normal register is explicitly 
clobbered out of the order that hard registers are usually assigned in 
(e.g., __asm__ __volatile__ ("":::"r15")), then that register will be 
saved and restored as normal and not by the stub.


Stub names:
__savms64_(12-18)
__resms64_(12-18)
__resms64x_(12-18)

__savms64f_(12-17)
__resms64f_(12-17)
__resms64fx_(12-17)

Save stubs use RAX as a base register and restore stubs use RSI, the 
latter of which is overwritten before returning.  Restore-as-tail-call for 
the non-HFP case uses R10 to restore the stack pointer before returning.
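
As an illustration of the naming scheme listed above, the following sketch 
builds a stub symbol name from its parts.  The enum and function are 
hypothetical helpers written for this message, not the code used in the 
patch; only the name pattern and the 12-18 / 12-17 ranges come from the list 
of stub names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum stub_kind { STUB_SAVE, STUB_RESTORE, STUB_RESTORE_TAIL };

/* Write the symbol name of the stub entry point for NREGS registers into
   BUF (LEN bytes).  HFP selects the hard-frame-pointer set ("f" suffix).
   Return true on success, false if NREGS is out of range for the chosen
   set or the name does not fit.  */
bool
format_stub_name (char *buf, size_t len, enum stub_kind kind, bool hfp,
		  unsigned nregs)
{
  unsigned max_regs = hfp ? 17 : 18;
  int written;

  assert (buf != NULL && len > 0);
  if (nregs < 12 || nregs > max_regs)
    return false;

  written = snprintf (buf, len, "__%sms64%s%s_%u",
		      kind == STUB_SAVE ? "sav" : "res",
		      hfp ? "f" : "",
		      kind == STUB_RESTORE_TAIL ? "x" : "",
		      nregs);

  /* A truncated name is shorter than what snprintf wanted to write.  */
  return written > 0 && strlen (buf) == (size_t) written;
}
```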


Samples
=======

Standard case with RBX, RBP and R12 also being used in function:

  Prologue:
    lea    -0x78(%rsp),%rax
    sub    $0x108,%rsp
    callq  5874b <__savms

[PATCH 6/8] [i386] Add patterns and predicates for -moutline-msabi-xlogues

2017-02-05 Thread Daniel Santos
Adds the predicates save_multiple and restore_multiple to predicates.md,
which are used by the following patterns in sse.md:

* save_multiple - insn that calls a save stub
* restore_multiple - call_insn that calls a restore stub and returns to the
  function to allow a sibling call (which should typically offer better
  optimization than the restore stub as the tail call)
* restore_multiple_and_return - a jump_insn that returns from the
  function as a tail-call.
* restore_multiple_leave_return - like the above, but restores the frame
  pointer before returning.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/predicates.md | 155 ++
 gcc/config/i386/sse.md|  37 ++
 2 files changed, 192 insertions(+)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 8f250a2e720..36fe8abc3f4 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1657,3 +1657,158 @@
   (ior (match_operand 0 "register_operand")
(and (match_code "const_int")
(match_test "op == constm1_rtx"
+
+;; Return true if:
+;; 1. first op is a symbol reference,
+;; 2. >= 13 operands, and
+;; 3. operands 2 to end is one of:
+;;   a. save a register to a memory location, or
+;;   b. restore stack pointer.
+(define_predicate "save_multiple"
+  (match_code "parallel")
+{
+  const unsigned nregs = XVECLEN (op, 0);
+  rtx head = XVECEXP (op, 0, 0);
+  unsigned i;
+
+  if (GET_CODE (head) != USE)
+return false;
+  else
+{
+  rtx op0 = XEXP (head, 0);
+  if (op0 == NULL_RTX || GET_CODE (op0) != SYMBOL_REF)
+   return false;
+}
+
+  if (nregs < 13)
+return false;
+
+  for (i = 2; i < nregs; i++)
+{
+  rtx e, src, dest;
+
+  e = XVECEXP (op, 0, i);
+
+  switch (GET_CODE (e))
+   {
+ case SET:
+   src  = SET_SRC (e);
+   dest = SET_DEST (e);
+
+   /* storing a register to memory.  */
+   if (GET_CODE (src) == REG && GET_CODE (dest) == MEM)
+ {
+   rtx addr = XEXP (dest, 0);
+
+   /* Good if dest address is in RAX.  */
+   if (GET_CODE (addr) == REG
+   && REGNO (addr) == AX_REG)
+ continue;
+
+   /* Good if dest address is offset of RAX.  */
+   if (GET_CODE (addr) == PLUS
+   && GET_CODE (XEXP (addr, 0)) == REG
+   && REGNO (XEXP (addr, 0)) == AX_REG)
+ continue;
+ }
+   break;
+
+ default:
+   break;
+   }
+   return false;
+}
+  return true;
+})
+
+;; Return true if:
+;; * first op is (return) or a use (symbol reference),
+;; * >= 14 operands, and
+;; * operands 2 to end are one of:
+;;   - restoring a register from a memory location that's an offset of RSI.
+;;   - clobbering a reg
+;;   - adjusting SP
+(define_predicate "restore_multiple"
+  (match_code "parallel")
+{
+  const unsigned nregs = XVECLEN (op, 0);
+  rtx head = XVECEXP (op, 0, 0);
+  unsigned i;
+
+  switch (GET_CODE (head))
+{
+  case RETURN:
+   i = 3;
+   break;
+
+  case USE:
+  {
+   rtx op0 = XEXP (head, 0);
+
+   if (op0 == NULL_RTX || GET_CODE (op0) != SYMBOL_REF)
+ return false;
+
+   i = 1;
+   break;
+  }
+
+  default:
+   return false;
+}
+
+  if (nregs < i + 12)
+return false;
+
+  for (; i < nregs; i++)
+{
+  rtx e, src, dest;
+
+  e = XVECEXP (op, 0, i);
+
+  switch (GET_CODE (e))
+   {
+ case CLOBBER:
+   continue;
+
+ case SET:
+   src  = SET_SRC (e);
+   dest = SET_DEST (e);
+
+   /* Restoring a register from memory.  */
+   if (GET_CODE (src) == MEM && GET_CODE (dest) == REG)
+ {
+   rtx addr = XEXP (src, 0);
+
+   /* Good if src address is in RSI.  */
+   if (GET_CODE (addr) == REG
+   && REGNO (addr) == SI_REG)
+ continue;
+
+   /* Good if src address is offset of RSI.  */
+   if (GET_CODE (addr) == PLUS
+   && GET_CODE (XEXP (addr, 0)) == REG
+   && REGNO (XEXP (addr, 0)) == SI_REG)
+ continue;
+
+   /* Good if adjusting stack pointer.  */
+   if (GET_CODE (dest) == REG
+   && REGNO (dest) == SP_REG
+   && GET_CODE (src) == PLUS
+   && GET_CODE (XEXP (src, 0)) == REG
+   && REGNO (XEXP (src, 0)) == SP_REG)
+ continue;
+ }
+
+   /* Restoring stack pointer from another register.  */
+   if (GET_CODE (dest) == REG && REGNO (dest) == SP_REG
+   && GET_CODE (src) == REG)
+ continue;
+   break;
+
+ default:
+   break;
+   }
+   return false;
+}
+  return true;
+})
diff --git a/gcc/config

[PATCH 3/8] [i386] Adds class xlogue_layout and new fields to struct machine_function

2017-02-05 Thread Daniel Santos
Of the new fields added to struct machine_function, outline_ms_sysv is
initially set in ix86_expand_call, but may later be cleared when
ix86_compute_frame_layout is called (both of these changes are in subsequent
patches).  If it is not cleared, then the remaining new fields will be
set.

The new class xlogue_layout manages the layout of the stack area used by
the out-of-line save & restore stubs as well as any padding needed
before and after the save area.  It also provides the proper symbol rtx
for the requested stub based upon values of the new fields in struct
machine_function.

xlogue_layout cannot be used until stack realign flags are finalized and
ix86_compute_frame_layout is called, at which point
xlogue_layout::get_instance may be used to retrieve the appropriate
(constant) instance of xlogue_layout.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 252 +
 gcc/config/i386/i386.h |  18 
 2 files changed, 270 insertions(+)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9a0dfdc77ba..663a8c1b1ed 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -91,6 +91,7 @@ static rtx legitimize_dllimport_symbol (rtx, bool);
 static rtx legitimize_pe_coff_extern_decl (rtx, bool);
 static rtx legitimize_pe_coff_symbol (rtx, bool);
 static void ix86_print_operand_address_as (FILE *, rtx, addr_space_t, bool);
+static bool ix86_save_reg (unsigned int, bool, bool);
 
 #ifndef CHECK_STACK_LIMIT
 #define CHECK_STACK_LIMIT (-1)
@@ -2430,6 +2431,257 @@ unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
   XMM12_REG, XMM13_REG, XMM14_REG, XMM15_REG
 };
 
+enum xlogue_stub {
+  XLOGUE_STUB_SAVE,
+  XLOGUE_STUB_RESTORE,
+  XLOGUE_STUB_RESTORE_TAIL,
+  XLOGUE_STUB_SAVE_HFP,
+  XLOGUE_STUB_RESTORE_HFP,
+  XLOGUE_STUB_RESTORE_HFP_TAIL,
+
+  XLOGUE_STUB_COUNT
+};
+
+enum xlogue_stub_sets {
+  XLOGUE_SET_ALIGNED,
+  XLOGUE_SET_ALIGNED_PLUS_8,
+  XLOGUE_SET_HFP_ALIGNED_OR_REALIGN,
+  XLOGUE_SET_HFP_ALIGNED_PLUS_8,
+
+  XLOGUE_SET_COUNT
+};
+
+/* Register save/restore layout used by out-of-line stubs.  */
+class xlogue_layout {
+public:
+  struct reginfo
+  {
+unsigned regno;
+HOST_WIDE_INT offset;  /* Offset used by stub base pointer (rax or
+  rsi) to where each register is stored.  */
+  };
+
+  unsigned get_nregs () const  {return m_nregs;}
+  HOST_WIDE_INT get_stack_align_off_in () const {return m_stack_align_off_in;}
+
+  const reginfo &get_reginfo (unsigned reg) const
+  {
+gcc_assert (reg < m_nregs);
+return m_regs[reg];
+  }
+
+  /* Returns an rtx for the stub's symbol based upon
+   1.) the specified stub (save, restore or restore_ret) and
+   2.) the value of cfun->machine->outline_ms_sysv_extra_regs and
+   3.) whether or not stack alignment is being performed.  */
+  rtx get_stub_rtx (enum xlogue_stub stub) const;
+
+  /* Returns the amount of stack space (including padding) that the stub
+ needs to store registers based upon data in the machine_function.  */
+  HOST_WIDE_INT get_stack_space_used () const
+  {
+const struct machine_function &m = *cfun->machine;
+unsigned last_reg = m.outline_ms_sysv_extra_regs + MIN_REGS - 1;
+
+gcc_assert (m.outline_ms_sysv_extra_regs <= MAX_EXTRA_REGS);
+return m_regs[last_reg].offset
+   + (m.outline_ms_sysv_pad_out ? 8 : 0)
+   + STUB_INDEX_OFFSET;
+  }
+
+  /* Returns the offset for the base pointer used by the stub.  */
+  HOST_WIDE_INT get_stub_ptr_offset () const
+  {
+return STUB_INDEX_OFFSET + m_stack_align_off_in;
+  }
+
+  static const struct xlogue_layout &get_instance ();
+  static unsigned compute_stub_managed_regs (HARD_REG_SET &stub_managed_regs);
+
+  static const HOST_WIDE_INT STUB_INDEX_OFFSET = 0x70;
+  static const unsigned MIN_REGS = NUM_X86_64_MS_CLOBBERED_REGS;
+  static const unsigned MAX_REGS = 18;
+  static const unsigned MAX_EXTRA_REGS = MAX_REGS - MIN_REGS;
+  static const unsigned VARIANT_COUNT = MAX_EXTRA_REGS + 1;
+  static const unsigned STUB_NAME_MAX_LEN = 16;
+  static const char * const STUB_BASE_NAMES[XLOGUE_STUB_COUNT];
+  static const unsigned REG_ORDER[MAX_REGS];
+  static const unsigned REG_ORDER_REALIGN[MAX_REGS];
+
+private:
+  xlogue_layout ();
+  xlogue_layout (HOST_WIDE_INT stack_align_off_in, bool hfp);
+  xlogue_layout (const xlogue_layout &);
+
+  /* True if hard frame pointer is used.  */
+  bool m_hfp;
+
+  /* Max number of registers this layout manages.  */
+  unsigned m_nregs;
+
+  /* Incoming offset from 16-byte alignment.  */
+  HOST_WIDE_INT m_stack_align_off_in;
+  struct reginfo m_regs[MAX_REGS];
+  rtx m_syms[XLOGUE_STUB_COUNT][VARIANT_COUNT];
+  char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN];
+
+  static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT];
+};
+
+const char * const xlogue_layout::STUB_BASE_NAMES[XLOGUE_STUB_COUNT] = {
+  "savms64",
+  "resms64",
+

[PATCH 1/8] [i386] Minor refactoring

2017-02-05 Thread Daniel Santos
For the sake of clarity, I've separated out these minor refactoring
changes from the rest of the patches.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 21 ++---
 gcc/config/i386/i386.h |  4 +++-
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index abc0136f78e..05974208a27 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2422,7 +2422,7 @@ static int const x86_64_int_return_registers[4] =
 
 /* Additional registers that are clobbered by SYSV calls.  */
 
-int const x86_64_ms_sysv_extra_clobbered_registers[12] =
+unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
 {
   SI_REG, DI_REG,
   XMM6_REG, XMM7_REG,
@@ -12388,6 +12388,7 @@ ix86_builtin_setjmp_frame_value (void)
 static void
 ix86_compute_frame_layout (struct ix86_frame *frame)
 {
+  struct machine_function *m = cfun->machine;
   unsigned HOST_WIDE_INT stack_alignment_needed;
   HOST_WIDE_INT offset;
   unsigned HOST_WIDE_INT preferred_alignment;
@@ -12422,19 +12423,19 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
  scheduling that can be done, which means that there's very little point
  in doing anything except PUSHs.  */
   if (TARGET_SEH)
-cfun->machine->use_fast_prologue_epilogue = false;
+m->use_fast_prologue_epilogue = false;
 
   /* During reload iteration the amount of registers saved can change.
  Recompute the value as needed.  Do not recompute when amount of registers
  didn't change as reload does multiple calls to the function and does not
  expect the decision to change within single iteration.  */
   else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun))
-   && cfun->machine->use_fast_prologue_epilogue_nregs != frame->nregs)
+  && m->use_fast_prologue_epilogue_nregs != frame->nregs)
 {
   int count = frame->nregs;
   struct cgraph_node *node = cgraph_node::get (current_function_decl);
 
-  cfun->machine->use_fast_prologue_epilogue_nregs = count;
+  m->use_fast_prologue_epilogue_nregs = count;
 
   /* The fast prologue uses move instead of push to save registers.  This
  is significantly longer, but also executes faster as modern hardware
@@ -12451,14 +12452,14 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   if (node->frequency < NODE_FREQUENCY_NORMAL
  || (flag_branch_probabilities
  && node->frequency < NODE_FREQUENCY_HOT))
-cfun->machine->use_fast_prologue_epilogue = false;
+   m->use_fast_prologue_epilogue = false;
   else
-cfun->machine->use_fast_prologue_epilogue
+   m->use_fast_prologue_epilogue
   = !expensive_function_p (count);
 }
 
   frame->save_regs_using_mov
-= (TARGET_PROLOGUE_USING_MOVE && cfun->machine->use_fast_prologue_epilogue
+= (TARGET_PROLOGUE_USING_MOVE && m->use_fast_prologue_epilogue
/* If static stack checking is enabled and done with probes,
  the registers need to be saved before allocating the frame.  */
&& flag_stack_check != STATIC_BUILTIN_STACK_CHECK);
@@ -28479,11 +28480,9 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
   else if (TARGET_64BIT_MS_ABI
   && (!callarg2 || INTVAL (callarg2) != -2))
 {
-  int const cregs_size
-   = ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers);
-  int i;
+  unsigned i;
 
-  for (i = 0; i < cregs_size; i++)
+  for (i = 0; i < NUM_X86_64_MS_CLOBBERED_REGS; i++)
{
  int regno = x86_64_ms_sysv_extra_clobbered_registers[i];
  machine_mode mode = SSE_REGNO_P (regno) ? TImode : DImode;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index a5cd8452424..ed7e4edec56 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2162,7 +2162,9 @@ extern int const dbx_register_map[FIRST_PSEUDO_REGISTER];
 extern int const dbx64_register_map[FIRST_PSEUDO_REGISTER];
 extern int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER];
 
-extern int const x86_64_ms_sysv_extra_clobbered_registers[12];
+extern unsigned const x86_64_ms_sysv_extra_clobbered_registers[12];
+#define NUM_X86_64_MS_CLOBBERED_REGS \
+  (ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers))
 
 /* Before the prologue, RA is at 0(%esp).  */
 #define INCOMING_RETURN_ADDR_RTX \
-- 
2.11.0



[PATCH 7/8] [i386] Add msabi pro/epilogue stubs to libgcc

2017-02-05 Thread Daniel Santos
Adds libgcc/config/i386/i386-asm.h to manage common cpp and gas macros.  Adds
assembly stubs.  Stubs use the following naming convention:

  (sav|res)ms64[f][x]

sav|res  Save or restore
ms64     Avoid possible name collisions with future stubs
         (specific to 64-bit msabi --> sysv scenario)
[f]      Variant for hard frame pointer (and stack realignment)
[x]      Tail-call variant (is the return from function)

Signed-off-by: Daniel Santos 
---
 libgcc/config.host |  2 +-
 libgcc/config/i386/i386-asm.h  | 82 ++
 libgcc/config/i386/resms64.S   | 57 +
 libgcc/config/i386/resms64f.S  | 55 
 libgcc/config/i386/resms64fx.S | 57 +
 libgcc/config/i386/resms64x.S  | 59 ++
 libgcc/config/i386/savms64.S   | 57 +
 libgcc/config/i386/savms64f.S  | 55 
 libgcc/config/i386/t-msabi |  7 
 9 files changed, 430 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/i386/i386-asm.h
 create mode 100644 libgcc/config/i386/resms64.S
 create mode 100644 libgcc/config/i386/resms64f.S
 create mode 100644 libgcc/config/i386/resms64fx.S
 create mode 100644 libgcc/config/i386/resms64x.S
 create mode 100644 libgcc/config/i386/savms64.S
 create mode 100644 libgcc/config/i386/savms64f.S
 create mode 100644 libgcc/config/i386/t-msabi

diff --git a/libgcc/config.host b/libgcc/config.host
index 540bfa96358..6c497b13a27 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1339,7 +1339,7 @@ case ${host} in
 i[34567]86-*-linux* | x86_64-*-linux* | \
   i[34567]86-*-kfreebsd*-gnu | x86_64-*-kfreebsd*-gnu | \
   i[34567]86-*-gnu*)
-   tmake_file="${tmake_file} t-tls i386/t-linux t-slibgcc-libgcc"
+   tmake_file="${tmake_file} t-tls i386/t-linux i386/t-msabi t-slibgcc-libgcc"
if test "$libgcc_cv_cfi" = "yes"; then
tmake_file="${tmake_file} t-stack i386/t-stack-i386"
fi
diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h
new file mode 100644
index 000..c613e9fd83d
--- /dev/null
+++ b/libgcc/config/i386/i386-asm.h
@@ -0,0 +1,82 @@
+/* Defines common preprocessor and assembly macros for use by various stubs.
+   Copyright (C) 2016-2017 Free Software Foundation, Inc.
+   Contributed by Daniel Santos 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef I386_ASM_H
+#define I386_ASM_H
+
+#ifdef __ELF__
+# define ELFFN(fn) .type fn,@function
+#else
+# define ELFFN(fn)
+#endif
+
+#define FUNC_START(fn) \
+   .global fn; \
+   ELFFN (fn); \
+fn:
+
+#define HIDDEN_FUNC(fn)\
+   FUNC_START (fn) \
+   .hidden fn; \
+
+#define FUNC_END(fn) .size fn,.-fn
+
+#ifdef __SSE2__
+# ifdef __AVX__
+#  define MOVAPS vmovaps
+# else
+#  define MOVAPS movaps
+# endif
+
+/* Save SSE registers 6-15. off is the offset of rax to get to xmm6.  */
+.macro SSE_SAVE off=0
+   MOVAPS %xmm15,(\off - 0x90)(%rax)
+   MOVAPS %xmm14,(\off - 0x80)(%rax)
+   MOVAPS %xmm13,(\off - 0x70)(%rax)
+   MOVAPS %xmm12,(\off - 0x60)(%rax)
+   MOVAPS %xmm11,(\off - 0x50)(%rax)
+   MOVAPS %xmm10,(\off - 0x40)(%rax)
+   MOVAPS %xmm9, (\off - 0x30)(%rax)
+   MOVAPS %xmm8, (\off - 0x20)(%rax)
+   MOVAPS %xmm7, (\off - 0x10)(%rax)
+   MOVAPS %xmm6, \off(%rax)
+.endm
+
+/* Restore SSE registers 6-15. off is the offset of rsi to get to xmm6.  */
+.macro SSE_RESTORE off=0
+   MOVAPS (\off - 0x90)(%rsi), %xmm15
+   MOVAPS (\off - 0x80)(%rsi), %xmm14
+   MOVAPS (\off - 0x70)(%rsi), %xmm13
+   MOVAPS (\off - 0x60)(%rsi), %xmm12
+   MOVAPS (\off - 0x50)(%rsi), %xmm11
+   MOVAPS (\off - 0x40)(%rsi), %xmm10
+   MOVAPS (\off - 0x30)(%rsi), %xmm9
+   MOVAPS (\off - 0x20)(%rsi), %xmm8
+   MOVAPS (\off - 0x10)(%rsi), %xmm7
+   MOVAPS \off(%rsi), %xmm6
+.endm
+
+#endif /* __SSE2__ */
+#endif /* I386_ASM_H */
diff --git a/libgcc/config/i386/resms64.S b/libgcc/config/i386/resms64.S
new file mode 100644

[PATCH 5/8] [i386] Modify ix86_compute_frame_layout for -moutline-msabi-xlogues

2017-02-05 Thread Daniel Santos
ix86_compute_frame_layout will now populate fields added to structs
machine_function and ix86_frame, which are used by xlogue_layout::get_instance
to determine the correct instance to return.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 105 +++--
 1 file changed, 101 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 962f805c033..b3d48ac2e78 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2703,12 +2703,29 @@ struct GTY(()) stack_local_entry {
saved frame pointer if frame_pointer_needed
<- HARD_FRAME_POINTER
[saved regs]
-   <- regs_save_offset
+   <- reg_save_offset
[padding0]
<- stack_realign_offset
[saved SSE regs]
+   OR
+   [stub-saved registers for ms x64 --> sysv clobbers
+   <- Start of out-of-line, stub-saved/restored regs
+  (see libgcc/config/i386/(sav|res)ms64*.S)
+ [XMM6-15]
+ [RSI]
+ [RDI]
+ [?RBX]only if RBX is clobbered
+ [?RBP]only if RBP and RBX are clobbered
+ [?R12]only if R12 and all previous regs are clobbered
+ [?R13]only if R13 and all previous regs are clobbered
+ [?R14]only if R14 and all previous regs are clobbered
+ [?R15]only if R15 and all previous regs are clobbered
+   <- end of stub-saved/restored regs
+ [padding1]
+   ]
+   <- outlined_save_offset
<- sse_regs_save_offset
-   [padding1]  |
+   [padding2]
   |<- FRAME_POINTER
[va_arg registers]  |
   |
@@ -2733,6 +2750,7 @@ struct ix86_frame
   HOST_WIDE_INT reg_save_offset;
   HOST_WIDE_INT stack_realign_allocate_offset;
   HOST_WIDE_INT stack_realign_offset;
+  HOST_WIDE_INT outlined_save_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
   /* When save_regs_using_mov is set, emit prologue using
@@ -12638,6 +12656,22 @@ ix86_builtin_setjmp_frame_value (void)
   return stack_realign_fp ? hard_frame_pointer_rtx : virtual_stack_vars_rtx;
 }
 
+/* Disables out-of-lined msabi to sysv pro/epilogues and emits a warning if
+   warn_once is null, or *warn_once is zero.  */
+static void disable_outline_msabi_xlogues (int *warn_once, const char *msg)
+{
+  cfun->machine->outline_ms_sysv = false;
+  if (!warn_once || !*warn_once)
+{
+  warning (OPT_moutline_msabi_xlogues,
+  "Out-of-lining pro/epilogues for Microsoft ABI functions is "
+  "not currently compatible with %s%s.", msg,
+  !warn_once ? ", and is disabled for this function" : "");
+}
+if (warn_once)
+  *warn_once = 1;
+}
+
 /* When using -fsplit-stack, the allocation routines set a field in
the TCB to the bottom of the stack plus this much space, measured
in bytes.  */
@@ -12656,9 +12690,54 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   HOST_WIDE_INT size = get_frame_size ();
   HOST_WIDE_INT to_allocate;
 
+  CLEAR_HARD_REG_SET (stub_managed_regs);
+
+  /* m->outline_ms_sysv is initially enabled in ix86_expand_call for all 64-bit
+   * ms_abi functions that call a sysv function.  We now need to prune away
+   * cases where it should be disabled.  */
+  if (TARGET_64BIT && m->outline_ms_sysv)
+  {
+static int warned_seh;
+
+gcc_assert (TARGET_64BIT_MS_ABI);
+gcc_assert (TARGET_OUTLINE_MSABI_XLOGUES);
+
+if (!TARGET_SSE)
+  m->outline_ms_sysv = false;
+
+/* Don't break hot-patched functions.  */
+else if (ix86_function_ms_hook_prologue (current_function_decl))
+  m->outline_ms_sysv = false;
+
+/* TODO: Cases not yet examined.  */
+else if (TARGET_SEH)
+  disable_outline_msabi_xlogues (&warned_seh,
+"Structured Exception Handling (SEH)");
+else if (crtl->calls_eh_return)
+  disable_outline_msabi_xlogues (NULL, "__builtin_eh_return");
+
+else if (ix86_static_chain_on_stack)
+  disable_outline_msabi_xlogues (NULL, "static call chains");
+
+else if (ix86_using_red_zone ())
+  disable_outline_msabi_xlogues (NULL, "red zones");
+
+else if (flag_split_stack)
+  disable_outline_msabi_xlogues (NULL, "split stack");
+
+/* Finally, compute which registers the stub will manage.  */
+else
+  {
+   unsigned count = xlogue_layout
+::compute_stub_managed_regs (stub_managed_regs);
+   m->outline_ms_sysv_extra_regs = count - xlogue_layout::MIN_REGS;
+  }
+  }
+
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
-  CLEAR_HARD_REG_SET (stub_managed_regs);
+  m->outline_ms_sysv_pad_in = 0;
+  m->outline_ms_sysv

[PATCH 2/8] [i386] Add option -moutline-msabi-xlogues

2017-02-05 Thread Daniel Santos
Adds the option to i386.opt and i386.c and adds documentation to
invoke.texi.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c   |  3 ++-
 gcc/config/i386/i386.opt |  5 +
 gcc/doc/invoke.texi  | 11 ++-
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 05974208a27..9a0dfdc77ba 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4361,7 +4361,8 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2,
 { "-mstv", MASK_STV },
 { "-mavx256-split-unaligned-load", MASK_AVX256_SPLIT_UNALIGNED_LOAD },
 { "-mavx256-split-unaligned-store",
MASK_AVX256_SPLIT_UNALIGNED_STORE },
-{ "-mprefer-avx128",   MASK_PREFER_AVX128 }
+{ "-mprefer-avx128",   MASK_PREFER_AVX128 },
+{ "-moutline-msabi-xlogues",   MASK_OUTLINE_MSABI_XLOGUES }
   };
 
   /* Additional flag options.  */
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 0ee31845eba..0ff93f831c0 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -538,6 +538,11 @@ Enum(calling_abi) String(sysv) Value(SYSV_ABI)
 EnumValue
 Enum(calling_abi) String(ms) Value(MS_ABI)
 
+moutline-msabi-xlogues
+Target Report Mask(OUTLINE_MSABI_XLOGUES) Save
+Reduces function size by using out-of-line stubs to save & restore registers
+clobbered by differences in Microsoft and System V ABIs.
+
 mveclibabi=
 Target RejectNegative Joined Var(ix86_veclibabi_type) Enum(ix86_veclibabi) Init(ix86_veclibabi_type_none)
 Vector library ABI to use.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4b13aeb7426..901abbf99d6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1210,7 +1210,7 @@ See RS/6000 and PowerPC Options.
 -msse2avx  -mfentry  -mrecord-mcount  -mnop-mcount  -m8bit-idiv @gol
 -mavx256-split-unaligned-load  -mavx256-split-unaligned-store @gol
 -malign-data=@var{type}  -mstack-protector-guard=@var{guard} @gol
--mmitigate-rop  -mgeneral-regs-only}
+-mmitigate-rop  -mgeneral-regs-only  -moutline-msabi-xlogues}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -25304,6 +25304,15 @@ You can control this behavior for specific functions by
 using the function attributes @code{ms_abi} and @code{sysv_abi}.
 @xref{Function Attributes}.
 
+@item -moutline-msabi-xlogues
+@opindex moutline-msabi-xlogues
+@opindex mno-outline-msabi-xlogues
+Due to differences in 64-bit ABIs, any Microsoft ABI function that calls a
+SysV ABI function must consider RSI, RDI and XMM6-15 as clobbered, emitting
+fairly lengthy prologues and epilogues.  This option generates prologues and
+epilogues that instead call stubs in libgcc to perform these saves & restores,
+thus reducing function size at the cost of a few extra instructions.
+
 @item -mtls-dialect=@var{type}
 @opindex mtls-dialect
 Generate code to access thread-local storage using the @samp{gnu} or
-- 
2.11.0



[PATCH 4/8] [i386] Modify ix86_save_reg to optionally omit stub-managed registers

2017-02-05 Thread Daniel Santos
Adds HARD_REG_SET stub_managed_regs to track registers that will be
managed by the pro/epilogue stubs for the function.

Adds a third parameter, bool ignore_outlined, to ix86_save_reg to specify
whether or not registers marked in stub_managed_regs should be skipped
(reported as not needing an inline save).
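
The effect of the new parameter can be shown with a toy model.  Everything
below (the struct, the 64-bit masks and the function names) is a simplified
stand-in invented for illustration; the real ix86_save_reg performs several
other checks before and after this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the per-function state; one bit per register.  */
struct toy_function
{
  bool outline_ms_sysv;        /* stubs are in use for this function */
  uint64_t needs_save;         /* registers that must be preserved */
  uint64_t stub_managed_regs;  /* subset handled by the stubs */
};

/* Return true if REGNO needs an inline save.  With IGNORE_OUTLINED set,
   registers handled by the stubs are reported as not needing one.  */
bool
toy_save_reg (const struct toy_function *fn, unsigned regno,
	      bool ignore_outlined)
{
  uint64_t bit;

  assert (regno < 64);
  bit = UINT64_C (1) << regno;

  if (!(fn->needs_save & bit))
    return false;
  if (ignore_outlined && fn->outline_ms_sysv
      && (fn->stub_managed_regs & bit))
    return false;
  return true;
}

/* Count registers needing an inline save, as ix86_nsaved_regs does for
   general registers in the real code.  */
unsigned
toy_nsaved_regs (const struct toy_function *fn, bool ignore_outlined)
{
  unsigned n = 0;

  for (unsigned regno = 0; regno < 64; regno++)
    if (toy_save_reg (fn, regno, ignore_outlined))
      n++;
  return n;
}
```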

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 663a8c1b1ed..962f805c033 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12454,6 +12454,10 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
  && df_regs_ever_live_p (regno)));
 }
 
+/* Registers whose save & restore will be managed by stubs called from
+   pro/epilogue.  */
+static HARD_REG_SET GTY(()) stub_managed_regs;
+
 /* Return true if register class CL should be an additional allocno
class.  */
 
@@ -12466,7 +12470,7 @@ ix86_additional_allocno_class_p (reg_class_t cl)
 /* Return TRUE if we need to save REGNO.  */
 
 static bool
-ix86_save_reg (unsigned int regno, bool maybe_eh_return)
+ix86_save_reg (unsigned int regno, bool maybe_eh_return, bool ignore_outlined)
 {
   /* If there are no caller-saved registers, we preserve all registers,
  except for MMX and x87 registers which aren't supported when saving
@@ -12534,6 +12538,10 @@ ix86_save_reg (unsigned int regno, bool maybe_eh_return)
}
 }
 
+  if (ignore_outlined && cfun->machine->outline_ms_sysv
+  && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
+return false;
+
   if (crtl->drap_reg
   && regno == REGNO (crtl->drap_reg)
   && !cfun->machine->no_drap_save_restore)
@@ -12554,7 +12562,7 @@ ix86_nsaved_regs (void)
   int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   nregs ++;
   return nregs;
 }
@@ -12570,7 +12578,7 @@ ix86_nsaved_sseregs (void)
   if (!TARGET_64BIT_MS_ABI)
 return 0;
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   nregs ++;
   return nregs;
 }
@@ -12650,6 +12658,7 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
 
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
+  CLEAR_HARD_REG_SET (stub_managed_regs);
 
   /* 64-bit MS ABI seem to require stack alignment to be always 16,
  except for function prologues, leaf functions and when the defult
@@ -13040,7 +13049,7 @@ ix86_emit_save_regs (void)
   rtx_insn *insn;
 
   for (regno = FIRST_PSEUDO_REGISTER - 1; regno-- > 0; )
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
insn = emit_insn (gen_push (gen_rtx_REG (word_mode, regno)));
RTX_FRAME_RELATED_P (insn) = 1;
@@ -13130,7 +13139,7 @@ ix86_emit_save_regs_using_mov (HOST_WIDE_INT cfa_offset)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
 ix86_emit_save_reg_using_mov (word_mode, regno, cfa_offset);
cfa_offset -= UNITS_PER_WORD;
@@ -13145,7 +13154,7 @@ ix86_emit_save_sse_regs_using_mov (HOST_WIDE_INT cfa_offset)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
ix86_emit_save_reg_using_mov (V4SFmode, regno, cfa_offset);
cfa_offset -= GET_MODE_SIZE (V4SFmode);
@@ -13529,13 +13538,13 @@ get_scratch_register_on_entry (struct scratch_reg *sr)
   && !static_chain_p
   && drap_regno != CX_REG)
regno = CX_REG;
-  else if (ix86_save_reg (BX_REG, true))
+  else if (ix86_save_reg (BX_REG, true, false))
regno = BX_REG;
   /* esi is the static chain register.  */
   else if (!(regparm == 3 && static_chain_p)
-  && ix86_save_reg (SI_REG, true))
+  && ix86_save_reg (SI_REG, true, false))
regno = SI_REG;
-  else if (ix86_save_reg (DI_REG, true))
+  else if (ix86_save_reg (DI_REG, true, false))
regno = DI_REG;
   else
{
@@ -14639,7 +14648,7 @@ ix86_emit_restore_regs_using_mov (HOST_WIDE_INT cfa_offset,
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, maybe_eh_return))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, maybe_eh_return, true))
   {
rtx reg = gen_rtx_REG (word_mode, regno);
rtx mem;
@@ -14678,7 +14687,7 @@

[PATCH 8/8] [i386] Add remainder of moutline-msabi-xlogues implementation

2017-02-05 Thread Daniel Santos
Adds functions emit_msabi_outlined_save and emit_msabi_outlined_restore,
which are called from ix86_expand_prologue and ix86_expand_epilogue,
respectively.  Also adds the code to ix86_expand_call that enables the
optimization (setting the machine_function's outline_ms_sysv field).
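
To make the save side easier to picture, here is a small stand-alone model of
a save-slot table addressed downward from a base pointer, similar in spirit to
how the save function in the diff stores each register at a negative offset
from RAX.  It is only a sketch: the Slot type, build_layout and total_bytes
are hypothetical, and the register order and offsets are assumptions, since
the real layout class (xlogue_layout) is not part of this message.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical model of one save slot; not a GCC data structure.
struct Slot {
  std::string reg;
  int size;    // slot size in bytes
  int offset;  // slot lives at (base pointer - offset)
};

// Lays out XMM6..XMM15 (16 bytes each) followed by RSI and RDI (8 bytes
// each), growing downward from the base pointer.  The ordering is assumed
// for illustration only.
std::vector<Slot> build_layout() {
  std::vector<Slot> slots;
  int offset = 0;
  for (int i = 6; i <= 15; ++i) {
    offset += 16;
    slots.push_back({"xmm" + std::to_string(i), 16, offset});
  }
  for (const char *gpr : {"rsi", "rdi"}) {
    offset += 8;
    slots.push_back({gpr, 8, offset});
  }
  return slots;
}

// Total stack space covered by the table, in bytes.
int total_bytes(const std::vector<Slot> &slots) {
  return slots.empty() ? 0 : slots.back().offset;
}
```

In this model the table covers the ten SSE registers and two general
registers that a 64-bit Microsoft ABI function must treat as clobbered when
it calls a SysV function.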

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 281 +++--
 1 file changed, 272 insertions(+), 9 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b3d48ac2e78..f9a02bedbee 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14115,6 +14115,79 @@ ix86_elim_entry_set_got (rtx reg)
 }
 }
 
+static rtx
+gen_frame_set (rtx reg, rtx frame_reg, int offset, bool store)
+{
+  rtx addr, mem;
+
+  if (offset)
+addr = gen_rtx_PLUS (Pmode, frame_reg, GEN_INT (offset));
+  mem = gen_frame_mem (GET_MODE (reg), offset ? addr : frame_reg);
+  return gen_rtx_SET (store ? mem : reg, store ? reg : mem);
+}
+
+static inline rtx
+gen_frame_load (rtx reg, rtx frame_reg, int offset)
+{
+  return gen_frame_set (reg, frame_reg, offset, false);
+}
+
+static inline rtx
+gen_frame_store (rtx reg, rtx frame_reg, int offset)
+{
+  return gen_frame_set (reg, frame_reg, offset, true);
+}
+
+static void
+ix86_emit_msabi_outlined_save (const struct ix86_frame &frame)
+{
+  struct machine_function *m = cfun->machine;
+  const unsigned ncregs = NUM_X86_64_MS_CLOBBERED_REGS
+ + m->outline_ms_sysv_extra_regs;
+  rtvec v = rtvec_alloc (ncregs - 1 + 3);
+  unsigned int align, i, vi = 0;
+  rtx_insn *insn;
+  rtx sym, addr;
+  rtx rax = gen_rtx_REG (word_mode, AX_REG);
+  const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
+  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset;
+  HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset;
+  HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in ();
+
+  /* Verify that the incoming stack 16-byte alignment offset matches the
+ layout we're using.  */
+  gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD));
+
+  /* Get the stub symbol.  */
+  sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
+ : XLOGUE_STUB_SAVE);
+  RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym);
+  RTVEC_ELT (v, vi++) = const0_rtx;
+
+  /* Setup RAX as the stub's base pointer.  */
+  align = GET_MODE_ALIGNMENT (V4SFmode);
+  addr = choose_baseaddr (rax_offset, &align);
+  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
+  insn = emit_insn (gen_rtx_SET (rax, addr));
+
+  gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ());
+  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+GEN_INT (-stack_alloc_size), -1,
+m->fs.cfa_reg == stack_pointer_rtx);
+  for (i = 0; i < ncregs; ++i)
+{
+  const xlogue_layout::reginfo &r = xlogue.get_reginfo (i);
+  rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode),
+r.regno);
+  RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);
+}
+
+  gcc_assert (vi == (unsigned)GET_NUM_ELEM (v));
+
+  insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
+  RTX_FRAME_RELATED_P (insn) = true;
+}
+
 /* Expand the prologue into a bunch of separate insns.  */
 
 void
@@ -14362,7 +14435,7 @@ ix86_expand_prologue (void)
 performing the actual alignment.  Otherwise we cannot guarantee
 that there's enough storage above the realignment point.  */
   allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
-  if (allocate)
+  if (allocate && !m->outline_ms_sysv)
 pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
   GEN_INT (-allocate), -1, false);
 
@@ -14370,7 +14443,6 @@ ix86_expand_prologue (void)
   insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
stack_pointer_rtx,
GEN_INT (-align_bytes)));
-
   /* For the purposes of register save area addressing, the stack
 pointer can no longer be used to access anything in the frame
 below m->fs.sp_realigned_offset and the frame pointer cannot be
@@ -14381,6 +14453,9 @@ ix86_expand_prologue (void)
   gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset);
 }
 
+  if (m->outline_ms_sysv)
+ix86_emit_msabi_outlined_save (frame);
+
   allocate = frame.stack_pointer_offset - m->fs.sp_offset;
 
   if (flag_stack_usage_info)
@@ -14701,17 +14776,19 @@ ix86_emit_restore_regs_using_pop (void)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, false))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, false, true))
   ix86_emit_restore_reg_usi