[PATCH try 2 resend] [i386] Remove warnings for ignoring -mcall-ms2sysv-xlogues.

2017-06-11 Thread Daniel Santos
I appear to have forgotten to cc gcc-patches, sorry about that.

There are currently three cases where we issue a warning when disabling
-mcall-ms2sysv-xlogues for a function, but I never added a proper
warning, so there's no mechanism for disabling it.  This is something
that I meant to address sooner.  I'm thinking that it's better to just
remove the warning entirely and document these cases, rather than adding
a new warning.  Any thoughts?

These are the conditions:

* the use of -fsplit-stack,
* the use of static call chains (not sure if we can ever have that), and
* if the function calls __builtin_eh_return.

Some of these cases can likely be supported, but they are just on the
"not yet tested" list.
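
For readers unfamiliar with the option, the kind of code it affects can be sketched in C roughly as follows.  This is only an illustration and not part of the patch: the names sysv_callee and ms_caller and the MS_ABI/SYSV_ABI helper macros are made up for the example, and the attributes are only applied on x86-64 with a GNU-compatible compiler.

```c
/* Hypothetical helper macros: only use the ABI attributes where they exist.  */
#if defined(__x86_64__) && defined(__GNUC__)
# define MS_ABI   __attribute__ ((ms_abi))
# define SYSV_ABI __attribute__ ((sysv_abi, noinline))
#else
# define MS_ABI
# define SYSV_ABI
#endif

/* A System V ABI callee; it may clobber RSI, RDI and XMM6-15.  */
SYSV_ABI long
sysv_callee (long x)
{
  return x + 1;
}

/* A Microsoft ABI function calling it must preserve those registers for
   its own caller, either inline or (with the option) via libgcc stubs.  */
MS_ABI long
ms_caller (long x)
{
  return sysv_callee (x) * 2;
}
```

When built with -mcall-ms2sysv-xlogues, ms_caller would normally be expected to use the libgcc stubs; if one of the conditions above applies, the inline saves and restores are used instead.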

2017-06-11  Daniel Santos  
---
 gcc/config/i386/i386.c | 26 +++---
 gcc/doc/invoke.texi| 25 -
 2 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d5c2d46bf5e..2dc6e53c765 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12772,18 +12772,6 @@ ix86_builtin_setjmp_frame_value (void)
   return stack_realign_fp ? hard_frame_pointer_rtx : virtual_stack_vars_rtx;
 }
 
-/* Emits a warning for unsupported msabi to sysv pro/epilogues.  */
-static void warn_once_call_ms2sysv_xlogues (const char *feature)
-{
-  static bool warned_once = false;
-  if (!warned_once)
-{
-  warning (0, "-mcall-ms2sysv-xlogues is not compatible with %s",
-  feature);
-  warned_once = true;
-}
-}
-
 /* When using -fsplit-stack, the allocation routines set a field in
the TCB to the bottom of the stack plus this much space, measured
in bytes.  */
@@ -12814,18 +12802,10 @@ ix86_compute_frame_layout (void)
   gcc_assert (TARGET_SSE);
   gcc_assert (!ix86_using_red_zone ());
 
-  if (crtl->calls_eh_return)
+  if (crtl->calls_eh_return || ix86_static_chain_on_stack)
{
  gcc_assert (!reload_completed);
  m->call_ms2sysv = false;
- warn_once_call_ms2sysv_xlogues ("__builtin_eh_return");
-   }
-
-  else if (ix86_static_chain_on_stack)
-   {
- gcc_assert (!reload_completed);
- m->call_ms2sysv = false;
- warn_once_call_ms2sysv_xlogues ("static call chains");
}
 
   /* Finally, compute which registers the stub will manage.  */
@@ -29290,9 +29270,9 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
  else if (ix86_function_ms_hook_prologue (current_function_decl))
;
 
- /* TODO: Cases not yet examined.  */
+ /* TODO: Compatibility not yet examined.  */
  else if (flag_split_stack)
-   warn_once_call_ms2sysv_xlogues ("-fsplit-stack");
+   ;
 
  else
{
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c1168823af7..eec02b43a4f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -25389,11 +25389,26 @@ using the function attributes @code{ms_abi} and @code{sysv_abi}.
 @opindex mno-call-ms2sysv-xlogues
 Due to differences in 64-bit ABIs, any Microsoft ABI function that calls a
 System V ABI function must consider RSI, RDI and XMM6-15 as clobbered.  By
-default, the code for saving and restoring these registers is emitted inline,
-resulting in fairly lengthy prologues and epilogues.  Using
-@option{-mcall-ms2sysv-xlogues} emits prologues and epilogues that
-use stubs in the static portion of libgcc to perform these saves and restores,
-thus reducing function size at the cost of a few extra instructions.
+default, the instructions for saving and restoring these registers are emitted
+inline, resulting in fairly lengthy pro- and epilogues.  Using
+@option{-mcall-ms2sysv-xlogues} emits pro- and epilogues that use stubs in the
+static portion of libgcc to perform these saves and restores, thus reducing
+function size at the cost of executing a few extra instructions.  This cost is
+theoretically mitigated or eliminated by reduced instruction cache utilization,
+temporal locality of the stubs, and the stubs' use of MOV instructions over
+PUSH and POP.
+
+This option is not supported with SEH, so it is completely unavailable on
+Windows.  It is also silently disabled if a function:
+
+@enumerate
+@item is built with @option{-mno-sse2} or @option{-fsplit-stack},
+@item has @code{__attribute__ ((ms_hook_prologue))}, or
+@item either throws an exception or explicitly calls @code{__builtin_eh_return}.
+@end enumerate
+
+Support for @option{-fsplit-stack} and @code{__builtin_eh_return} may be
+added at some time in the future, but has not yet been tested.
 
 @item -mtls-dialect=@var{type}
 @opindex mtls-dialect
-- 
2.11.0



Re: [PATCH v2 0/2] [testsuite, libgcc] PR80759 Fix FAILs on Solaris and Darwin

2017-07-01 Thread Daniel Santos
This patchset addresses a number of testsuite issues for 
gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp, mostly occurring on Solaris 
and Darwin.  Additionally, it solves a bug in libgcc that caused link 
failures on Darwin when building with -mcall-ms2sysv-xlogues.  The 
issues are detailed in the notes for each patch.


I would particularly appreciate any feedback for Darwin as I am 
unfamiliar with the platform and Rainer and I have fashioned some of 
these changes by looking at other Darwin code in gcc.


 .../gcc.target/x86_64/abi/ms-sysv/do-test.S  | 200 ---
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c  |  83 +++-
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp| 153 +-
 libgcc/config.host   |   6 +-
 libgcc/config/i386/i386-asm.h|  89 +
 libgcc/config/i386/resms64.S |   2 +-
 libgcc/config/i386/resms64f.S|   2 +-
 libgcc/config/i386/resms64fx.S   |   2 +-
 libgcc/config/i386/resms64x.S|   2 +-
 libgcc/config/i386/savms64.S |   2 +-
 libgcc/config/i386/savms64f.S|   2 +-
 11 files changed, 274 insertions(+), 269 deletions(-)


Many thanks to Rainer for all of his help on this!

Thanks,
Daniel
2017-06-28  Daniel Santos  


2017-06-10  Daniel Santos  

PR testsuite/80759
* gcc.target/x86_64/abi/ms-sysv/do-test.S
(ELFFN_BEGIN): Rename to FN_TYPE.
(ELFFN_END): Rename to FN_SIZE.
(ASMNAME): New macro.
(FUNC): Rename to FUNC_BEGIN, use ASMNAME and use .globl instead of
.global.
(FUNC_END): Use ASMNAME.
(test_data_save): Remove.
(test_data_input): Likewise.
(test_data_output): Likewise.
(test_data_fn): Likewise.
(test_data_retaddr): Likewise.
(regs_to_mem): Make globals, use r10 instead of rax.
(mem_to_regs): Likewise.
(do_test_unaligned): Remove .cfi directives, remove pushf/popf, move
body to ms-sysv.c.
(do_test_aligned): Likewise.
* gcc.target/x86_64/abi/ms-sysv/ms-sysv.c:
Add dg-* directives.
(PASTE_STR): New macro.
(ASMNAME): Likewise.
(LOAD_TEST_DATA_ADDR): Likewise.
(TEST_DATA_OFFSET): Likewise.
(do_test_body0): New C function.
(do_test_body): New inline assembly routine.
* gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp
(runtest_ms_sysv): Modify.
2017-06-28  Daniel Santos  

PR testsuite/80759
* config.host: include i386/t-msabi for darwin and solaris.
* config/i386/i386-asm.h
(ELFFN): Rename to FN_TYPE.
(FN_SIZE): New macro.
(FN_HIDDEN): Likewise.
(ASMNAME): Likewise.
(FUNC_START): Rename to FUNC_BEGIN, use ASMNAME, replace .global with
.globl.
(HIDDEN_FUNC): Use ASMNAME and .globl instead of .global.
(SSE_SAVE): Convert to cpp macro, hard-code offset (always 0x60).
* config/i386/resms64.S: Use SSE_SAVE as cpp macro instead of gas
.macro.
* config/i386/resms64f.S: Likewise.
* config/i386/resms64fx.S: Likewise.
* config/i386/resms64x.S: Likewise.
* config/i386/savms64.S: Likewise.
* config/i386/savms64f.S: Likewise.
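
As a sanity check on the SSE_SAVE entry, the displacements produced by the old gas macro with off = 0x60 can be recomputed and compared against the constants in the new cpp macro.  The helper name sse_save_disp is hypothetical; the formula is read off the removed .macro body in i386-asm.h (xmm6 at off, xmm15 at off - 0x90).

```c
/* Displacement from %rax at which the old SSE_SAVE gas macro stored
   %xmm<reg>, for reg in 6..15.  Hypothetical helper for illustration.  */
int
sse_save_disp (int reg, int off)
{
  return off - 0x10 * (reg - 6);
}
```

With off = 0x60 this gives -0x30 for xmm15, 0 for xmm12 and 0x60 for xmm6, which is what the hard-coded cpp version uses.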


[PATCH 1/2] [testsuite] PR80759 fix tests on Solaris and Darwin

2017-07-01 Thread Daniel Santos
The ms-sysv.exp tests were failing on Solaris and Darwin targets.  In
addition, a number of other problems have been identified.

* Assembly failed on Solaris and Darwin when not using gas due to use of
  .cfi directives and .struct.

* Tests were failing on Solaris due to the hard frame pointer always
  being enabled on that platform and --omit-rbp-clobbers not being
  passed to the code generator.

* Manual compilation (via remote_exec as opposed to dg-runtest, et al.)
  was missing TEST_ALWAYS_FLAGS, resulting in color codes in log files.
  It was also missing -m64 in some cases where it was needed.

* When built with make -j48 on an unsupported triplet, the "test
  unsupported" message appeared 48 times in the log (it appears that
  several other tests do this as well).

* Using hard-coded offsets in do-test.S is ugly.  This is fixed by
  moving some code into inline assembly in ms-sysv.c.

* Custom parallelization code broke when running make without -j.

* Accessing the test_data global from assembly requires(?) use of global
  offset table on Darwin.

This patch corrects all of these problems.  The custom parallelization
code has been removed and replaced with calls to procs in gcc's standard
testing framework: gcc_parallel_test_enable, runtest_file_p and
dg-runtest.  This results in much poorer parallelization, which I hope
to address in a future patch, but has little effect when built without
checking enabled.

Previously, each test job compiled and executed around 20k individual
tests.  This high number resulted in test jobs far exceeding the default
5 minute timeout for remote_/local_exec when gcc was built with
--enable-checking=rtl.  This has been resolved by splitting the tests
out to a maximum of around 3500 tests per job.
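
The arithmetic behind the split is just a ceiling division.  The sketch below is not the code from ms-sysv.exp or the generator (neither is shown here); jobs_needed is a hypothetical helper, and the 20k and 3500 figures are the approximate ones quoted above.

```c
/* Number of jobs needed so that no job runs more than max_per_job tests.
   Returns 0 when there is nothing to run or the limit is 0.  */
unsigned
jobs_needed (unsigned total_tests, unsigned max_per_job)
{
  if (max_per_job == 0)
    return 0;
  /* Ceiling division written so it cannot overflow.  */
  return total_tests / max_per_job + (total_tests % max_per_job != 0);
}
```

Under these assumptions, a former job of about 20000 tests becomes 6 jobs of at most 3500 tests each.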

Signed-off-by: Daniel Santos 
---
 .../gcc.target/x86_64/abi/ms-sysv/do-test.S| 200 +
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c|  83 -
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp  | 153 +---
 3 files changed, 210 insertions(+), 226 deletions(-)

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S
index 1395235fd1e..ffe011bcc68 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S
+++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S
@@ -23,141 +23,101 @@ a copy of the GCC Runtime Library Exception along with this program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
-#ifdef __x86_64__
-
-# ifdef __ELF__
-#  define ELFFN_BEGIN(fn)   .type fn,@function
-#  define ELFFN_END(fn) .size fn,.-fn
-# else
-#  define ELFFN_BEGIN(fn)
-#  define ELFFN_END(fn)
-# endif
-
-# define FUNC(fn)  \
-   .global fn; \
-   ELFFN_BEGIN(fn);\
-fn:
-
-#define FUNC_END(fn) ELFFN_END(fn)
-
-# ifdef __AVX__
-#  define MOVAPS vmovaps
-# else
-#  define MOVAPS movaps
-# endif
-
-/* TODO: Is there a cleaner way to provide these offsets?  */
-   .struct 0
-test_data_save:
-
-   .struct test_data_save + 224
-test_data_input:
-
-   .struct test_data_save + 448
-test_data_output:
-
-   .struct test_data_save + 672
-test_data_fn:
-
-   .struct test_data_save + 680
-test_data_retaddr:
+#if defined(__x86_64__) && defined(__SSE2__)
+
+/* These macros currently support GNU/Linux, Solaris and Darwin.  */
+
+#ifdef __ELF__
+# define FN_TYPE(fn) .type fn,@function
+# define FN_SIZE(fn) .size fn,.-fn
+#else
+# define FN_TYPE(fn)
+# define FN_SIZE(fn)
+#endif
+
+#ifdef __USER_LABEL_PREFIX__
+# define ASMNAME2(prefix, name) prefix ## name
+# define ASMNAME1(prefix, name) ASMNAME2(prefix, name)
+# define ASMNAME(name) ASMNAME1(__USER_LABEL_PREFIX__, name)
+#else
+# define ASMNAME(name) name
+#endif
+
+#define FUNC_BEGIN(fn) \
+   .globl ASMNAME(fn); \
+   FN_TYPE (ASMNAME(fn));  \
+ASMNAME(fn):
+
+#define FUNC_END(fn) FN_SIZE(ASMNAME(fn))
+
+#ifdef __AVX__
+# define MOVAPS vmovaps
+#else
+# define MOVAPS movaps
+#endif
 
.text
 
-regs_to_mem:
-   MOVAPS  %xmm6, (%rax)
-   MOVAPS  %xmm7, 0x10(%rax)
-   MOVAPS  %xmm8, 0x20(%rax)
-   MOVAPS  %xmm9, 0x30(%rax)
-   MOVAPS  %xmm10, 0x40(%rax)
-   MOVAPS  %xmm11, 0x50(%rax)
-   MOVAPS  %xmm12, 0x60(%rax)
-   MOVAPS  %xmm13, 0x70(%rax)
-   MOVAPS  %xmm14, 0x80(%rax)
-   MOVAPS  %xmm15, 0x90(%rax)
-   mov %rsi, 0xa0(%rax)
-   mov %rdi, 0xa8(%rax)
-   mov %rbx, 0xb0(%rax)
-   mov %rbp, 0xb8(%rax)
-   mov %r12, 0xc0(%rax)
-   mov %r13, 0xc8(%rax)
-   mov %r14, 0xd0(%rax)
-   mov %r15, 0xd8(%rax)
+FUNC_BEGIN(regs_to_mem)
+   MOVAPS  %xmm6, (%r10)
+   MOVAPS  %xmm7, 0x10(%r10)
+   MOVAPS  %xmm8, 0x20(%r10)
+   MOVAPS  %xmm9, 0x30(%r10)
+   MOVAPS  %xmm10, 0x40(%r10)
+   MOVAPS  %xmm

[PATCH 2/2] [libgcc]: PR80759 fixes for Solaris & Darwin

2017-07-01 Thread Daniel Santos
The -mcall-ms2sysv-xlogues option is supposed to work on Solaris and
Darwin, but my changes to config.host weren't adding the sav/res stubs
to libgcc and the assembly code wasn't compatible with their assemblers
either.

* Change config.host to build -mcall-ms2sysv-xlogues sav/res stubs on
  Solaris and Darwin.
* Replace .macro/.endm with cpp macros
* Replace .global with .globl
* Prepend __USER_LABEL_PREFIX__ when defined (via ASMNAME macro).
* Only use .size when __ELF__ is defined.
* Only use .hidden when both __ELF__ and HAVE_GAS_HIDDEN are defined.
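
To illustrate the ASMNAME item, the same two-level token pasting can be exercised from plain C by stringifying the result.  The ASMNAME macros mirror the ones in the patch; STR, STR2 and the two functions are hypothetical additions for the demonstration.

```c
#include <string.h>

#ifdef __USER_LABEL_PREFIX__
# define ASMNAME2(prefix, name) prefix ## name
# define ASMNAME1(prefix, name) ASMNAME2(prefix, name)
# define ASMNAME(name) ASMNAME1(__USER_LABEL_PREFIX__, name)
#else
# define ASMNAME(name) name
#endif

/* Two-level stringification so the argument is macro-expanded first.  */
#define STR2(x) #x
#define STR(x) STR2(x)

/* The symbol name an assembler file would use for `regs_to_mem'.  */
const char *
asm_name_of_regs_to_mem (void)
{
  return STR (ASMNAME (regs_to_mem));
}

/* Length of the target's user label prefix, e.g. 0 when the prefix is
   empty and 1 when it is "_".  */
size_t
user_label_prefix_length (void)
{
  return strlen (asm_name_of_regs_to_mem ()) - strlen ("regs_to_mem");
}
```

The extra ASMNAME1 level is there so that __USER_LABEL_PREFIX__ is expanded before the ## operator sees it; pasting directly would glue the macro's name, not its value, onto the symbol.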

Signed-off-by: Daniel Santos 
---
 libgcc/config.host |  6 +--
 libgcc/config/i386/i386-asm.h  | 89 ++
 libgcc/config/i386/resms64.S   |  2 +-
 libgcc/config/i386/resms64f.S  |  2 +-
 libgcc/config/i386/resms64fx.S |  2 +-
 libgcc/config/i386/resms64x.S  |  2 +-
 libgcc/config/i386/savms64.S   |  2 +-
 libgcc/config/i386/savms64f.S  |  2 +-
 8 files changed, 64 insertions(+), 43 deletions(-)

diff --git a/libgcc/config.host b/libgcc/config.host
index cf62e0e54f7..bee3e931106 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -588,12 +588,12 @@ hppa*-*-openbsd*)
tmake_file="$tmake_file pa/t-openbsd"
;;
 i[34567]86-*-darwin*)
-   tmake_file="$tmake_file i386/t-crtpc t-crtfm"
+   tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi"
tm_file="$tm_file i386/darwin-lib.h"
extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
;;
 x86_64-*-darwin*)
-   tmake_file="$tmake_file i386/t-crtpc t-crtfm"
+   tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi"
tm_file="$tm_file i386/darwin-lib.h"
extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
;;
@@ -670,7 +670,7 @@ i[34567]86-*-rtems*)
extra_parts="$extra_parts crti.o crtn.o"
;;
 i[34567]86-*-solaris2* | x86_64-*-solaris2.1[0-9]*)
-   tmake_file="$tmake_file i386/t-crtpc t-crtfm"
+   tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi"
extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
tm_file="${tm_file} i386/elf-lib.h"
md_unwind_header=i386/sol2-unwind.h
diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h
index c613e9fd83d..1387fd24b4f 100644
--- a/libgcc/config/i386/i386-asm.h
+++ b/libgcc/config/i386/i386-asm.h
@@ -26,22 +26,45 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #ifndef I386_ASM_H
 #define I386_ASM_H
 
+#include "auto-host.h"
+
+/* These macros currently support GNU/Linux, Solaris and Darwin.  */
+
 #ifdef __ELF__
-# define ELFFN(fn) .type fn,@function
+# define FN_TYPE(fn) .type fn,@function
+# define FN_SIZE(fn) .size fn,.-fn
+# ifdef HAVE_GAS_HIDDEN
+#  define FN_HIDDEN(fn) .hidden fn
+# endif
+#else
+# define FN_TYPE(fn)
+# define FN_SIZE(fn)
+#endif
+
+#ifndef FN_HIDDEN
+# define FN_HIDDEN(fn)
+#endif
+
+#ifdef __USER_LABEL_PREFIX__
+# define ASMNAME2(prefix, name) prefix ## name
+# define ASMNAME1(prefix, name) ASMNAME2(prefix, name)
+# define ASMNAME(name) ASMNAME1(__USER_LABEL_PREFIX__, name)
 #else
-# define ELFFN(fn)
+# define ASMNAME(name) name
 #endif
 
-#define FUNC_START(fn) \
-   .global fn; \
-   ELFFN (fn); \
-fn:
+#define FUNC_BEGIN(fn) \
+   .globl ASMNAME(fn); \
+   FN_TYPE (ASMNAME(fn));  \
+ASMNAME(fn):
 
-#define HIDDEN_FUNC(fn)\
-   FUNC_START (fn) \
-   .hidden fn; \
+#define HIDDEN_FUNC(fn)\
+   .globl ASMNAME(fn); \
+   FN_TYPE(ASMNAME(fn));   \
+   FN_HIDDEN(ASMNAME(fn)); \
+ASMNAME(fn):
 
-#define FUNC_END(fn) .size fn,.-fn
+#define FUNC_END(fn) FN_SIZE(ASMNAME(fn))
 
 #ifdef __SSE2__
 # ifdef __AVX__
@@ -51,32 +74,30 @@ fn:
 # endif
 
 /* Save SSE registers 6-15. off is the offset of rax to get to xmm6.  */
-.macro SSE_SAVE off=0
-   MOVAPS %xmm15,(\off - 0x90)(%rax)
-   MOVAPS %xmm14,(\off - 0x80)(%rax)
-   MOVAPS %xmm13,(\off - 0x70)(%rax)
-   MOVAPS %xmm12,(\off - 0x60)(%rax)
-   MOVAPS %xmm11,(\off - 0x50)(%rax)
-   MOVAPS %xmm10,(\off - 0x40)(%rax)
-   MOVAPS %xmm9, (\off - 0x30)(%rax)
-   MOVAPS %xmm8, (\off - 0x20)(%rax)
-   MOVAPS %xmm7, (\off - 0x10)(%rax)
-   MOVAPS %xmm6, \off(%rax)
-.endm
+#define SSE_SAVE  \
+   MOVAPS %xmm15,-0x30(%rax); \
+   MOVAPS %xmm14,-0x20(%rax); \
+   MOVAPS %xmm13,-0x10(%rax); \
+   MOVAPS %xmm12, (%rax); \
+   MOVAPS %xmm11, 0x10(%rax); \
+   MOVAPS %xmm10, 0x20(%rax); \
+   MOVAPS %xmm9,  0x30(%rax); \
+   MOVAPS %xmm8,  0x40(%rax); \
+   MOVAPS %xmm7,  0x50(%rax); \
+   MOVAPS %xmm6,  0x60(%rax)
 
 /* Restore SSE registers 6-

Re: [PATCH] Fix ms-sysv.exp testsuite FAILs (PR c/83117)

2017-11-27 Thread Daniel Santos
On 11/27/2017 04:34 PM, Jakub Jelinek wrote:
> Hi!
>
> As mentioned in the PR, my C FE rvalue folding patch allows folding
> const variable initializers into the uses of those variables in rvalue
> contexts more than before, and so we get warnings about UB in the test,
> because an unprototyped function is cast to a function type with ellipsis in
> it.
>
> It isn't entirely clear what exactly the test wants to test, as mentioned
> in the PR, this is one of the options how to solve it, by dropping the
> const it can't be optimized in the FEs (the optimizers can still figure out
> the static vars are never written to).  Another option would be just
> add -w to dg-options, another one is const volatile.
>
> Regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-11-27  Jakub Jelinek  
>
>   PR c/83117
>   * gcc.target/x86_64/abi/ms-sysv/gen.cc (make_do_tests_decl): Drop
>   const from do_test_{u,v}*.
>
> --- gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc.jj 2017-05-22 10:49:45.0 +0200
> +++ gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc 2017-11-27 11:57:14.889570915 +0100
> @@ -392,7 +392,7 @@ static void make_do_tests_decl (const ve
>   continue;
>  
> comma.reset ();
> -   out << "static __attribute__ ((ms_abi)) long (*const do_test_"
> +   out << "static __attribute__ ((ms_abi)) long (*do_test_"
> << (unaligned ? "u" : "")
> << (varargs ? "v" : "") << i << ") (";
>  
>
>   Jakub
>

I don't have a problem with removing const, it's only there for
const-correctness and caution.  I just posted to the PR a bit ago and
I'm curious if there is a better approach when using assembly stubs that
are meant to be called in varying ways.  CV would work also, although
there's no real need to refetch the address before each use.

If you don't have a better way to do this then please use this patch.

Thanks!
Daniel





Re: [PATCH] Fix ms-sysv.exp testsuite FAILs (PR c/83117)

2017-11-28 Thread Daniel Santos


On 11/28/2017 05:22 AM, Jakub Jelinek wrote:
> On Mon, Nov 27, 2017 at 05:02:32PM -0600, Daniel Santos wrote:
>>> --- gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc.jj 2017-05-22 10:49:45.0 +0200
>>> +++ gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc 2017-11-27 11:57:14.889570915 +0100
>>> @@ -392,7 +392,7 @@ static void make_do_tests_decl (const ve
>>> continue;
>>>  
>>>   comma.reset ();
>>> - out << "static __attribute__ ((ms_abi)) long (*const do_test_"
>>> + out << "static __attribute__ ((ms_abi)) long (*do_test_"
>>>   << (unaligned ? "u" : "")
>>>   << (varargs ? "v" : "") << i << ") (";
>> I don't have a problem with removing const, it's only there for
>> const-correctness and caution.  I just posted to the PR a bit ago and
>> I'm curious if there is a better approach when using assembly stubs that
>> are meant to be called in varying ways.  CV would work also, although
>> there's no real need to refetch the address before each use.
>>
>> If you don't have a better way to do this then please use this patch.
> I've verified the resulting *.optimized dump as well as assembly is
> practically identical without/with the patch, only differences are in
> SSA_NAME versions, in assembly the .LC and .LCFI constants are
> different but otherwise it is the same - the functions are emitted in
> different orders by cgraph and committed the patch.
>
> Using assembly stubs that are meant to be called in varying ways should
> just be avoided in portable programs, you could e.g. in the generator
> instead of all those:
> extern __attribute__ ((ms_abi)) long do_test_aligned ();
> extern __attribute__ ((ms_abi)) long do_test_unaligned ();
> static __attribute__ ((ms_abi)) long (*do_test_1) (long a) = (void*)do_test_aligned;
> static __attribute__ ((ms_abi)) long (*do_test_v1) (long a, ...) = (void*)do_test_aligned;
> static __attribute__ ((ms_abi)) long (*do_test_u1) (long a) = (void*)do_test_unaligned;
> static __attribute__ ((ms_abi)) long (*do_test_uv1) (long a, ...) = (void*)do_test_unaligned;
> emit:
> extern __attribute__ ((ms_abi)) long do_test_1 (long a);
> asm (".text; do_test_1: jmp do_test_aligned; .previous");
> extern __attribute__ ((ms_abi)) long do_test_v1 (long a, ...);
> asm (".text; do_test_v1: jmp do_test_aligned; .previous");
> extern __attribute__ ((ms_abi)) long do_test_u1 (long a);
> asm (".text; do_test_u1: jmp do_test_unaligned; .previous");
> extern __attribute__ ((ms_abi)) long do_test_uv1 (long a, ...);
> asm (".text; do_test_uv1: jmp do_test_unaligned; .previous");
> or something similar.
>
>   Jakub

Ah hah! That would indeed work. Thanks for the tip.  I have some
improvements to make to this set of tests, mostly tests triggered by
GCC_TEST_RUN_EXPENSIVE, but perhaps I can make this modification as
well.  Come to think of it, attribute naked might work too.

Thanks,
Daniel


[PATCH, x86, libgcc] PR target/83917 Correct debug for -mcall-ms2sysv-xlogues stubs

2018-01-19 Thread Daniel Santos
When stepping through tail-call restore stubs the debugger has to assume
that rsp - 8 is the CFA, although it is not.  This is because I did not
explicitly add any .cfi directives.  This patch adds them to the
tail-call restore stubs, but this is new territory for me, so I would
appreciate feedback.

I've reg-tested on x86_64, but I still need to test on Solaris and
Darwin.  OK to commit after those tests?

Thanks,
Daniel

Signed-off-by: Daniel Santos 
---
 libgcc/config/i386/resms64fx.h | 19 +++
 libgcc/config/i386/resms64x.h  | 22 ++
 2 files changed, 41 insertions(+)

diff --git a/libgcc/config/i386/resms64fx.h b/libgcc/config/i386/resms64fx.h
index c5f63d879fe..7dc8c7d89ed 100644
--- a/libgcc/config/i386/resms64fx.h
+++ b/libgcc/config/i386/resms64fx.h
@@ -34,21 +34,40 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
.text
 MS2SYSV_STUB_BEGIN(resms64fx_17)
+.cfi_startproc
+.cfi_def_cfa %rbp, 16
mov -0x68(%rsi),%r15
+.cfi_endproc
 MS2SYSV_STUB_BEGIN(resms64fx_16)
+.cfi_startproc
+.cfi_def_cfa %rbp, 16
mov -0x60(%rsi),%r14
+.cfi_endproc
 MS2SYSV_STUB_BEGIN(resms64fx_15)
+.cfi_startproc
+.cfi_def_cfa %rbp, 16
mov -0x58(%rsi),%r13
+.cfi_endproc
 MS2SYSV_STUB_BEGIN(resms64fx_14)
+.cfi_startproc
+.cfi_def_cfa %rbp, 16
mov -0x50(%rsi),%r12
+.cfi_endproc
 MS2SYSV_STUB_BEGIN(resms64fx_13)
+.cfi_startproc
+.cfi_def_cfa %rbp, 16
mov -0x48(%rsi),%rbx
+.cfi_endproc
 MS2SYSV_STUB_BEGIN(resms64fx_12)
+.cfi_startproc
+.cfi_def_cfa %rbp, 16
mov -0x40(%rsi),%rdi
SSE_RESTORE
mov -0x38(%rsi),%rsi
leaveq
+.cfi_def_cfa %rsp, 8
ret
+.cfi_endproc
 MS2SYSV_STUB_END(resms64fx_12)
 MS2SYSV_STUB_END(resms64fx_13)
 MS2SYSV_STUB_END(resms64fx_14)
diff --git a/libgcc/config/i386/resms64x.h b/libgcc/config/i386/resms64x.h
index 1b44938ae7c..753be1f4c52 100644
--- a/libgcc/config/i386/resms64x.h
+++ b/libgcc/config/i386/resms64x.h
@@ -33,23 +33,45 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
.text
 MS2SYSV_STUB_BEGIN(resms64x_18)
+.cfi_startproc
+.cfi_def_cfa %r10, 8
mov -0x70(%rsi),%r15
+.cfi_endproc
 MS2SYSV_STUB_BEGIN(resms64x_17)
+.cfi_startproc
+.cfi_def_cfa %r10, 8
mov -0x68(%rsi),%r14
+.cfi_endproc
 MS2SYSV_STUB_BEGIN(resms64x_16)
+.cfi_startproc
+.cfi_def_cfa %r10, 8
mov -0x60(%rsi),%r13
+.cfi_endproc
 MS2SYSV_STUB_BEGIN(resms64x_15)
+.cfi_startproc
+.cfi_def_cfa %r10, 8
mov -0x58(%rsi),%r12
+.cfi_endproc
 MS2SYSV_STUB_BEGIN(resms64x_14)
+.cfi_startproc
+.cfi_def_cfa %r10, 8
mov -0x50(%rsi),%rbp
+.cfi_endproc
 MS2SYSV_STUB_BEGIN(resms64x_13)
+.cfi_startproc
+.cfi_def_cfa %r10, 8
mov -0x48(%rsi),%rbx
+.cfi_endproc
 MS2SYSV_STUB_BEGIN(resms64x_12)
+.cfi_startproc
+.cfi_def_cfa %r10, 8
mov -0x40(%rsi),%rdi
SSE_RESTORE
mov -0x38(%rsi),%rsi
mov %r10,%rsp
+.cfi_def_cfa_register %rsp
ret
+.cfi_endproc
 MS2SYSV_STUB_END(resms64x_12)
 MS2SYSV_STUB_END(resms64x_13)
 MS2SYSV_STUB_END(resms64x_14)
-- 
2.15.0



Re: [PATCH, x86, libgcc] PR target/83917 Correct debug for -mcall-ms2sysv-xlogues stubs

2018-01-20 Thread Daniel Santos
On 01/19/2018 05:35 PM, Jakub Jelinek wrote:
> On Fri, Jan 19, 2018 at 05:33:10PM -0600, Daniel Santos wrote:
>> When stepping through tail-call restore stubs the debugger has to assume
>> that rsp - 8 is the CFA, although it is not.  This is because I did not
>> explicitly add any .cfi directives.  This patch adds them to the
>> tail-call restore stubs, but this is new territory for me, so I would
>> appreciate feedback.
>>
>> I've reg-tested on x86_64, but I still need to test on Solaris and
>> Darwin.  OK to commit after those tests?
> I think you can't assume that the assembler supports .cfi_* directives.
> While e.g. libgcc/config/i386/morestack.S uses them unconditionally,
> it is guarded with:
> if test "$libgcc_cv_cfi" = "yes"; then
> tmake_file="${tmake_file} t-stack i386/t-stack-i386"
> fi

Ah hah! That explains a lot.  Yeah, I wasn't thinking all assemblers
would support it but I saw them in the Solaris assembler manual and
figured that they were maybe more widely supported than I had thought.

> in config.host.  E.g. cygwin.S has:
> #ifdef HAVE_GAS_CFI_SECTIONS_DIRECTIVE
> .cfi_sections   .debug_frame
> # define cfi_startproc() .cfi_startproc
> # define cfi_endproc() .cfi_endproc
> # define cfi_adjust_cfa_offset(X) .cfi_adjust_cfa_offset X
> # define cfi_def_cfa_register(X) .cfi_def_cfa_register X
> # define cfi_register(D,S) .cfi_register D, S
> # ifdef __x86_64__
> #  define cfi_push(X) .cfi_adjust_cfa_offset 8; .cfi_rel_offset X, 0
> #  define cfi_pop(X) .cfi_adjust_cfa_offset -8; .cfi_restore X
> # else
> #  define cfi_push(X) .cfi_adjust_cfa_offset 4; .cfi_rel_offset X, 0
> #  define cfi_pop(X) .cfi_adjust_cfa_offset -4; .cfi_restore X
> # endif
> #else
> # define cfi_startproc()
> # define cfi_endproc()
> # define cfi_adjust_cfa_offset(X)
> # define cfi_def_cfa_register(X)
> # define cfi_register(D,S)
> # define cfi_push(X)
> # define cfi_pop(X)
> #endif /* HAVE_GAS_CFI_SECTIONS_DIRECTIVE */
> perhaps you need something similar or commonize that (though, without
> .cfi_sections, you want the default).
>
>   Jakub

Thanks.  I like the idea of commonizing the macros for consistency.
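
A reduced sketch of what such common wrappers could look like, written so that it can be checked from C by stringifying the expansion.  This is not the code that was committed: SKETCH_USE_CFI, STR, STR2 and the two functions are hypothetical names, and the switch is keyed on GCC's predefined __GCC_HAVE_DWARF2_CFI_ASM purely as an assumption for the example.

```c
/* Hypothetical switch: default to whatever the compiler itself reports.  */
#ifndef SKETCH_USE_CFI
# ifdef __GCC_HAVE_DWARF2_CFI_ASM
#  define SKETCH_USE_CFI 1
# else
#  define SKETCH_USE_CFI 0
# endif
#endif

/* Each wrapper expands to the directive, or to nothing at all.  */
#if SKETCH_USE_CFI
# define cfi_startproc()   .cfi_startproc
# define cfi_def_cfa(R, O) .cfi_def_cfa R, O
#else
# define cfi_startproc()
# define cfi_def_cfa(R, O)
#endif

/* Variadic so that the comma in an expanded directive is accepted.  */
#define STR2(...) #__VA_ARGS__
#define STR(...) STR2(__VA_ARGS__)

/* Text the assembler would see for cfi_startproc (); may be empty.  */
const char *
expanded_startproc (void)
{
  return STR (cfi_startproc ());
}

/* Text the assembler would see for cfi_def_cfa (%rbp, 16); may be empty.  */
const char *
expanded_def_cfa (void)
{
  return STR (cfi_def_cfa (%rbp, 16));
}
```

The point of the wrappers is that the stub sources can use them unconditionally and still assemble with an assembler that has no .cfi_* support, at the cost of losing the unwind information there.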

As far as adding tests, I guess I would need to dig into
lib/gcc-gdb-test.exp to figure out how to do that.

Daniel


Re: [PATCH] Correct debug for -mcall-ms2sysv-xlogues stubs (PR target/83917, take 2)

2018-02-25 Thread Daniel Santos
Sorry for dropping the ball on this, and thank you Jakub for stepping in!

I've had a patch set sort-of rotting in my local repo, but I like yours
better.  I think I had gotten hung up on trying to figure out how to
write a test for this, and like you I just tested mine manually in gdb. 
I do have one correction though.


On 02/22/2018 08:56 AM, Jakub Jelinek wrote:
> Hi!
>
> On Sat, Jan 20, 2018 at 06:01:16PM -0600, Daniel Santos wrote:
>> Thanks.  I like the idea of commonizing the macros for consistency.
> Didn't see a progress on this P3 for a while, so I've written this
> version of the patch; no tests though, what I've been using in testing was:
> /* { dg-do compile { target lp64 } } */
> /* { dg-options "-mno-avx -msse2 -mcall-ms2sysv-xlogues -O2" } */
>
> void __attribute__((sysv_abi, noipa))
> foo (void)
> {
> }
>
> static void __attribute__((sysv_abi)) (*volatile foop) () = foo;
>
> void __attribute__((ms_abi, noipa))
> bar (void)
> {
>   foop ();
> }
>
> int
> main ()
> {
>   bar ();
>   return 0;
> }
>
> with/without -fno-omit-frame-pointer, disas bar; b on the tail
> call in there, stepi; bt (which before the patch failed, now works),
> also up; p $rbp to see if %rbp has been properly declared to be saved.
> There is no need to cfi_startproc/cfi_endproc for every single entrypoint in
> there, it is enough if the whole range is covered.  On the other side
> we need the cfi_offset for the frame pointer case, otherwise up; p/x $rbp
> doesn't work properly.
>
> Ok for trunk if it passes bootstrap/regtest on x86_64-linux and i686-linux?
>
> 2018-02-22  Jakub Jelinek  
>
>   PR debug/83917
>   * config/i386/i386-asm.h (PACKAGE_VERSION, PACKAGE_NAME,
>   PACKAGE_STRING, PACKAGE_TARNAME, PACKAGE_URL): Undefine between
>   inclusion of auto-target.h and auto-host.h.
>   (USE_GAS_CFI_DIRECTIVES): Define if not defined already based on
>   __GCC_HAVE_DWARF2_CFI_ASM.
>   (cfi_startproc, cfi_endproc, cfi_adjust_cfa_offset,
>   cfi_def_cfa_register, cfi_def_cfa, cfi_register, cfi_offset, cfi_push,
>   cfi_pop): Define.
>   * config/i386/cygwin.S: Don't include auto-host.h here, just
>   define USE_GAS_CFI_DIRECTIVES to 1 or 0 and include i386-asm.h.
>   (cfi_startproc, cfi_endproc, cfi_adjust_cfa_offset,
>   cfi_def_cfa_register, cfi_register, cfi_push, cfi_pop): Remove.
>   * config/i386/resms64fx.h: Add cfi_* directives.
>   * config/i386/resms64x.h: Likewise.
>
> --- libgcc/config/i386/i386-asm.h.jj  2018-01-03 10:42:56.317763517 +0100
> +++ libgcc/config/i386/i386-asm.h 2018-02-22 15:33:43.812922298 +0100
> @@ -27,8 +27,47 @@ see the files COPYING3 and COPYING.RUNTI
>  #define I386_ASM_H
>  
>  #include "auto-target.h"
> +#undef PACKAGE_VERSION
> +#undef PACKAGE_NAME
> +#undef PACKAGE_STRING
> +#undef PACKAGE_TARNAME
> +#undef PACKAGE_URL

This is a beautiful, temporary(?) fix to an ugly problem!

>  #include "auto-host.h"
>  
> +#ifndef USE_GAS_CFI_DIRECTIVES
> +# ifdef __GCC_HAVE_DWARF2_CFI_ASM
> +#  define USE_GAS_CFI_DIRECTIVES 1
> +# else
> +#  define USE_GAS_CFI_DIRECTIVES 0
> +# endif
> +#endif
> +#if USE_GAS_CFI_DIRECTIVES
> +# define cfi_startproc() .cfi_startproc
> +# define cfi_endproc()   .cfi_endproc
> +# define cfi_adjust_cfa_offset(X) .cfi_adjust_cfa_offset X
> +# define cfi_def_cfa_register(X) .cfi_def_cfa_register X
> +# define cfi_def_cfa(R,O) .cfi_def_cfa R, O
> +# define cfi_register(D,S)   .cfi_register D, S
> +# define cfi_offset(R,O) .cfi_offset R, O
> +# ifdef __x86_64__
> +#  define cfi_push(X) .cfi_adjust_cfa_offset 8; .cfi_rel_offset X, 0
> +#  define cfi_pop(X) .cfi_adjust_cfa_offset -8; .cfi_restore X
> +# else
> +#  define cfi_push(X) .cfi_adjust_cfa_offset 4; .cfi_rel_offset X, 0
> +#  define cfi_pop(X) .cfi_adjust_cfa_offset -4; .cfi_restore X
> +# endif
> +#else
> +# define cfi_startproc()
> +# define cfi_endproc()
> +# define cfi_adjust_cfa_offset(X)
> +# define cfi_def_cfa_register(X)
> +# define cfi_def_cfa(R,O)
> +# define cfi_register(D,S)
> +# define cfi_offset(R,O)
> +# define cfi_push(X)
> +# define cfi_pop(X)
> +#endif
> +
>  #define PASTE2(a, b) PASTE2a(a, b)
>  #define PASTE2a(a, b) a ## b
>  
> --- libgcc/config/i386/cygwin.S.jj2018-01-03 10:42:56.309763515 +0100
> +++ libgcc/config/i386/cygwin.S   2018-02-22 15:30:34.597925496 +0100
> @@ -23,31 +23,13 @@
>   * <http://www.gnu.org/licenses/>.
>   */
>  
> -#include "auto-host.h"

The following

Re: [PATCH] Correct debug for -mcall-ms2sysv-xlogues stubs (PR target/83917, take 2)

2018-02-26 Thread Daniel Santos


On 02/26/2018 02:20 AM, Jakub Jelinek wrote:
> On Sun, Feb 25, 2018 at 05:56:28PM -0600, Daniel Santos wrote:
>>> --- libgcc/config/i386/i386-asm.h.jj 2018-01-03 10:42:56.317763517 +0100
>>> +++ libgcc/config/i386/i386-asm.h   2018-02-22 15:33:43.812922298 +0100
>>> @@ -27,8 +27,47 @@ see the files COPYING3 and COPYING.RUNTI
>>>  #define I386_ASM_H
>>>  
>>>  #include "auto-target.h"
>>> +#undef PACKAGE_VERSION
>>> +#undef PACKAGE_NAME
>>> +#undef PACKAGE_STRING
>>> +#undef PACKAGE_TARNAME
>>> +#undef PACKAGE_URL
>> This is a beautiful, temporary(?) fix to an ugly problem!
>>
>>>  #include "auto-host.h"
>>> --- libgcc/config/i386/cygwin.S.jj  2018-01-03 10:42:56.309763515 +0100
>>> +++ libgcc/config/i386/cygwin.S 2018-02-22 15:30:34.597925496 +0100
>>> @@ -23,31 +23,13 @@
>>>   * <http://www.gnu.org/licenses/>.
>>>   */
>>>  
>>> -#include "auto-host.h"
>> The following include should be here.
>>
>> +#include "i386-asm.h"
> I don't understand this.  i386-asm.h needs (both before my patch and after
> it) both auto-host.h and auto-target.h, as it tests
> HAVE_GAS_SECTIONS_DIRECTIVE (this one newly, comes from cygwin.S)

The problem is that HAVE_GAS_SECTIONS_DIRECTIVE gets defined (or not) in
../../gcc/auto-host.h, but you are testing it before including
auto-host.h, either directly or via i386-asm.h.  So if i386-asm.h
depends upon HAVE_GAS_SECTIONS_DIRECTIVE first being defined then it is
a circular dependency.

In its current form, cygwin.S would never define USE_GAS_CFI_DIRECTIVES
prior to including i386-asm.h and also never emit
    .cfi_sections .debug_frame
and whether USE_GAS_CFI_DIRECTIVES ends up being defined to 1 or 0
depends upon the test of __GCC_HAVE_DWARF2_CFI_ASM in i386-asm.h.

So this area is new for me, but I don't understand why we're testing
HAVE_GAS_SECTIONS_DIRECTIVE in cygwin.S and __GCC_HAVE_DWARF2_CFI_ASM
when included from one of the stubs.  Is this an error, or a lack of my
understanding or both? :)
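
To make the ordering problem concrete, here is a minimal C sketch.  The
names DEMO_HAVE_FEATURE, DEMO_EARLY_TEST and DEMO_LATE_TEST are
hypothetical stand-ins (they are not libgcc macros), and the #define in
the middle merely stands in for including the configure-generated header.

```c
#include <assert.h>

/* Step 1: test the feature macro BEFORE the header that defines it.
   DEMO_HAVE_FEATURE is a hypothetical stand-in for a configure macro.  */
#ifdef DEMO_HAVE_FEATURE
# define DEMO_EARLY_TEST 1
#else
# define DEMO_EARLY_TEST 0
#endif

/* Step 2: this line stands in for including the generated header.  */
#define DEMO_HAVE_FEATURE 1

/* Step 3: the same test, now placed AFTER the definition.  */
#ifdef DEMO_HAVE_FEATURE
# define DEMO_LATE_TEST 1
#else
# define DEMO_LATE_TEST 0
#endif

int early_test_result (void) { return DEMO_EARLY_TEST; }
int late_test_result (void) { return DEMO_LATE_TEST; }

/* The early test never sees the definition; the late one does.  */
void check_include_order (void)
{
  assert (early_test_result () == 0);
  assert (late_test_result () == 1);
}
```

The early test silently takes the "not defined" branch, which is the
behaviour described above for a test placed ahead of the include.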

> HAVE_GAS_HIDDEN
> macros defined in auto-host.h
> and
> HAVE_AS_AVX
> macro defined in auto-target.h.
> Including auto-host.h when i386-asm.h will include it again just doesn't
> work, these headers don't have multiple inclusion guards.  And only including
> auto-target.h will work only if the
> .hidden
> and
> .cfi_sections .debug_frame
> tests are duplicated from gcc/configure.ac to libgcc/configure.ac, then we
> could include just auto-target.h in i386-asm.h.
> I've just followed what i386-asm.h has been doing.

And it's possible that I failed to test something correctly before
presuming it to be available, although I *think* the test for .hidden is
good.

>
>   Jakub
>

Thanks for your work on this.  If we need to test for CFI directives
differently when being included from cygwin.S, maybe we can just define
a simple cpp macro to indicate this and let i386-asm.h encapsulate the
implementation of it (e.g., testing HAVE_GAS_SECTIONS_DIRECTIVE or
__GCC_HAVE_DWARF2_CFI_ASM as appropriate).

Ultimately, the proper cleanup will be moving these tests out of
{gcc,libgcc}/configure.ac and into .m4 files in the root config
directory so that we don't uglify them with massive copy & pastes. 
These tests are also fairly complex as there are a lot of dependencies. 
m4 isn't my strong suit, but I can look at this after we're out of code
freeze.

Daniel



Re: [PATCH] Fix the GNU Stack markings on libgcc.a

2018-05-02 Thread Daniel Santos
Hello

On 05/01/2018 06:32 AM, Magnus Granberg wrote:
> New patch
> libgcc/ChangeLog:
>
> 2018-05-01  Magnus Granberg  
>
>   * config/i386/resms64.h: Add .note.GNU-stack section
>   * config/i386/resms64f.h: Likewise.
>   * config/i386/resms64fx.h: Likewise.
>   * config/i386/resms64x.h: Likewise.
>   * config/i386/savms64.h: Likewise.
>   * config/i386/savms64f.h: Likewise.
>
> ---

Well this isn't correct either because you are outside of the inclusion
guard.  Can you please move this up a line?

Thanks,
Daniel


Re: [PATCH] Fix the GNU Stack markings on libgcc.a

2018-05-02 Thread Daniel Santos

On 05/02/2018 06:17 PM, Magnus Granberg wrote:
> torsdag 3 maj 2018 kl. 01:07:51 CEST skrev  Daniel Santos:
>> Hello
>>
>> On 05/01/2018 06:32 AM, Magnus Granberg wrote:
>>> New patch
>>> libgcc/ChangeLog:
>>>
>>> 2018-05-01  Magnus Granberg  
>>>
>>> * config/i386/resms64.h: Add .note.GNU-stack section
>>> * config/i386/resms64f.h: Likewise.
>>> * config/i386/resms64fx.h: Likewise.
>>> * config/i386/resms64x.h: Likewise.
>>> * config/i386/savms64.h: Likewise.
>>> * config/i386/savms64f.h: Likewise.
>>>
>>> ---
>> Well this isn't correct either because you are outside of the inclusion
>> guard.  Can you please move this up a line?
>>
>> Thanks,
>> Daniel
> /libgcc/ChangeLog:
> 2018-05-01  Magnus Granberg  
>
>   * config/i386/resms64.h: Add .note.GNU-stack section
>   * config/i386/resms64f.h: Likewise.
>   * config/i386/resms64fx.h: Likewise.
>   * config/i386/resms64x.h: Likewise.
>   * config/i386/savms64.h: Likewise.
>   * config/i386/savms64f.h: Likewise.
>
> ---

No, I meant to move the changes up a line so that, if for some reason
the header were included twice, it wouldn't output the section
twice.  Example:

 MS2SYSV_STUB_END(savms64_18)
 
+#if defined(__linux__) && defined(__ELF__)
+.section .note.GNU-stack,"",%progbits
+#endif
 #endif /* __x86_64__ */


But upon further reflection, I think it can be cleanly added to
i386-asm.h.  Does that look sane Jakub?  (I haven't tried it)

Also, for the sake of my education, I don't exactly understand what the
problem is as I haven't been keeping up with pax and hardening.  I just
want to clarify that the stack shouldn't be executable.  These are not
actual "functions" per se (i.e., they do not adhere to any ABI); they
operate on the stack of the calling function.

Thanks,
Daniel


diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h
index 267133a9b75..7eb3c12fc85 100644
--- a/libgcc/config/i386/i386-asm.h
+++ b/libgcc/config/i386/i386-asm.h
@@ -80,6 +80,10 @@ ASMNAME(fn):
 
 #ifdef MS2SYSV_STUB_PREFIX
 
+# if defined(__linux__) && defined(__ELF__)
+.section .note.GNU-stack,"",%progbits
+# endif
+
 # define MS2SYSV_STUB_BEGIN(base_name) \
 	HIDDEN_FUNC(PASTE2(MS2SYSV_STUB_PREFIX, base_name))
 


[PATCH] [testsuite/i386] PR 82268 Correct FAIL when configured --with-cpu

2017-10-27 Thread Daniel Santos
When I originally wrote this test I wasn't aware of the
--with-cpu configure option, so this change explicitly disables avx to
make sure we choose the sse implementation, even when --with-cpu
specifies an arch that has avx support.

OK for head?

gcc/testsuite/ChangeLog:

gcc.target/i386/pr82196-1.c (dg-options): Add -mno-avx.

Thanks,
Daniel

---
 gcc/testsuite/gcc.target/i386/pr82196-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c b/gcc/testsuite/gcc.target/i386/pr82196-1.c
index 541d975480d..ff108132bb5 100644
--- a/gcc/testsuite/gcc.target/i386/pr82196-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target lp64 } } */
-/* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */
+/* { dg-options "-mno-avx -msse -mcall-ms2sysv-xlogues -O2" } */
 /* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */
 /* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */
 
-- 
2.14.3



[PATCH 0/2] [i386] PR82002 Correct ICE with large stack frame

2017-10-30 Thread Daniel Santos
I originally intended to submit the first part of this patch set a few
weeks ago as it was simpler, but here is the full fix.  The first part
is a really simple follow-up fix to an off-by-one error H.J. originally
fixed with r252099, but in the process of testing I discovered a more
complex problem when we add an ms_abi to sysv_abi call that resulted in a
bad INSN because I didn't check for a non-immediate offset.

I originally wrote a different solution where I added a mechanism to
struct ix86_frame to track and reuse a scratch register in the
pro/epilogue, but then I realized that I didn't need that if I just
emitted the SSE saves or stub call after the SP realignment and prior to
allocating the remainder of the frame.  However, I still need to use a
scratch register sometimes in the epilogue, so I've added a simplified
mechanism to choose_baseaddr to manage that, but not to track and reuse
it for subsequent calls.

Unfortunately, this sat for so long that there are now two duplicates in
Bugzilla (pr82485 and pr82712). Regression tested with {,-m32}, and
I've started a run for x32 even though this *shouldn't* affect it (in theory).

Thanks,
Daniel



[PATCH 1/2] [i386] PR82002 Part 1: Correct ICE caused by wrong calculation.

2017-10-30 Thread Daniel Santos
This is a residual problem caused by the off-by-one error in sp_valid_at
and fp_valid_at originally corrected in r252099.  However, adding tests
that include an ms_abi to sysv_abi call reveals an additional, more
complex problem with an invalid INSN due to overflowing the s32 offset.
Therefore I'm including all new tests, but marking ones that are broken
by this additional problem as xfail and addressing that in the next
patch.

gcc:
config/i386/i386.c (ix86_expand_epilogue): Correct stack
calculation.

gcc/testsuite:
gcc.target/i386/pr82002-1.c: New test.
gcc.target/i386/pr82002-2a.c: New xfail test.
gcc.target/i386/pr82002-2b.c: New xfail test.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr82002-1.c  | 12 
 gcc/testsuite/gcc.target/i386/pr82002-2a.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr82002-2b.c | 14 ++
 4 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr82002-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr82002-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr82002-2b.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2de0dd0c283..83a07afb3e1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13812,7 +13812,7 @@ ix86_expand_epilogue (int style)
 the stack pointer, if we will restore SSE regs via sp.  */
   if (TARGET_64BIT
  && m->fs.sp_offset > 0x7fff
- && sp_valid_at (frame.stack_realign_offset)
+ && sp_valid_at (frame.stack_realign_offset + 1)
  && (frame.nsseregs + frame.nregs) != 0)
{
  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
diff --git a/gcc/testsuite/gcc.target/i386/pr82002-1.c b/gcc/testsuite/gcc.target/i386/pr82002-1.c
new file mode 100644
index 000..86678a01992
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr82002-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-Ofast -mstackrealign -mabi=ms" } */
+
+void a (char *);
+void
+b ()
+{
+  char c[100];
+  c[1099511627776] = 'b';
+  a (c);
+  a (c);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr82002-2a.c b/gcc/testsuite/gcc.target/i386/pr82002-2a.c
new file mode 100644
index 000..bc85080ba8e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr82002-2a.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-Ofast -mstackrealign -mabi=ms" } */
+/* { dg-xfail-if "" { *-*-* }  } */
+/* { dg-xfail-run-if "" { *-*-* }  } */
+
+void __attribute__((sysv_abi)) a (char *);
+void
+b ()
+{
+  char c[100];
+  c[1099511627776] = 'b';
+  a (c);
+  a (c);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr82002-2b.c b/gcc/testsuite/gcc.target/i386/pr82002-2b.c
new file mode 100644
index 000..10e44cd7b1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr82002-2b.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-Ofast -mstackrealign -mabi=ms -mcall-ms2sysv-xlogues" } */
+/* { dg-xfail-if "" { *-*-* }  } */
+/* { dg-xfail-run-if "" { *-*-* }  } */
+
+void __attribute__((sysv_abi)) a (char *);
+void
+b ()
+{
+  char c[100];
+  c[1099511627776] = 'b';
+  a (c);
+  a (c);
+}
-- 
2.14.3



[PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN

2017-10-30 Thread Daniel Santos
When we are realigning the stack pointer, making an ms_abi to sysv_abi
call, and allocating 2GiB or more on the stack, we end up with an invalid
INSN due to a non-immediate offset.  This occurs both with and without
-mcall-ms2sysv-xlogues.  Additionally, I've discovered that the stack
allocation with -mcall-ms2sysv-xlogues is incorrect as it ignores stack
checking, stack clash checking and probing.

This patch fixes these problems as follows:

1. No longer allocate stack space in ix86_emit_outlined_ms2sysv_save.
2. Rearrange where we emit SSE saves or stub call:
   a. Before frame allocation when offset from frame to save area is >= 2GiB.
   b. After frame allocation when frame is < 2GiB.  (Stack allocations
  prior to the stub call can't be combined with those afterwards, so
  this is better when possible.)
3. Modify choose_baseaddr to take an optional scratch_regno argument
   and never return rtx that cannot be used as an immediate.
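
The scratch-register fallback in item 3 can be sketched in plain C as
follows.  This is only a model of the decision, not the code in i386.c:
struct model_addr, model_choose_addr, MODEL_NO_REG and the register
numbers are hypothetical, and the real function builds RTL rather than
filling in a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NO_REG (-1)   /* hypothetical "no scratch register" value */

/* Hypothetical description of the chosen address.  */
struct model_addr
{
  int base_regno;          /* register the address is based on */
  int scratch_regno;       /* register holding the offset, or MODEL_NO_REG */
  int64_t displacement;    /* immediate displacement (0 if scratch is used) */
};

/* True if V can be encoded as a sign-extended 32-bit immediate.  */
bool model_fits_s32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* Fill *OUT and return true, or return false when OFFSET needs a scratch
   register but the caller did not supply one.  */
bool model_choose_addr (int base_regno, int64_t offset, int scratch_regno,
                        struct model_addr *out)
{
  out->base_regno = base_regno;
  if (model_fits_s32 (offset))
    {
      out->scratch_regno = MODEL_NO_REG;
      out->displacement = offset;
      return true;
    }
  if (scratch_regno == MODEL_NO_REG)
    return false;
  /* The real code would emit a move of OFFSET into the scratch here.  */
  out->scratch_regno = scratch_regno;
  out->displacement = 0;
  return true;
}

/* Usage: 2GiB is the first offset that no longer fits an immediate.  */
void model_examples (void)
{
  struct model_addr a;
  bool ok = model_choose_addr (7, 0x7fffffff, MODEL_NO_REG, &a);
  assert (ok && a.scratch_regno == MODEL_NO_REG);
  ok = model_choose_addr (7, INT64_C (0x80000000), MODEL_NO_REG, &a);
  assert (!ok);
  ok = model_choose_addr (7, INT64_C (0x80000000), 0, &a);
  assert (ok && a.scratch_regno == 0 && a.displacement == 0);
}
```

Returning false in the model corresponds to the case where the patch
asserts instead of producing an address.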

gcc:
config/i386/i386.c (choose_baseaddr): Use optional scratch
register and add assertion.
(ix86_emit_outlined_ms2sysv_save): Use scratch register when
needed, and don't allocate stack.
(ix86_expand_prologue): Rearrange where SSE saves/stub call is
emitted, correct wrong allocation with -mcall-ms2sysv-xlogues.
(ix86_emit_outlined_ms2sysv_restore): Fix non-immediate offsets.

gcc/testsuite:
gcc.target/i386/pr82002-2a.c: Change from xfail to fail.
gcc.target/i386/pr82002-2b.c: Likewise.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 76 --
 gcc/testsuite/gcc.target/i386/pr82002-2a.c |  2 -
 gcc/testsuite/gcc.target/i386/pr82002-2b.c |  2 -
 3 files changed, 62 insertions(+), 18 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 83a07afb3e1..abd8e937e0d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11520,7 +11520,8 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
The valid base registers are taken from CFUN->MACHINE->FS.  */
 
 static rtx
-choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
+choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align,
+int scratch_regno = -1)
 {
   rtx base_reg = NULL;
   HOST_WIDE_INT base_offset = 0;
@@ -11534,6 +11535,28 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
 choose_basereg (cfa_offset, base_reg, base_offset, 0, align);
 
   gcc_assert (base_reg != NULL);
+
+  if (TARGET_64BIT)
+{
+  rtx base_offset_rtx = GEN_INT (base_offset);
+
+  if (scratch_regno >= 0)
+   {
+ if (!x86_64_immediate_operand (base_offset_rtx, DImode))
+   {
+ rtx tmp;
+ rtx scratch_reg = gen_rtx_REG (DImode, scratch_regno);
+
+ emit_insn (gen_rtx_SET (scratch_reg, base_offset_rtx));
+ tmp = gen_rtx_PLUS (DImode, scratch_reg, base_reg);
+ emit_insn (gen_rtx_SET (scratch_reg, tmp));
+ return scratch_reg;
+   }
+   }
+  else
+   gcc_assert (x86_64_immediate_operand (base_offset_rtx, DImode));
+}
+
   return plus_constant (Pmode, base_reg, base_offset);
 }
 
@@ -12793,23 +12816,22 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame)
   rtx sym, addr;
   rtx rax = gen_rtx_REG (word_mode, AX_REG);
   const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
-  HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset;
 
   /* AL should only be live with sysv_abi.  */
   gcc_assert (!ix86_eax_live_at_start_p ());
+  gcc_assert (m->fs.sp_offset >= frame.sse_reg_save_offset);
 
   /* Setup RAX as the stub's base pointer.  We use stack_realign_offset rather
  we've actually realigned the stack or not.  */
   align = GET_MODE_ALIGNMENT (V4SFmode);
   addr = choose_baseaddr (frame.stack_realign_offset
- + xlogue.get_stub_ptr_offset (), &align);
+ + xlogue.get_stub_ptr_offset (), &align, AX_REG);
   gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
-  emit_insn (gen_rtx_SET (rax, addr));
 
-  /* Allocate stack if not already done.  */
-  if (allocate > 0)
-  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-   GEN_INT (-allocate), -1, false);
+  /* If choose_baseaddr returned our scratch register, then we don't need to
+ do another SET.  */
+  if (!REG_P (addr) || REGNO (addr) != AX_REG)
+emit_insn (gen_rtx_SET (rax, addr));
 
   /* Get the stub symbol.  */
   sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
@@ -12841,6 +12863,7 @@ ix86_expand_prologue (void)
   HOST_WIDE_INT allocate;
   bool int_registers_saved;
   bool sse_registers_saved;
+  bool save_stub_call_needed;
   rtx static_chain = NULL_RTX;
 
   if (ix86_function_n

Re: [PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN

2017-10-30 Thread Daniel Santos
On 10/30/2017 09:09 PM, Daniel Santos wrote:
> 3. Modify choose_baseaddr to take an optional scratch_regno argument
>and never return rtx that cannot be used as an immediate.

I should amend this: it actually does a gcc_assert, so the check won't
happen with --enable-checking=no, but it would still fail later in expand.
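
For anyone unfamiliar with this pitfall, here is a small sketch of how a
compiled-out assertion lets a bad value through.  DEMO_ASSERT and
DEMO_CHECKING are hypothetical; the disabled form below is only assumed
to resemble what a checking-disabled build does, so please read it as an
illustration rather than as GCC's definition of gcc_assert.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_CHECKING 0   /* hypothetical switch; set to 1 to enable checks */

#if DEMO_CHECKING
# define DEMO_ASSERT(expr) assert (expr)
#else
/* The expression is still parsed, but never evaluated.  */
# define DEMO_ASSERT(expr) ((void) (0 && (expr)))
#endif

static int demo_checks_run;   /* counts how often the predicate really ran */

/* True if OFFSET fits in a signed 32-bit immediate.  */
bool demo_offset_ok (long long offset)
{
  demo_checks_run++;
  return offset >= -2147483648LL && offset <= 2147483647LL;
}

/* Returns OFFSET unchanged; with checking off, nothing stops a bad value.  */
long long demo_use_offset (long long offset)
{
  DEMO_ASSERT (demo_offset_ok (offset));
  return offset;
}

int demo_checks_evaluated (void)
{
  return demo_checks_run;
}
```

With DEMO_CHECKING set to 0 the predicate is never called, so an
out-of-range offset is simply handed on to whatever consumes it next.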

>  static rtx
> -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
> +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align,
> +  int scratch_regno = -1)
>  {
>rtx base_reg = NULL;
>HOST_WIDE_INT base_offset = 0;
> @@ -11534,6 +11535,28 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
>  choose_basereg (cfa_offset, base_reg, base_offset, 0, align);
>  
>gcc_assert (base_reg != NULL);
> +
> +  if (TARGET_64BIT)
> +{
> +  rtx base_offset_rtx = GEN_INT (base_offset);
> +
> +  if (scratch_regno >= 0)
> + {
> +   if (!x86_64_immediate_operand (base_offset_rtx, DImode))
> + {
> +   rtx tmp;
> +   rtx scratch_reg = gen_rtx_REG (DImode, scratch_regno);
> +
> +   emit_insn (gen_rtx_SET (scratch_reg, base_offset_rtx));
> +   tmp = gen_rtx_PLUS (DImode, scratch_reg, base_reg);
> +   emit_insn (gen_rtx_SET (scratch_reg, tmp));
> +   return scratch_reg;
> + }
> + }
> +  else
> + gcc_assert (x86_64_immediate_operand (base_offset_rtx, DImode));
> +}
> +
>return plus_constant (Pmode, base_reg, base_offset);
>  }

Daniel


Re: [PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN

2017-11-02 Thread Daniel Santos
On 10/31/2017 04:31 AM, Uros Bizjak wrote:
> On Tue, Oct 31, 2017 at 3:09 AM, Daniel Santos  
> wrote:
>> When we are realigning the stack pointer, making an ms_abi to sysv_abi
>> call and allocating 2GiB or more on the stack we end up with an invalid
>> INSN due to a non-immediate offset.  This occurs both with and without
>> -mcall-ms2sysv-xlogues.  Additionally, I've discovered that the stack
>> allocation with -mcall-ms2sysv-xlogues is incorrect as it ignores stack
>> checking, stack clash checking and probing.
>>
>> This patch fixes these problems by
>>
>> 1. No longer allocate stack space in ix86_emit_outlined_ms2sysv_save.
>> 2. Rearrange where we emit SSE saves or stub call:
>>a. Before frame allocation when offset from frame to save area is >= 2GiB.
>>b. After frame allocation when frame is < 2GiB.  (Stack allocations
>>   prior to the stub call can't be combined with those afterwards, so
>>   this is better when possible.)
>> 3. Modify choose_baseaddr to take an optional scratch_regno argument
>>and never return rtx that cannot be used as an immediate.
>>
>> gcc:
>> config/i386/i386.c (choose_basereg): Use optional scratch
>> register and add assertion.
>> (x86_emit_outlined_ms2sysv_save): use scratch register when
>> needed, and don't allocate stack.
>> (ix86_expand_prologue): Rearrange where SSE saves/stub call is
>> emitted, correct wrong allocation with -mcall-ms2sysv-xlogues.
>> (ix86_emit_outlined_ms2sysv_restore): Fix non-immediate offsets.
>>
>> gcc/testsuite:
>> gcc.target/i386/pr82002-2a.c: Change from xfail to fail.
>> gcc.target/i386/pr82002-2b.c: Likewise.
>>
>> Signed-off-by: Daniel Santos 
>> ---
>>  gcc/config/i386/i386.c | 76 
>> --
>>  gcc/testsuite/gcc.target/i386/pr82002-2a.c |  2 -
>>  gcc/testsuite/gcc.target/i386/pr82002-2b.c |  2 -
>>  3 files changed, 62 insertions(+), 18 deletions(-)
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 83a07afb3e1..abd8e937e0d 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -11520,7 +11520,8 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
>> The valid base registers are taken from CFUN->MACHINE->FS.  */
>>
>>  static rtx
>> -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
>> +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align,
>> +int scratch_regno = -1)
>>  {
>>rtx base_reg = NULL;
>>HOST_WIDE_INT base_offset = 0;
>> @@ -11534,6 +11535,28 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
>>  choose_basereg (cfa_offset, base_reg, base_offset, 0, align);
>>
>>gcc_assert (base_reg != NULL);
>> +
>> +  if (TARGET_64BIT)
>> +{
>> +  rtx base_offset_rtx = GEN_INT (base_offset);
>> +
>> +  if (scratch_regno >= 0)
>> +   {
>> + if (!x86_64_immediate_operand (base_offset_rtx, DImode))
>> +   {
>> + rtx tmp;
>> + rtx scratch_reg = gen_rtx_REG (DImode, scratch_regno);
>> +
>> + emit_insn (gen_rtx_SET (scratch_reg, base_offset_rtx));
>> + tmp = gen_rtx_PLUS (DImode, scratch_reg, base_reg);
>> + emit_insn (gen_rtx_SET (scratch_reg, tmp));
>> + return scratch_reg;
>> +   }
>> +   }
>> +  else
>> +   gcc_assert (x86_64_immediate_operand (base_offset_rtx, DImode));
>> +}
>> +
>>return plus_constant (Pmode, base_reg, base_offset);
>>  }
> This function doesn't need to return a register, it can return plus
> RTX. I'd suggest the following implementation:
>
> --cut here--
> Index: i386.c
> ===
> --- i386.c  (revision 254243)
> +++ i386.c  (working copy)
> @@ -11520,7 +11520,8 @@
> The valid base registers are taken from CFUN->MACHINE->FS.  */
>
>  static rtx
> -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
> +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align,
> +unsigned int scratch_regno = INVALID_REGNUM)
>  {
>rtx base_reg = NULL;
>HOST_WIDE_INT base_offset = 0;
> @@ -11534,6 +11535,19 @@
>  choose_basereg (cfa_offset, base_reg, base_offset, 0, align);
>
>gcc_assert (base_reg != NULL);
>

Re: [PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN

2017-11-03 Thread Daniel Santos
On 11/03/2017 02:09 AM, Uros Bizjak wrote:
> On Thu, Nov 2, 2017 at 11:43 PM, Daniel Santos  
> wrote:
>
>>>>int_registers_saved = (frame.nregs == 0);
>>>>sse_registers_saved = (frame.nsseregs == 0);
>>>> +  save_stub_call_needed = (m->call_ms2sysv);
>>>> +  gcc_assert (!(!sse_registers_saved && save_stub_call_needed));
>>> Oooh, double negation :(
>> I'm just saying that we shouldn't be saving SSE registers inline and via
>> the stub.  If I followed the naming convention of e.g.,
>> "sse_registers_saved" then my variable would end up being called
>> "save_stub_called" which would be incorrect and misleading, similar to
>> how "sse_registers_saved" is misleading when there are in fact no SSE
>> registers that need to be saved.  Maybe I should rename
>> (int|sse)_registers_saved to (int|sse)_register_saves_needed with
>> inverted logic instead.
> But, we can just say
>
> gcc_assert (sse_registers_saved || !save_stub_call_needed);
>
> No?
>
> Uros.
>

Oh yes, I see.  Because "sse_registers_saved" really means that we've
either already saved them or don't have to, and not literally that they
have been saved.  I ranted about its name but didn't think it all the
way through. :)
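
For the record, the two forms of the assertion condition are equivalent
by De Morgan's law.  A quick exhaustive check in C (the helper names
original_form, simplified_form and count_mismatches are just for this
sketch):

```c
#include <assert.h>
#include <stdbool.h>

/* The assertion condition as first written.  */
bool original_form (bool sse_registers_saved, bool save_stub_call_needed)
{
  return !(!sse_registers_saved && save_stub_call_needed);
}

/* The form Uros suggested.  */
bool simplified_form (bool sse_registers_saved, bool save_stub_call_needed)
{
  return sse_registers_saved || !save_stub_call_needed;
}

/* Count the input combinations on which the two forms disagree.  */
int count_mismatches (void)
{
  int mismatches = 0;
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      if (original_form (a, b) != simplified_form (a, b))
        mismatches++;
  return mismatches;
}
```

Both forms are false only when the SSE registers still need saving and
the stub call is also needed, which is the combination the assertion is
meant to reject.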

How does this patch look?  (Also, I've updated comments for
choose_baseaddr.)  Currently re-running tests.

Thanks,
Daniel
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2967876..fb81d4dba84 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11515,12 +11515,15 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
an alignment value (in bits) that is preferred or zero and will
recieve the alignment of the base register that was selected,
irrespective of rather or not CFA_OFFSET is a multiple of that
-   alignment value.
+   alignment value.  If it is possible for the base register offset to be
+   non-immediate then SCRATCH_REGNO should specify a scratch register to
+   use.
 
The valid base registers are taken from CFUN->MACHINE->FS.  */
 
 static rtx
-choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
+choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align,
+		 unsigned int scratch_regno = INVALID_REGNUM)
 {
   rtx base_reg = NULL;
   HOST_WIDE_INT base_offset = 0;
@@ -11534,6 +11537,19 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
 choose_basereg (cfa_offset, base_reg, base_offset, 0, align);
 
   gcc_assert (base_reg != NULL);
+
+  rtx base_offset_rtx = GEN_INT (base_offset);
+
+  if (!x86_64_immediate_operand (base_offset_rtx, Pmode))
+{
+  gcc_assert (scratch_regno != INVALID_REGNUM);
+
+  rtx scratch_reg = gen_rtx_REG (Pmode, scratch_regno);
+  emit_move_insn (scratch_reg, base_offset_rtx);
+
+  return gen_rtx_PLUS (Pmode, base_reg, scratch_reg);
+}
+
   return plus_constant (Pmode, base_reg, base_offset);
 }
 
@@ -12793,23 +12809,19 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame)
   rtx sym, addr;
   rtx rax = gen_rtx_REG (word_mode, AX_REG);
   const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
-  HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset;
 
   /* AL should only be live with sysv_abi.  */
   gcc_assert (!ix86_eax_live_at_start_p ());
+  gcc_assert (m->fs.sp_offset >= frame.sse_reg_save_offset);
 
   /* Setup RAX as the stub's base pointer.  We use stack_realign_offset rather
  we've actually realigned the stack or not.  */
   align = GET_MODE_ALIGNMENT (V4SFmode);
   addr = choose_baseaddr (frame.stack_realign_offset
-			  + xlogue.get_stub_ptr_offset (), &align);
+			  + xlogue.get_stub_ptr_offset (), &align, AX_REG);
   gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
-  emit_insn (gen_rtx_SET (rax, addr));
 
-  /* Allocate stack if not already done.  */
-  if (allocate > 0)
-  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-GEN_INT (-allocate), -1, false);
+  emit_insn (gen_rtx_SET (rax, addr));
 
   /* Get the stub symbol.  */
   sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
@@ -12841,6 +12853,7 @@ ix86_expand_prologue (void)
   HOST_WIDE_INT allocate;
   bool int_registers_saved;
   bool sse_registers_saved;
+  bool save_stub_call_needed;
   rtx static_chain = NULL_RTX;
 
   if (ix86_function_naked (current_function_decl))
@@ -13016,6 +13029,8 @@ ix86_expand_prologue (void)
 
   int_registers_saved = (frame.nregs == 0);
   sse_registers_saved = (frame.nsseregs == 0);
+  save_stub_call_needed = (m->call_ms2sysv);
+  gcc_assert (sse_registers_saved || !save_stub_call_needed);
 
   if (frame_pointer_needed && !m->fs.fp_valid)
 {
@@ -13110,10 +13125,26 @@ ix86_expand_prolog

Re: [PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN

2017-11-03 Thread Daniel Santos
On 11/03/2017 04:22 PM, Daniel Santos wrote:
> ...
> How does this patch look?  (Also, I've updated comments for
> choose_baseaddr.)  Currently re-running tests.
>
> Thanks,
> Daniel
>
> @@ -13110,10 +13125,26 @@ ix86_expand_prologue (void)
>target.  */
>if (TARGET_SEH)
>   m->fs.sp_valid = false;
> -}
>  
> -  if (m->call_ms2sysv)
> -ix86_emit_outlined_ms2sysv_save (frame);
> +  /* If SP offset is non-immediate after allocation of the stack frame,
> +  then emit SSE saves or stub call prior to allocating the rest of the
> +  stack frame.  This is less efficient for the out-of-line stub because
> +  we can't combine allocations across the call barrier, but it's better
> +  than using a scratch register.  */
> +  else if (!x86_64_immediate_operand (GEN_INT 
> (frame.stack_pointer_offset - m->fs.sp_realigned_offset), Pmode))

Oops, and also after fixing this formatting...

Daniel


PING: [PATCH v2 0/2] [testsuite, libgcc] PR80759 Fix FAILs on Solaris and Darwin

2017-07-17 Thread Daniel Santos

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00025.html

Uros,
Can you review changes for i386 please?

Mike or Iain,
Can one of you review changes for Darwin please?  I'm not familiar with 
the platform, although Rainer tested on Darwin for me.


Ian,
Can you review changes to libgcc please?

Thank you all!
Daniel


On 07/02/2017 12:11 AM, Daniel Santos wrote:
This patchset addresses a number of testsuite issues for 
gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp, mostly occurring on Solaris 
and Darwin.  Additionally, it solves a bug in libgcc that caused link 
failures on Darwin when building with -mcall-ms2sysv-xlogues.  The 
issues are detailed in the notes for each patch.


I would particularly appreciate any feedback for Darwin as I am 
unfamiliar with the platform and Rainer and I have fashioned some of 
these changes by looking at other Darwin code in gcc.


 .../gcc.target/x86_64/abi/ms-sysv/do-test.S  | 200 
---

 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c  |  83 +++-
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp| 153 +-
 libgcc/config.host   |   6 +-
 libgcc/config/i386/i386-asm.h|  89 +
 libgcc/config/i386/resms64.S |   2 +-
 libgcc/config/i386/resms64f.S|   2 +-
 libgcc/config/i386/resms64fx.S   |   2 +-
 libgcc/config/i386/resms64x.S|   2 +-
 libgcc/config/i386/savms64.S |   2 +-
 libgcc/config/i386/savms64f.S|   2 +-
 11 files changed, 274 insertions(+), 269 deletions(-)


Many thanks to Rainer for all of his help on this!

Thanks,
Daniel




Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues

2017-07-26 Thread Daniel Santos

On 07/26/2017 02:03 PM, H.J. Lu wrote:

This patch caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81563


Yes, I discovered this flaw while working on PR 80969 but I hadn't found
an actual testcase where it caused a problem yet.  I'm about to submit
my patchset for review, so sorry I didn't get it committed sooner.  My
patch set further improves sp_valid_at and fp_valid_at since it's
possible that the last offset the frame pointer can be used to
access is not equal to the realignment offset.  I'll try to get this out
tonight or tomorrow.


Thanks!
Daniel


Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues

2017-07-28 Thread Daniel Santos

On 07/26/2017 02:03 PM, H.J. Lu wrote:

This patch caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81563


Hello.  I've rebased my patch set and I'm now retesting.  I'm afraid
that your changes are wrong because my sp_valid_at and fp_valid_at
functions are wrong -- these are supposed to be for the base offset and
not the CFA offset, sorry about that.  This means that the check in
choose_basereg (and thus choose_baseaddr) has been wrong as well.  I'm
retesting now.


Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues

2017-07-31 Thread Daniel Santos

On 07/28/2017 09:41 AM, H.J. Lu wrote:

On Fri, Jul 28, 2017 at 6:57 AM, Daniel Santos  wrote:

On 07/26/2017 02:03 PM, H.J. Lu wrote:

This patch caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81563

Hello.  I've rebased my patch set and I'm now retesting.  I'm afraid that
your changes are wrong because my sp_valid_at and fp_valid_at functions
are wrong -- these are supposed to be for the base offset and not the CFA
offset, sorry about that.  This means that the check in choose_basereg (and
thus choose_baseaddr) has been wrong as well.  I'm retesting now.

Please check your change with gcc.target/i386/pr81563.c.

Thanks.


I'm still getting used to x86 stack math and briefly I thought that
my understanding of the CFA was wrong and that I had messed up
sp_valid_at and fp_valid_at, but I was mistaken, so sorry for the false
alarm.  My rebased patches pass all tests, so it's OK.


[PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f

2017-07-31 Thread Daniel Santos
When working on the Wine64 project to use aligned SSE MOVs after SP 
realignment and adding -mcall-ms2sysv-xlogues, I overlooked the fact 
that the function body may require a stack alignment greater than 
16 bytes.  This can result in an ICE with -mabi=ms -mavx512f and some
other cases.  This patch set reworks the strategy for calculating the 
frame layout following normal (inline) integral register saves (at 
frame.reg_save_offset) to the start of the frame for the local function 
(frame.frame_pointer_offset).


I've completed a bootstrap and full regression test with no additional 
failures, but I don't have access to a machine with avx512 extensions.  
I have manually run the tests that need it using the Intel SDE, but I 
haven't been able to validate that my 
check_effective_target_avx512f_runtime code in 
gcc/testsuite/lib/target-supports.exp is correctly enabling the tests 
for pr80969-4*.c.


As an aside, I still have some rework of the ms-sysv.exp tests that
I have yet to submit, in which I'm adding more tests for cases
with uncommon stacks, as in PR 81563.


Thanks,
Daniel
2017-07-23  Daniel Santos  

* config/i386/i386.h (ix86_frame::outlined_save_offset): Remove field.
(ix86_frame::stack_realign_allocate_offset): Likewise.
(ix86_frame::stack_realign_allocate): New field.
(struct machine_frame_state): Modify comments.
(machine_frame_state::sp_realigned_fp_last): New field.
(machine_function::call_ms2sysv_pad_out): Remove field.
* config/i386/i386.c (xlogue_layout::get_stack_space_used): Modify.
(ix86_compute_frame_layout): Likewise.
(sp_valid_at): Likewise.
(fp_valid_at): Likewise.
(choose_baseaddr): Modify comments.
(ix86_emit_outlined_ms2sysv_save): Modify.
(ix86_expand_prologue): Likewise.
(ix86_expand_epilogue): Modify comments.
2017-07-23  Daniel Santos  
* gcc.target/i386/pr80969-1.c: New testcase.
* gcc.target/i386/pr80969-2a.c: Likewise.
* gcc.target/i386/pr80969-2.c: Likewise.
* gcc.target/i386/pr80969-3.c: Likewise.
* gcc.target/i386/pr80969-4a.c: Likewise.
* gcc.target/i386/pr80969-4b.c: Likewise.
* gcc.target/i386/pr80969-4.c: Likewise.


[PATCH 1/6] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at

2017-07-31 Thread Daniel Santos
When we realign the stack frame (without DRAP), there may be a range of
CFA offsets that should never be touched because they are alignment
padding and any reference to them is almost certainly an error.
Previously, only the offset of where the realigned stack frame starts
was recorded and checked in sp_valid_at and fp_valid_at.

This change adds sp_realigned_fp_last to struct machine_frame_state to
record the last valid offset from which the frame pointer can be used
when the stack pointer is realigned and modifies sp_valid_at and
fp_valid_at to fail an assertion when passed an offset in the "no-man's
land" between these two values.

Comments for struct machine_frame_state incorrectly stated that a
realigned stack pointer could be used to access offsets equal to or
greater than sp_realigned_offset, but it is only valid for offsets that
are greater.  This was the (incorrect) behaviour of sp_valid_at and
fp_valid_at prior to r250587 and this change now corrects the
documentation and adds clarification of the CFA-relative calculation.
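
As an illustration only (not part of the patch), the range checks described
above can be modelled in standalone C.  The struct and the *_model function
names below are hypothetical stand-ins for machine_frame_state, sp_valid_at
and fp_valid_at, and the model reports the "no-man's land" through a
separate predicate instead of failing a gcc_assert:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the relevant fields of
   struct machine_frame_state.  Offsets are CFA-relative: a slot
   lives at CFA - cfa_offset.  */
struct frame_state_model
{
  bool sp_valid;
  bool fp_valid;
  bool sp_realigned;
  long sp_realigned_fp_last;  /* Last offset usable via the frame pointer.  */
  long sp_realigned_offset;   /* Only offsets greater than this use SP.  */
};

/* SP is usable only for offsets strictly greater than
   sp_realigned_offset once the stack has been realigned.  */
static bool
sp_valid_at_model (const struct frame_state_model *fs, long cfa_offset)
{
  if (fs->sp_realigned && cfa_offset <= fs->sp_realigned_offset)
    return false;
  return fs->sp_valid;
}

/* FP is usable only for offsets up to and including
   sp_realigned_fp_last once the stack has been realigned.  */
static bool
fp_valid_at_model (const struct frame_state_model *fs, long cfa_offset)
{
  if (fs->sp_realigned && cfa_offset > fs->sp_realigned_fp_last)
    return false;
  return fs->fp_valid;
}

/* True when the offset falls in the alignment padding that neither
   register should be used to address.  */
static bool
in_no_mans_land (const struct frame_state_model *fs, long cfa_offset)
{
  return fs->sp_realigned
         && cfa_offset > fs->sp_realigned_fp_last
         && cfa_offset <= fs->sp_realigned_offset;
}
```

With made-up values sp_realigned_fp_last = 56 and sp_realigned_offset = 64,
offset 56 is reachable via the frame pointer, offset 72 via the stack
pointer, and offsets 57 through 64 are padding.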

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 45 ++---
 gcc/config/i386/i386.h | 18 +-
 2 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f1486ff3750..690631dfe43 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13102,26 +13102,36 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset)
   return len;
 }
 
-/* Determine if the stack pointer is valid for accessing the cfa_offset.
-   The register is saved at CFA - CFA_OFFSET.  */
+/* Determine if the stack pointer is valid for accessing the CFA_OFFSET in
+   the frame save area.  The register is saved at CFA - CFA_OFFSET.  */
 
-static inline bool
+static bool
 sp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
-  return fs.sp_valid && !(fs.sp_realigned
- && cfa_offset <= fs.sp_realigned_offset);
+  if (fs.sp_realigned && cfa_offset <= fs.sp_realigned_offset)
+{
+  /* Validate that the cfa_offset isn't in a "no-man's land".  */
+  gcc_assert (cfa_offset <= fs.sp_realigned_fp_last);
+  return false;
+}
+  return fs.sp_valid;
 }
 
-/* Determine if the frame pointer is valid for accessing the cfa_offset.
-   The register is saved at CFA - CFA_OFFSET.  */
+/* Determine if the frame pointer is valid for accessing the CFA_OFFSET in
+   the frame save area.  The register is saved at CFA - CFA_OFFSET.  */
 
 static inline bool
 fp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
-  return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
- && cfa_offset > fs.sp_realigned_offset);
+  if (fs.sp_realigned && cfa_offset > fs.sp_realigned_fp_last)
+{
+  /* Validate that the cfa_offset isn't in a "no-man's land".  */
+  gcc_assert (cfa_offset >= fs.sp_realigned_offset);
+  return false;
+}
+  return fs.fp_valid;
 }
 
 /* Choose a base register based upon alignment requested, speed and/or
@@ -14560,6 +14570,9 @@ ix86_expand_prologue (void)
   int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT;
   gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
 
+  /* Record last valid frame pointer offset.  */
+  m->fs.sp_realigned_fp_last = m->fs.sp_offset;
+
   /* The computation of the size of the re-aligned stack frame means
 that we must allocate the size of the register save area before
 performing the actual alignment.  Otherwise we cannot guarantee
@@ -14573,13 +14586,15 @@ ix86_expand_prologue (void)
   insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
stack_pointer_rtx,
GEN_INT (-align_bytes)));
-  /* For the purposes of register save area addressing, the stack
-pointer can no longer be used to access anything in the frame
-below m->fs.sp_realigned_offset and the frame pointer cannot be
-used for anything at or above.  */
   m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
   m->fs.sp_realigned = true;
   m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+  /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset.
+Beyond this point, stack access should be done via choose_baseaddr or
+by using sp_valid_at and fp_valid_at to determine the correct base
+register.  Henceforth, any CFA offset should be thought of as logical
+and not physical.  */
+  gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last);
   gcc_assert (m->fs.

[PATCH 2/6] [i386] Remove ix86_frame::outlined_save_offset

2017-07-31 Thread Daniel Santos
This value was used in an earlier incarnation of the
-mcall-ms2sysv-xlogues patch set but is now set and never read.  The
value of ix86_frame::sse_reg_save_offset serves the same purpose.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 1 -
 gcc/config/i386/i386.h | 4 +---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 690631dfe43..47c5608c3cd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12966,7 +12966,6 @@ ix86_compute_frame_layout (void)
 
   offset += xlogue.get_stack_space_used ();
   gcc_assert (!(offset & 0xf));
-  frame->outlined_save_offset = offset;
 }
 
   /* Align and set SSE register save area.  */
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ce5bb7f6677..1648bdf1556 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2477,8 +2477,7 @@ enum avx_u128_state
<- end of stub-saved/restored regs
  [padding1]
]
-   <- outlined_save_offset
-   <- sse_regs_save_offset
+   <- sse_reg_save_offset
[padding2]
   |<- FRAME_POINTER
[va_arg registers]  |
@@ -2504,7 +2503,6 @@ struct GTY(()) ix86_frame
   HOST_WIDE_INT reg_save_offset;
   HOST_WIDE_INT stack_realign_allocate_offset;
   HOST_WIDE_INT stack_realign_offset;
-  HOST_WIDE_INT outlined_save_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
   /* When save_regs_using_mov is set, emit prologue using
-- 
2.13.3



[PATCH 3/6] [i386] Remove machine_function::call_ms2sysv_pad_out

2017-07-31 Thread Daniel Santos
The -mcall-ms2sysv-xlogues project added the boolean fields
call_ms2sysv_pad_in and call_ms2sysv_pad_out to struct machine_function
to track whether or not an additional 8 bytes of padding was needed for
stack alignment prior to and after the stub save area.  This design was
based upon the faulty assumption that the function body would not require
a stack alignment greater than 16 bytes.  This continues to work well for
managing padding prior to the stub save area, but will not work for the
outgoing alignment.

Rather than changing machine_function::call_ms2sysv_pad_out to a larger
type, this patch removes it, thus transferring responsibility for stack
alignment following the stub save area from class xlogue_layout to the
body of ix86_compute_frame_layout.  Since the 64-bit va_arg register
save area is always a multiple of 16 bytes in size (176 for System V ABI
and 96 for Microsoft ABI), the ROUND_UP calculation for the stack offset
at the start of the function body (frame.frame_pointer_offset) will
ensure there is enough room for any padding needed to keep the save area
for SSE va_args 16-byte aligned, so no modification is needed for that
calculation.
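
For readers following the arithmetic, here is a minimal standalone sketch
(not code from the patch).  The helper names round_up, needs_pad_in and
va_arg_area_start are hypothetical; needs_pad_in mirrors the
!!(offset & UNITS_PER_WORD) test with UNITS_PER_WORD assumed to be 8, and
the sizes 176 and 96 are the ones quoted above:

```c
#include <assert.h>

/* Round VALUE up to the next multiple of ALIGN (ALIGN > 0).  */
static long
round_up (long value, long align)
{
  return (value + align - 1) / align * align;
}

/* Models !!(offset & UNITS_PER_WORD) for 64-bit code: the incoming
   offset is 8 bytes off a 16-byte boundary and needs padding.  */
static int
needs_pad_in (long offset)
{
  return (offset & 8) != 0;
}

/* Offset at which a va_arg save area of VA_ARG_SIZE bytes begins when
   it ends at the aligned start of the function body.  */
static long
va_arg_area_start (long offset, long va_arg_size, long align)
{
  return round_up (offset + va_arg_size, align) - va_arg_size;
}
```

Because both the rounded end offset and the area size are multiples of 16,
the start of the area is too: with an incoming offset of 40,
va_arg_area_start (40, 176, 32) gives 48 and va_arg_area_start (40, 96, 64)
gives 96.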

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 18 --
 gcc/config/i386/i386.h |  8 ++--
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 47c5608c3cd..e2e9546a27c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2491,9 +2491,7 @@ public:
 unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;
 
 gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
-return m_regs[last_reg].offset
-  + (m->call_ms2sysv_pad_out ? 8 : 0)
-  + STUB_INDEX_OFFSET;
+return m_regs[last_reg].offset + STUB_INDEX_OFFSET;
   }
 
   /* Returns the offset for the base pointer used by the stub.  */
@@ -12849,13 +12847,12 @@ ix86_compute_frame_layout (void)
{
  unsigned count = xlogue_layout::count_stub_managed_regs ();
  m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
+ m->call_ms2sysv_pad_in = 0;
}
 }
 
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
-  m->call_ms2sysv_pad_in = 0;
-  m->call_ms2sysv_pad_out = 0;
 
   /* 64-bit MS ABI seem to require stack alignment to be always 16,
  except for function prologues, leaf functions and when the defult
@@ -12957,15 +12954,7 @@ ix86_compute_frame_layout (void)
   gcc_assert (!frame->nsseregs);
 
   m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
-
-  /* Select an appropriate layout for incoming stack offset.  */
-  const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
-
-  if ((offset + xlogue.get_stack_space_used ()) & UNITS_PER_WORD)
-   m->call_ms2sysv_pad_out = 1;
-
-  offset += xlogue.get_stack_space_used ();
-  gcc_assert (!(offset & 0xf));
+  offset += xlogue_layout::get_instance ().get_stack_space_used ();
 }
 
   /* Align and set SSE register save area.  */
@@ -12993,6 +12982,7 @@ ix86_compute_frame_layout (void)
 
   /* Align start of frame for local function.  */
   if (stack_realign_fp
+  || m->call_ms2sysv
   || offset != frame->sse_reg_save_offset
   || size != 0
   || !crtl->is_leaf
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 1648bdf1556..b08e45f68d4 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2646,17 +2646,13 @@ struct GTY(()) machine_function {
   BOOL_BITFIELD arg_reg_available : 1;
 
   /* If true, we're out-of-lining reg save/restore for regs clobbered
- by ms_abi functions calling a sysv function.  */
+ by 64-bit ms_abi functions calling a sysv_abi function.  */
   BOOL_BITFIELD call_ms2sysv : 1;
 
   /* If true, the incoming 16-byte aligned stack has an offset (of 8) and
- needs padding.  */
+ needs padding prior to out-of-line stub save/restore area.  */
   BOOL_BITFIELD call_ms2sysv_pad_in : 1;
 
-  /* If true, the size of the stub save area plus inline int reg saves will
- result in an 8 byte offset, so needs padding.  */
-  BOOL_BITFIELD call_ms2sysv_pad_out : 1;
-
   /* This is the number of extra registers saved by stub (valid range is
  0-6). Each additional register is only saved/restored by the stubs
  if all successive ones are. (Will always be zero when using a hard
-- 
2.13.3



[PATCH 4/6] [i386] Modify ix86_compute_frame_layout

2017-07-31 Thread Daniel Santos
These changes affect how the stack frame is calculated from the region
starting at frame.reg_save_offset until frame.frame_pointer_offset,
which includes either the stub save area or the (inline) SSE register
save area and the va_args register save area.

The calculation used when not realigning the stack pointer is the same,
but when realigning we calculate the 16-byte aligned space needed
in reverse, so that the stack realignment boundary at
frame.stack_realign_offset may not necessarily be a multiple of
stack_alignment_needed, but the value of frame.frame_pointer_offset
will be.  This results in a properly aligned stack for the function body
and avoids wasting stack space.
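
The difference between the two orderings can be sketched in standalone C.
This is a simplified model under stated assumptions, not the patch itself:
struct layout_model, forward_layout and reverse_layout are hypothetical
names, space_needed stands for the stub or SSE save area plus the va_arg
area, and the forward version approximates the previous behaviour of
rounding the offset before adding the save areas:

```c
#include <assert.h>

/* Round VALUE up to the next multiple of ALIGN (ALIGN > 0).  */
static long
round_up (long value, long align)
{
  return (value + align - 1) / align * align;
}

/* Hypothetical subset of the frame layout results.  */
struct layout_model
{
  long stack_realign_offset;
  long frame_pointer_offset;
};

/* Approximation of the old ordering: realign first, then add the save
   areas, then align the start of the function body again.  */
static struct layout_model
forward_layout (long offset, long space_needed, long align)
{
  struct layout_model l;
  l.stack_realign_offset = round_up (offset, align);
  l.frame_pointer_offset
    = round_up (l.stack_realign_offset + round_up (space_needed, 16), align);
  return l;
}

/* Reverse ordering: align the start of the function body, then step
   back by the 16-byte aligned space to find the realignment boundary.  */
static struct layout_model
reverse_layout (long offset, long space_needed, long align)
{
  struct layout_model l;
  long space = round_up (space_needed, 16);
  l.frame_pointer_offset = round_up (offset + space, align);
  l.stack_realign_offset = l.frame_pointer_offset - space;
  return l;
}
```

With made-up inputs (offset 24, 160 bytes of saves, 64-byte alignment) the
reverse ordering yields a frame_pointer_offset of 192 with the realignment
boundary at 32, while the forward ordering ends at 256.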

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 116 +
 gcc/config/i386/i386.h |   2 +-
 2 files changed, 80 insertions(+), 38 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e2e9546a27c..e92f322de0c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12874,6 +12874,14 @@ ix86_compute_frame_layout (void)
   gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT);
   gcc_assert (preferred_alignment <= stack_alignment_needed);
 
+  /* The only ABI saving SSE regs should be 64-bit ms_abi.  */
+  gcc_assert (TARGET_64BIT || !frame->nsseregs);
+  if (TARGET_64BIT && m->call_ms2sysv)
+{
+  gcc_assert (stack_alignment_needed >= 16);
+  gcc_assert (!frame->nsseregs);
+}
+
   /* For SEH we have to limit the amount of code movement into the prologue.
  At present we do this via a BLOCKAGE, at which point there's very little
  scheduling that can be done, which means that there's very little point
@@ -12936,54 +12944,88 @@ ix86_compute_frame_layout (void)
   if (TARGET_SEH)
 frame->hard_frame_pointer_offset = offset;
 
-  /* When re-aligning the stack frame, but not saving SSE registers, this
- is the offset we want adjust the stack pointer to.  */
-  frame->stack_realign_allocate_offset = offset;
+  /* Calculate the size of the va-arg area (not including padding, if any).  */
+  frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
 
-  /* The re-aligned stack starts here.  Values before this point are not
- directly comparable with values below this point.  Use sp_valid_at
- to determine if the stack pointer is valid for a given offset and
- fp_valid_at for the frame pointer.  */
   if (stack_realign_fp)
-offset = ROUND_UP (offset, stack_alignment_needed);
-  frame->stack_realign_offset = offset;
-
-  if (TARGET_64BIT && m->call_ms2sysv)
 {
-  gcc_assert (stack_alignment_needed >= 16);
-  gcc_assert (!frame->nsseregs);
+  /* We may need a 16-byte aligned stack for the remainder of the
+register save area, but the stack frame for the local function
+may require a greater alignment if using AVX/2/512.  In order
+to avoid wasting space, we first calculate the space needed for
+the rest of the register saves, add that to the stack pointer,
+and then realign the stack to the boundary of the start of the
+frame for the local function.  */
+  HOST_WIDE_INT space_needed = 0;
+  HOST_WIDE_INT sse_reg_space_needed = 0;
 
-  m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
-  offset += xlogue_layout::get_instance ().get_stack_space_used ();
-}
+  if (TARGET_64BIT)
+   {
+ if (m->call_ms2sysv)
+   {
+ m->call_ms2sysv_pad_in = 0;
+ space_needed = xlogue_layout::get_instance ().get_stack_space_used ();
+   }
 
-  /* Align and set SSE register save area.  */
-  else if (frame->nsseregs)
-{
-  /* The only ABI that has saved SSE registers (Win64) also has a
-16-byte aligned default stack.  However, many programs violate
-the ABI, and Wine64 forces stack realignment to compensate.
+ else if (frame->nsseregs)
+   /* The only ABI that has saved SSE registers (Win64) also has a
+  16-byte aligned default stack.  However, many programs violate
+  the ABI, and Wine64 forces stack realignment to compensate.  */
+   space_needed = frame->nsseregs * 16;
+
+ sse_reg_space_needed = space_needed = ROUND_UP (space_needed, 16);
+
+ /* 64-bit frame->va_arg_size should always be a multiple of 16, but
+rounding to be pedantic.  */
+ space_needed = ROUND_UP (space_needed + frame->va_arg_size, 16);
+   }
+  else
+   space_needed = frame->va_arg_size;
+
+  /* Record the allocation size required prior to the realignment AND.  */
+  frame->stack_realign_allocate = space_needed;
+
+  /* The re-aligned stack starts at frame->stack_realign_offset.  Values
+before this point are not directly comparable with values below

[PATCH 5/6] [i386] Modify SP realignment in ix86_expand_prologue, et. al.

2017-07-31 Thread Daniel Santos
The SP allocation calculation is now done in ix86_compute_frame_layout
and the result stored in ix86_frame::stack_realign_allocate.  This
change also updates comments for choose_baseaddr to clarify that the
alignment returned doesn't necessarily reflect the alignment of the
cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an
alignment of 64 bytes).

Since the alignment required may be more than 16 bytes, we cannot defer
SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so
that function needs to be updated as well.
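
To make the point about the returned alignment concrete, here is a small
standalone sketch, not taken from the patch.  The function
address_alignment is hypothetical; it computes the alignment an address can
be guaranteed to have given the alignment of its base register and a byte
displacement from that register.  How a cfa_offset maps to a displacement
from the chosen base register is not modelled here:

```c
#include <assert.h>

/* Guaranteed alignment (in bytes) of BASE + DISPLACEMENT when BASE is
   aligned to BASE_ALIGN bytes.  BASE_ALIGN must be a power of two.  */
static unsigned long
address_alignment (unsigned long base_align, long displacement)
{
  unsigned long d = displacement < 0
                    ? -(unsigned long) displacement
                    : (unsigned long) displacement;
  unsigned long low;

  if (d == 0)
    return base_align;

  /* Lowest set bit of the displacement: the largest power of two that
     divides it.  */
  low = d & -d;
  return low < base_align ? low : base_align;
}
```

A base register aligned to 64 bytes combined with a displacement of 48
gives an address that is only 16-byte aligned, so a caller that cares about
the address itself has to account for the offset separately.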

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 54 +++---
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e92f322de0c..7e1fc4dfbf5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13273,10 +13273,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
 }
 
 /* Return an RTX that points to CFA_OFFSET within the stack frame and
-   the alignment of address.  If align is non-null, it should point to
+   the alignment of address.  If ALIGN is non-null, it should point to
an alignment value (in bits) that is preferred or zero and will
-   recieve the alignment of the base register that was selected.  The
-   valid base registers are taken from CFUN->MACHINE->FS.  */
+   recieve the alignment of the base register that was selected,
+   irrespective of rather or not CFA_OFFSET is a multiple of that
+   alignment value.
+
+   The valid base registers are taken from CFUN->MACHINE->FS.  */
 
 static rtx
 choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
@@ -14322,35 +14325,35 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame)
   rtx sym, addr;
   rtx rax = gen_rtx_REG (word_mode, AX_REG);
   const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
-  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset;
-  HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset;
-  HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in ();
+  HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset;
+
+  /* AL should only be live with sysv_abi.  */
+  gcc_assert (!ix86_eax_live_at_start_p ());
+
+  /* Setup RAX as the stub's base pointer.  We use stack_realign_offset rather
+ we've actually realigned the stack or not.  */
+  align = GET_MODE_ALIGNMENT (V4SFmode);
+  addr = choose_baseaddr (frame.stack_realign_offset
+ + xlogue.get_stub_ptr_offset (), &align);
+  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
+  emit_insn (gen_rtx_SET (rax, addr));
 
-  /* Verify that the incoming stack 16-byte alignment offset matches the
- layout we're using.  */
-  gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD));
+  /* Allocate stack if not already done.  */
+  if (allocate > 0)
+  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+   GEN_INT (-allocate), -1, false);
 
   /* Get the stub symbol.  */
   sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
  : XLOGUE_STUB_SAVE);
   RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym);
 
-  /* Setup RAX as the stub's base pointer.  */
-  align = GET_MODE_ALIGNMENT (V4SFmode);
-  addr = choose_baseaddr (rax_offset, &align);
-  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
-  insn = emit_insn (gen_rtx_SET (rax, addr));
-
-  gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ());
-  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-GEN_INT (-stack_alloc_size), -1,
-m->fs.cfa_reg == stack_pointer_rtx);
   for (i = 0; i < ncregs; ++i)
 {
   const xlogue_layout::reginfo &r = xlogue.get_reginfo (i);
   rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode),
 r.regno);
-  RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);;
+  RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);
 }
 
   gcc_assert (vi == (unsigned)GET_NUM_ELEM (v));
@@ -14608,8 +14611,8 @@ ix86_expand_prologue (void)
 that we must allocate the size of the register save area before
 performing the actual alignment.  Otherwise we cannot guarantee
 that there's enough storage above the realignment point.  */
-  allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
-  if (allocate && !m->call_ms2sysv)
+  allocate = frame.stack_realign_allocate;
+  if (allocate)
 pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
   GEN_INT (-allocate), -1, 

[PATCH 6/6] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available

2017-07-31 Thread Daniel Santos
The testcase in the PR is used as a base and relevant variants are added
to test other factors affected by the patch set.

pr80969-1.c   Base test case.
pr80969-2.c   With ms to sysv call.
pr80969-2a.c  With ms to sysv call using stubs.
pr80969-3.c   With alloca (for DRAP test).
pr80969-4.c   With va_args passed via va_list
pr80969-4a.c  With va_args passed via va_list and ms to sysv call.
pr80969-4b.c  With va_args passed via va_list and ms to sysv call using
  stubs.

Signed-off-by: Daniel Santos 
---
 gcc/testsuite/gcc.target/i386/pr80969-1.c  |  16 
 gcc/testsuite/gcc.target/i386/pr80969-2.c  |  26 ++
 gcc/testsuite/gcc.target/i386/pr80969-2a.c |  26 ++
 gcc/testsuite/gcc.target/i386/pr80969-3.c  |  31 
 gcc/testsuite/gcc.target/i386/pr80969-4.c  | 123 
 gcc/testsuite/gcc.target/i386/pr80969-4a.c | 124 +
 gcc/testsuite/gcc.target/i386/pr80969-4b.c | 124 +
 gcc/testsuite/lib/target-supports.exp  |  66 +++
 8 files changed, 536 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c

diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c
new file mode 100644
index 000..eb8d767a778
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+int a[56];
+int b;
+int main (int argc, char *argv[]) {
+  int c;
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c
new file mode 100644
index 000..e868d6c7e5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
new file mode 100644
index 000..071a90534a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func using save/restore stubs.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c
new file mode 100644
index 000..5982981b55c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with alloca (and DRAP).  */
+
+#include 
+
+int a[56];
+volatile int b = -12345;
+volatile const int d = 42;
+
+void foo (int *x, int y, int z)
+{
+}
+
+void (*volatile const foo_noinfo)(int *, int, int) = foo;
+
+int main (int argc, char *argv[]) {
+  int c;
+  int *e = alloca (d);
+  foo_noinfo (e, d, 0);
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+foo_noinfo (e, d, c);
+a[-(b % 56)] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4.c b/gcc/testsuite/gcc.target/i386/pr80969-4.c
new file mode 100644
index 000..1ec54d081cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-4.c
@@ -0,0 +1,123 @@
+/* { dg-do run { target avx512f_runtime } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with avx512 and va_args.  */
+
+#include 
+#include 
+
+#include "avx-check.h"
+
+int a[56];
+int b;
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m512d n2 = { -93.83, 893.318, 3994.3, -39484.0, 830.32, -328.32, 3

Re: [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f

2017-07-31 Thread Daniel Santos
Well, I just learned how to test 32-bit earlier and I've uncovered a
problem when running 32-bit tests.  Do you want me to commit the two
patches (squashed together) in the meantime?


Thanks,
Daniel




[PATCH 5/6 v2] [i386] Modify SP realignment in ix86_expand_prologue, et. al.

2017-08-02 Thread Daniel Santos
My first version of this patch initialized m->fs.sp_realigned_fp_last
with the value of m->fs.sp_offset prior to performing the stack
realignment.  I had forgotten, however, that when we're saving GP regs
using MOV we delay SP modification as long as possible, so the value of
m->fs.sp_offset at this point is correct when we've used push, but
incorrect when we've used mov.
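
The discrepancy can be sketched with a toy model in standalone C.  It is
illustrative only: the function names are hypothetical, each GP register
save is assumed to take 8 bytes, and the excerpt does not show which value
the v2 patch records instead:

```c
#include <assert.h>
#include <stdbool.h>

/* Tracked SP offset after saving NREGS general-purpose registers.
   A push moves SP with each save; with mov the SP adjustment is
   deferred, so the tracked offset has not moved yet.  */
static long
sp_offset_after_saves (long sp_offset, int nregs, bool use_mov)
{
  return use_mov ? sp_offset : sp_offset + 8L * nregs;
}

/* End of the GP register save area, which is the same whichever
   instruction performs the saves.  */
static long
reg_save_area_end (long sp_offset, int nregs)
{
  return sp_offset + 8L * nregs;
}
```

Starting from an offset of 16 with three saved registers, the tracked SP
offset is 40 after pushes but still 16 when the saves are done with mov,
even though the save area ends at 40 in both cases.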

This time I've bootstrapped with --enable-checking=yes,rtl
--enable-languages=all and regression tested using the command below to
test both 64- and 32-bit code.

  make -kj8 RUNTESTFLAGS="--target_board=unix/\{,-m32\}" check

Original patch description:

The SP allocation calculation is now done in ix86_compute_frame_layout
and the result stored in ix86_frame::stack_realign_allocate.  This
change also updates comments for choose_baseaddr to clarify that the
alignment returned doesn't necessarily reflect the alignment of the
cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an
alignment of 64 bytes).

Since the alignment required may be more than 16 bytes, we cannot defer
SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so
that function needs to be updated as well.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 58 --
 1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0dc366cf16e..a1f39cd714c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13289,10 +13289,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
 }
 
 /* Return an RTX that points to CFA_OFFSET within the stack frame and
-   the alignment of address.  If align is non-null, it should point to
+   the alignment of address.  If ALIGN is non-null, it should point to
an alignment value (in bits) that is preferred or zero and will
-   recieve the alignment of the base register that was selected.  The
-   valid base registers are taken from CFUN->MACHINE->FS.  */
+   recieve the alignment of the base register that was selected,
+   irrespective of rather or not CFA_OFFSET is a multiple of that
+   alignment value.
+
+   The valid base registers are taken from CFUN->MACHINE->FS.  */
 
 static rtx
 choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
@@ -14338,35 +14341,35 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame)
   rtx sym, addr;
   rtx rax = gen_rtx_REG (word_mode, AX_REG);
   const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
-  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset;
-  HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset;
-  HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in ();
+  HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset;
+
+  /* AL should only be live with sysv_abi.  */
+  gcc_assert (!ix86_eax_live_at_start_p ());
+
+  /* Setup RAX as the stub's base pointer.  We use stack_realign_offset rather
+ we've actually realigned the stack or not.  */
+  align = GET_MODE_ALIGNMENT (V4SFmode);
+  addr = choose_baseaddr (frame.stack_realign_offset
+ + xlogue.get_stub_ptr_offset (), &align);
+  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
+  emit_insn (gen_rtx_SET (rax, addr));
 
-  /* Verify that the incoming stack 16-byte alignment offset matches the
- layout we're using.  */
-  gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD));
+  /* Allocate stack if not already done.  */
+  if (allocate > 0)
+  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+   GEN_INT (-allocate), -1, false);
 
   /* Get the stub symbol.  */
   sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
  : XLOGUE_STUB_SAVE);
   RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym);
 
-  /* Setup RAX as the stub's base pointer.  */
-  align = GET_MODE_ALIGNMENT (V4SFmode);
-  addr = choose_baseaddr (rax_offset, &align);
-  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
-  insn = emit_insn (gen_rtx_SET (rax, addr));
-
-  gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ());
-  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-GEN_INT (-stack_alloc_size), -1,
-m->fs.cfa_reg == stack_pointer_rtx);
   for (i = 0; i < ncregs; ++i)
 {
   const xlogue_layout::reginfo &r = xlogue.get_reginfo (i);
   rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode),
 r.regno);
-  RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);;
+  RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);
 }
 
   gcc_assert (v

[PATCH 6/6 v2] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available

2017-08-08 Thread Daniel Santos
This update adds documentation for the new effective targets in addition to a
few existing effective targets that were undocumented.

Changes to lib/target-supports.exp and documentation:
* Add effective-targets avx512f and avx512f_runtime (needed for new
  tests).
* Correct bug in check_avx2_hw_available.
* Add documentation for effective-targets avx2, avx2_runtime (both
  missing), avx512f and avx512f_runtime.

The following tests are added.  The testcase in the PR is used as a base
and relevant variants are added to test other factors affected by the
patch set.

pr80969-1.c   Base test case.
pr80969-2.c   With ms to sysv call.
pr80969-2a.c  With ms to sysv call using stubs.
pr80969-3.c   With alloca (for DRAP test).
pr80969-4.c   With va_args passed via va_list
pr80969-4a.c  With va_args passed via va_list and ms to sysv call.
pr80969-4b.c  With va_args passed via va_list and ms to sysv call using
  stubs.

Signed-off-by: Daniel Santos 
---
 gcc/doc/sourcebuild.texi   |  12 +++
 gcc/testsuite/gcc.target/i386/pr80969-1.c  |  16 
 gcc/testsuite/gcc.target/i386/pr80969-2.c  |  26 ++
 gcc/testsuite/gcc.target/i386/pr80969-2a.c |  26 ++
 gcc/testsuite/gcc.target/i386/pr80969-3.c  |  31 
 gcc/testsuite/gcc.target/i386/pr80969-4.c  | 123 
 gcc/testsuite/gcc.target/i386/pr80969-4a.c | 124 +
 gcc/testsuite/gcc.target/i386/pr80969-4b.c | 124 +
 gcc/testsuite/lib/target-supports.exp  |  66 +++
 9 files changed, 548 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 85af8778167..66f040f212d 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1852,6 +1852,18 @@ Target supports compiling @code{avx} instructions.
 @item avx_runtime
 Target supports the execution of @code{avx} instructions.
 
+@item avx2
+Target supports compiling @code{avx2} instructions.
+
+@item avx2_runtime
+Target supports the execution of @code{avx2} instructions.
+
+@item avx512f
+Target supports compiling @code{avx512f} instructions.
+
+@item avx512f_runtime
+Target supports the execution of @code{avx512f} instructions.
+
 @item cell_hw
 Test system can execute AltiVec and Cell PPU instructions.
 
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c
new file mode 100644
index 000..eb8d767a778
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+int a[56];
+int b;
+int main (int argc, char *argv[]) {
+  int c;
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c
new file mode 100644
index 000..e868d6c7e5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
new file mode 100644
index 000..071a90534a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func using save/restore stubs.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c
new file mode 100644
index 000..5982981b55c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c
@@ -0,0 +1,31 @@
+/*

PING Re: [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f

2017-08-08 Thread Daniel Santos
Original message: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02005.html

Patches 2 and 3 have been committed and I have corrected the error in
patch 5.  I configured with --enable-checking=yes,rtl
--enable-languages=all and retested with
RUNTESTFLAGS="--target_board=unix/\{,-m32\}".  The updated patches fix an
error when using mov instead of push and add documentation for changes
to target-supports.exp.  I have included modified ChangeLogs.

In addition to fixing the ICE, this patch set makes more efficient
use of stack space in some cases where the outgoing stack boundary is > 16
bytes and realignment is necessary.  This adds new tests, some of which
require avx512f (gcc/testsuite/gcc.target/i386/pr80969-4*.c); I have
only tested these using Intel SDE.

Below is an updated list of the patches.

1. https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02006.html
2. Committed.
3. Committed.
4. https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02009.html
5. v2 -- https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00249.html
6. v2 -- https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00618.html

Thanks,
Daniel
2017-08-08  Daniel Santos  

* config/i386/i386.h (ix86_frame::stack_realign_allocate_offset):
Remove
(ix86_frame::stack_realign_allocate): New field.
(struct machine_frame_state): Modify comments.
(machine_frame_state::sp_realigned_fp_end): New field.
* config/i386/i386.c (ix86_compute_frame_layout): Modify.
(sp_valid_at): Likewise.
(fp_valid_at): Likewise.
(choose_baseaddr): Modify comments.
(ix86_emit_outlined_ms2sysv_save): Modify.
(ix86_expand_prologue): Likewise.
* doc/sourcebuild.texi (avx2, avx2_runtime): Add missing items to
effective-targets.
(avx512f, avx512f_runtime): Add new items to effective-targets.
2017-08-08  Daniel Santos  

* lib/target-supports.exp (check_avx512_os_support_available): New
Procedure.
(check_avx2_hw_available): Modify.
(check_avx512f_hw_available): New Procedure.
(check_effective_target_avx512f_runtime): Likewise.
* gcc.target/i386/pr80969-1.c: New testcase.
* gcc.target/i386/pr80969-2a.c: Likewise.
* gcc.target/i386/pr80969-2.c: Likewise.
* gcc.target/i386/pr80969-3.c: Likewise.
* gcc.target/i386/pr80969-4a.c: Likewise.
* gcc.target/i386/pr80969-4b.c: Likewise.
* gcc.target/i386/pr80969-4.c: Likewise.


[PATCH] [i386,testsuite] [PR 71958] Error on -mx32 with -mabi=ms

2017-08-11 Thread Daniel Santos
We currently error when -mx32 -mabi=sysv and we encounter a function
with attribute ms_abi, but we are not erroring on -mx32 and -mabi=ms
(either explicitly or when it is the default on Windows).  In fact, it
generates code that runs, but is of an undefined ABI.

I'm running -m64 and -m32 tests now and will run x32 tests when those
are done.  Presuming that I've corrected all existing tests that do not
filter out x32 target and there are no additional failures, is this OK
for head?

Thanks,
Daniel

gcc/ChangeLog:
2017-08-11  Daniel Santos  

* config/i386/i386.c (ix86_option_override_internal): Modify.
(ix86_function_type_abi): Likewise.

gcc/testsuite/ChangeLog:
2017-08-11  Daniel Santos  

* gcc.target/i386/pr71958.c: New test.
* gcc.target/i386/pr64409.c: Modify to skip on Windows.
* gcc.target/i386/pr46470.c: Modify to skip x32 target.
* gcc.target/i386/pr66275.c: Likewise.
* gcc.target/i386/pr68018.c: Likewise.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c  | 11 +--
 gcc/testsuite/gcc.target/i386/pr46470.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr64409.c |  3 ++-
 gcc/testsuite/gcc.target/i386/pr66275.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr68018.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr71958.c |  8 
 6 files changed, 22 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr71958.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b04321a8d40..311a52c2a1f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5585,6 +5585,9 @@ ix86_option_override_internal (bool main_args_p,
 
   if (TARGET_X32_P (opts->x_ix86_isa_flags))
 {
+  if (opts_set->x_ix86_abi == MS_ABI)
+   error ("-mx32 not supported with -mabi=ms");
+
   /* Always turn on OPTION_MASK_ISA_64BIT and turn off
 OPTION_MASK_ABI_64 for TARGET_X32.  */
   opts->x_ix86_isa_flags |= OPTION_MASK_ISA_64BIT;
@@ -8777,8 +8780,12 @@ ix86_function_type_abi (const_tree fntype)
   if (abi == SYSV_ABI
   && lookup_attribute ("ms_abi", TYPE_ATTRIBUTES (fntype)))
 {
-  if (TARGET_X32)
-   error ("X32 does not support ms_abi attribute");
+  static int warned;
+  if (TARGET_X32 && !warned)
+   {
+ error ("X32 does not support ms_abi attribute");
+ warned = 1;
+   }
 
   abi = MS_ABI;
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr46470.c b/gcc/testsuite/gcc.target/i386/pr46470.c
index 9e8e731188e..c66a378a1ad 100644
--- a/gcc/testsuite/gcc.target/i386/pr46470.c
+++ b/gcc/testsuite/gcc.target/i386/pr46470.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! x32 } } } */
 /* The pic register save adds unavoidable stack pointer references.  */
 /* { dg-skip-if "" { ia32 && { ! nonpic } } } */
 /* These options are selected to ensure 1 word needs to be allocated
diff --git a/gcc/testsuite/gcc.target/i386/pr64409.c b/gcc/testsuite/gcc.target/i386/pr64409.c
index 917472653f4..3dbd9a09f01 100644
--- a/gcc/testsuite/gcc.target/i386/pr64409.c
+++ b/gcc/testsuite/gcc.target/i386/pr64409.c
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { ! ia32 } } } */
 /* { dg-require-effective-target maybe_x32 } */
 /* { dg-options "-O0 -mx32" } */
+/* { xfail { "*-*-cygwin* *-*-mingw*" } } */
 
 int a;
-int* __attribute__ ((ms_abi)) fn1 () { return &a; } /* { dg-error "X32 does not support ms_abi attribute" } */
+int* __attribute__ ((ms_abi)) fn1 () { return &a; } /* { dg-error "X32 does not support ms_abi attribute" { target { ! "*-*-mingw* *-*-cygwin*" } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66275.c b/gcc/testsuite/gcc.target/i386/pr66275.c
index b8759aeb5ec..a1271857f6a 100644
--- a/gcc/testsuite/gcc.target/i386/pr66275.c
+++ b/gcc/testsuite/gcc.target/i386/pr66275.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-do compile { target { *-*-linux* && { ! { ia32 || x32 } } } } } */
 /* { dg-options "-mabi=ms -fdump-rtl-dfinit" } */
 
 void
diff --git a/gcc/testsuite/gcc.target/i386/pr68018.c b/gcc/testsuite/gcc.target/i386/pr68018.c
index a0fa21e0b00..871fdddf643 100644
--- a/gcc/testsuite/gcc.target/i386/pr68018.c
+++ b/gcc/testsuite/gcc.target/i386/pr68018.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-do compile { target { *-*-linux* && { ! { ia32 || x32 } } } } } */
 /* { dg-options "-O -mabi=ms -mstackrealign" } */
 
 typedef float V __attribute__ ((vector_size (16)));
diff --git a/gcc/testsuite/gcc.target/i386/pr71958.c b/gcc/testsuite/gcc.target/i386/pr71958.c
new file mode 100644
index 000..090d1970ca9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr71958.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-mx32 -mabi=ms" } */
+/* { dg-require-effective-target maybe_x32 } */
+/* { dg-excess-errors "not supported" } */
+
+void main ()
+{
+}
-- 
2.13.3



[PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-21 Thread Daniel Santos
It took me a while to figure out how to do this so I figured that it should be
in the docs.  OK for trunk?

* doc/install.texi: Add more details on selecting multiple tests.

Thanks,
Daniel

Signed-off-by: Daniel Santos 
---
 gcc/doc/install.texi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 7c9e2f25d44..6aefd213901 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2737,6 +2737,16 @@ the testsuite with filenames matching @samp{9805*}, you would use
 make check-g++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}"
 @end smallexample
 
+The file-matching expression following @var{filename}@command{.exp=} is treated
+as a series of whitespace-delimited glob expressions so that multiple patterns
+may be passed, although any whitespace must either be escaped or surrounded by
+tick marks if multiple expressions are desired. For example,
+
+@smallexample
+make check-g++ RUNTESTFLAGS="old-deja.exp=9805*\ virtual2.c @var{other-options}"
+make check-g++ RUNTESTFLAGS="'old-deja.exp=9805* virtual2.c' @var{other-options}"
+@end smallexample
+
 The @file{*.exp} files are located in the testsuite directories of the GCC
 source, the most important ones being @file{compile.exp},
 @file{execute.exp}, @file{dg.exp} and @file{old-deja.exp}.
-- 
2.13.3



[PATCH] [i386, testsuite] [PR 71958] Error on -mx32 with -mabi=ms

2017-08-21 Thread Daniel Santos
We currently error when -mx32 and -mabi=sysv and we encounter a function
with attribute ms_abi, but we are not erroring on -mx32 and -mabi=ms
(either explicitly or when it is the default on Windows).  In fact, it
generates code that runs, but is of an undefined ABI.

I'm also changing pr64409.c because if you explicitly supply -m64, then
the test becomes ineffective.  This is because the -mx32 parameter passed
in dg-options is later overridden by the explicit -m64 parameter.

I've bootstrapped and tested on
*  an x86_64-pc-linux-gnux32 system building gcc with --with-abi=mx32,
*  a "normal" x86_64-pc-linux-gnu testing with
   --target_board=unix/\{,-m32\}, and
*  on Windows.

OK for trunk?

gcc/ChangeLog:
2017-08-11  Daniel Santos  

* config/i386/i386.c (ix86_option_override_internal): Error when
-mx32 is combined with -mabi=ms.
(ix86_function_type_abi): Limit errors for mixing -mx32 with
attribute ms_abi.

gcc/testsuite/ChangeLog:
2017-08-11  Daniel Santos  

* gcc.target/i386/pr71958.c: New test to verify error on -mx32
and -mabi=ms
* gcc.target/i386/pr64409.c: Modify to only run on x32.
* gcc.target/i386/pr46470.c: Modify to skip x32 target.
* gcc.target/i386/pr66275.c: Likewise.
* gcc.target/i386/pr68018.c: Likewise.

Thanks,
Daniel

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c  | 12 ++--
 gcc/testsuite/gcc.target/i386/pr46470.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr64409.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr66275.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr68018.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr71958.c |  7 +++
 6 files changed, 21 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr71958.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1d88e4f247a..3b537f2608f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5684,6 +5684,10 @@ ix86_option_override_internal (bool main_args_p,
   if (!opts_set->x_ix86_abi)
 opts->x_ix86_abi = DEFAULT_ABI;
 
+  if (opts->x_ix86_abi == MS_ABI && TARGET_X32_P (opts->x_ix86_isa_flags))
+error ("-mabi=ms not supported with X32 ABI");
+  gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
+
   /* For targets using ms ABI enable ms-extensions, if not
  explicit turned off.  For non-ms ABI we turn off this
  option.  */
@@ -8777,8 +8781,12 @@ ix86_function_type_abi (const_tree fntype)
   if (abi == SYSV_ABI
   && lookup_attribute ("ms_abi", TYPE_ATTRIBUTES (fntype)))
 {
-  if (TARGET_X32)
-   error ("X32 does not support ms_abi attribute");
+  static int warned;
+  if (TARGET_X32 && !warned)
+   {
+ error ("X32 does not support ms_abi attribute");
+ warned = 1;
+   }
 
   abi = MS_ABI;
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr46470.c b/gcc/testsuite/gcc.target/i386/pr46470.c
index 9e8e731188e..c66a378a1ad 100644
--- a/gcc/testsuite/gcc.target/i386/pr46470.c
+++ b/gcc/testsuite/gcc.target/i386/pr46470.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! x32 } } } */
 /* The pic register save adds unavoidable stack pointer references.  */
 /* { dg-skip-if "" { ia32 && { ! nonpic } } } */
 /* These options are selected to ensure 1 word needs to be allocated
diff --git a/gcc/testsuite/gcc.target/i386/pr64409.c b/gcc/testsuite/gcc.target/i386/pr64409.c
index 917472653f4..7bf9d1e398d 100644
--- a/gcc/testsuite/gcc.target/i386/pr64409.c
+++ b/gcc/testsuite/gcc.target/i386/pr64409.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile { target x32 } } */
 /* { dg-require-effective-target maybe_x32 } */
 /* { dg-options "-O0 -mx32" } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pr66275.c b/gcc/testsuite/gcc.target/i386/pr66275.c
index b8759aeb5ec..51ae1f6859c 100644
--- a/gcc/testsuite/gcc.target/i386/pr66275.c
+++ b/gcc/testsuite/gcc.target/i386/pr66275.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-do compile { target { *-*-linux* && lp64 } } } */
 /* { dg-options "-mabi=ms -fdump-rtl-dfinit" } */
 
 void
diff --git a/gcc/testsuite/gcc.target/i386/pr68018.c b/gcc/testsuite/gcc.target/i386/pr68018.c
index a0fa21e0b00..04929c6c13c 100644
--- a/gcc/testsuite/gcc.target/i386/pr68018.c
+++ b/gcc/testsuite/gcc.target/i386/pr68018.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-do compile { target { *-*-linux* && lp64 } } } */
 /* { dg-options "-O -mabi=ms -mstackrealign" } */
 
 typedef float V __attribute__ ((vector_size (16)));
diff --git a/gcc/testsuite/gcc.target/i386/pr71958.c b/gcc/testsuite/gcc.target/i386/pr71958.c
new file mode 100644
index 00

[PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW

2017-08-21 Thread Daniel Santos
This is a problem that occurred because of this code in
ix86_option_override_internal:

  if (!opts_set->x_ix86_abi)
opts->x_ix86_abi = DEFAULT_ABI;

I tested this along with my other patches.  OK for trunk?

* config/i386/i386-opts.h (enum calling_abi): Modify so that no legal
values are equivalent to zero.

Thanks,
Daniel

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386-opts.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index 542cd0f3d67..8c2b5380e49 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -44,8 +44,8 @@ last_alg
 /* Available call abi.  */
 enum calling_abi
 {
-  SYSV_ABI = 0,
-  MS_ABI = 1
+  SYSV_ABI = 1,
+  MS_ABI = 2
 };
 
 enum fpmath_unit
-- 
2.13.3



Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW

2017-08-22 Thread Daniel Santos
On 08/22/2017 01:26 AM, Andreas Schwab wrote:
> On Aug 21 2017, Daniel Santos  wrote:
>
>> This is a problem that occured because of this code in
>> ix86_option_override_internal:
>>
>>   if (!opts_set->x_ix86_abi)
>> opts->x_ix86_abi = DEFAULT_ABI;
> Why is that a problem?  Note opts_set vs opts.

Just because the test !opts_set->x_ix86_abi will be true whether we
supplied no -mabi parameter or we supplied -mabi=sysv.
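
The reasoning can be sketched in a few lines of C.  This is only a model
of the argument made in this message: it assumes the field being tested
holds the enum value itself, and the names old_abi, new_abi and
looks_unset are hypothetical, not GCC code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for enum calling_abi before and after the
   proposed renumbering.  */
enum old_abi { OLD_SYSV_ABI = 0, OLD_MS_ABI = 1 };
enum new_abi { NEW_SYSV_ABI = 1, NEW_MS_ABI = 2 };

/* Same shape as `if (!opts_set->x_ix86_abi)`: a zero field is read as
   "no -mabi option was given".  Assumes the field stores the enum
   value, which is this sketch's assumption.  */
static bool
looks_unset (int recorded_value)
{
  return !recorded_value;
}
```

With the old numbering, looks_unset (OLD_SYSV_ABI) is true, so an
explicit -mabi=sysv cannot be told apart from no option at all.  With the
new numbering no legal value is zero, so only a genuinely untouched field
passes the test.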

Daniel

> Andreas.



Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Daniel Santos
On 08/22/2017 10:58 AM, Martin Sebor wrote:
> On 08/21/2017 07:41 PM, Daniel Santos wrote:
>> It took me a while to figure out how to do this so I figured that it
>> should be
>> in the docs.  OK for trunk?
>>
>> * doc/install.texi: Add more details on selecting multiple tests.
>
> Thank you!  It had taken me some time to figure this out.
>
>> +The file-matching expression following @var{filename}@command{.exp=}
>> is treated
>> +as a series of whitespace-delimited glob expressions so that
>> multiple patterns
>> +may be passed, although any whitespace must either be escaped or
>> surrounded by
>> +tick marks if multiple expressions are desired. For example,
>
> Do you mean single quotes?

Yes.  I guess I've heard the terms "tick marks" and "single quotes" used
interchangeably before.  Perhaps using 'single quotes' would be a good way
to express it (with the quotes).

>   I would suggest "escaped or quoted."
> The whole argument to RUNTESTFLAGS can be quoted in either single
> or double quotes and, AFAICT, so can the space-separated test
> names within it.

Well, mysteriously, double quotes do not work.  So if I pass
RUNTESTFLAGS='"i386.exp=pr80969-[12]*.c pr80969-4.c"' then the second
pattern isn't used.  I have NO idea what happens to it because if I pass
RUNTESTFLAGS='i386.exp=pr80969-[12]*.c pr80969-4.c' then runtest
properly demands that I tell it what in the hell pr80969-4.c is supposed
to mean.  As an experiment, I created a symlink named \"pr80969-4.c and
used RUNTESTFLAGS='"i386.exp=pr80969-[12]*.c "pr80969-4.c' but it
didn't pick it up.  This is probably JAB (just another bug) in DejaGNU.

Among the variations I've tried are enclosing the expressions in
{braces},  \{escaped braces\} and comma-delimited \{escaped,braces\},
but none of these worked.

Daniel

> Martin
>



[PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Daniel Santos
OK, how's this one?

* doc/install.texi: Modify to add more details on running
selected tests.

Thanks,
Daniel

Signed-off-by: Daniel Santos 
---
 gcc/doc/install.texi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 7c9e2f25d44..da360da1c50 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2737,6 +2737,16 @@ the testsuite with filenames matching @samp{9805*}, you would use
 make check-g++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}"
 @end smallexample
 
+The file-matching expression following @var{filename}@command{.exp=} is treated
+as a series of whitespace-delimited glob expressions so that multiple patterns
+may be passed, although any whitespace must either be escaped or surrounded by
+single quotes if multiple expressions are desired. For example,
+
+@smallexample
+make check-g++ RUNTESTFLAGS="old-deja.exp=9805*\ virtual2.c @var{other-options}"
+make check-g++ RUNTESTFLAGS="'old-deja.exp=9805* virtual2.c' @var{other-options}"
+@end smallexample
+
 The @file{*.exp} files are located in the testsuite directories of the GCC
 source, the most important ones being @file{compile.exp},
 @file{execute.exp}, @file{dg.exp} and @file{old-deja.exp}.
-- 
2.13.3



Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Daniel Santos
On 08/22/2017 12:32 PM, Mike Stump wrote:
> On Aug 22, 2017, at 10:32 AM, Daniel Santos  wrote:
>>>  I would suggest "escaped or quoted."
>>> The whole argument to RUNTESTFLAGS can be quoted in either single
>>> or double quotes and, AFAICT, so can the space-separated test
>>> names within it.
>> Well, mysteriously, double quotes do not work.
> Did you try the obvious:
>
> "\"pdf pdf\" pdf"
>
> ?  I think it should work fine.

Yes.  As I explained in the rest of my email I tried a great many
variations.  I can debug runtest some more and try to better understand
how this is getting parsed.

Daniel


Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Daniel Santos
On 08/22/2017 12:32 PM, Mike Stump wrote:
> On Aug 22, 2017, at 10:32 AM, Daniel Santos  wrote:
>>>  I would suggest "escaped or quoted."
>>> The whole argument to RUNTESTFLAGS can be quoted in either single
>>> or double quotes and, AFAICT, so can the space-separated test
>>> names within it.
>> Well, mysteriously, double quotes do not work.
> Did you try the obvious:
>
> "\"pdf pdf\" pdf"
>
> ?  I think it should work fine.

I have found one additional working mechanism:

RUNTESTFLAGS='i386.exp=\"pr80969-[12]*.c pr80969-4.c\"'

But using double quotes for both does NOT work:

RUNTESTFLAGS="i386.exp=\"pr80969-[12]*.c pr80969-4.c\""

So the three working options appear to be:
1. Escaping whitespace
2. Using double quotes for the whole value and single quotes for the
file.exp=patterns expression
3. Using single quotes for the whole value and double quotes for the
file.exp=patterns expression

Daniel


Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Daniel Santos
OK, the problem is at line 4014 of gcc/Makefile.in:

  $(MAKE) TESTSUITEDIR="$(TESTSUITEDIR)" RUNTESTFLAGS="$(RUNTESTFLAGS)" \
check-parallel-$* \
 
Even worse, one can inject arbitrary shell commands here, not that I can
think of a scenario where it would be an actual security problem:

RUNTESTFLAGS="i386.exp=a b\"; beep\"" check-c

I presume that the solution would be to re-escape the contents of
RUNTESTFLAGS.

Daniel


[PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW

2017-08-22 Thread Daniel Santos
> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to
> UNKNOWN_ABI.

It would seem to me that UNSPECIFIED_ABI would be a better value name.

Also, I don't really understand what opts_set and opts are, except that I had
guessed opts_set is what the user asked for (or didn't ask for) and opts is
what we're going to actually use.  Am I close?

I'm re-running tests, so if they pass is this OK?

Thanks,
Daniel
---
 gcc/config/i386/i386-opts.h | 5 +++--
 gcc/config/i386/i386.c  | 3 +--
 gcc/config/i386/i386.opt| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index 542cd0f3d67..a1d1552a3c6 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -44,8 +44,9 @@ last_alg
 /* Available call abi.  */
 enum calling_abi
 {
-  SYSV_ABI = 0,
-  MS_ABI = 1
+  UNSPECIFIED_ABI = 0,
+  SYSV_ABI = 1,
+  MS_ABI = 2
 };
 
 enum fpmath_unit
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 650bcbc65ae..c08ad55fcd9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5681,12 +5681,11 @@ ix86_option_override_internal (bool main_args_p,
 opts->x_ix86_pmode = TARGET_LP64_P (opts->x_ix86_isa_flags)
 ? PMODE_DI : PMODE_SI;
 
-  if (!opts_set->x_ix86_abi)
+  if (opts_set->x_ix86_abi == UNSPECIFIED_ABI)
 opts->x_ix86_abi = DEFAULT_ABI;
 
   if (opts->x_ix86_abi == MS_ABI && TARGET_X32_P (opts->x_ix86_isa_flags))
 error ("-mabi=ms not supported with X32 ABI");
-  gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
 
   /* For targets using ms ABI enable ms-extensions, if not
  explicit turned off.  For non-ms ABI we turn off this
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index cd564315f04..f7b9f9707f7 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -525,7 +525,7 @@ Target Report Mask(IAMCU)
 Generate code that conforms to Intel MCU psABI.
 
 mabi=
-Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(SYSV_ABI)
+Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(UNSPECIFIED_ABI)
 Generate code that conforms to the given ABI.
 
 Enum
-- 
2.13.3



[PATCH v4 0/4] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f

2017-08-22 Thread Daniel Santos
I had to fix a few things for x32 compatibility and I think this is ready
now.  H.J. tested on a machine with avx512 (including x32) and I've tested
both native x32 and normal x86_64 with m64, m32 and mx32 and all is
well.  I've made more changes to the tests so I'm just submitting a
version 2 of the whole patch set.

OK for trunk?

2017-08-22  Daniel Santos  

* config/i386/i386.h (ix86_frame::stack_realign_allocate_offset):
Remove field.
(ix86_frame::stack_realign_allocate): New field.
(struct machine_frame_state): Modify comments.
(machine_frame_state::sp_realigned_fp_end): New field.
* config/i386/i386.c (ix86_compute_frame_layout): Rework stack frame
layout calculation.
(sp_valid_at): Add assertion to assure no attempt to access invalid
offset of a realigned stack.
(fp_valid_at): Likewise.
(choose_baseaddr): Modify comments.
(ix86_emit_outlined_ms2sysv_save): Adjust to changes in
ix86_expand_prologue.
(ix86_expand_prologue): Modify stack realignment and allocation.
(ix86_expand_epilogue): Modify comments.

2017-08-22  Daniel Santos  

* gcc.target/i386/pr80969-1.c: New testcase.
* gcc.target/i386/pr80969-2a.c: Likewise.
* gcc.target/i386/pr80969-2.c: Likewise.
* gcc.target/i386/pr80969-3.c: Likewise.
* gcc.target/i386/pr80969-4a.c: Likewise.
* gcc.target/i386/pr80969-4b.c: Likewise.
* gcc.target/i386/pr80969-4.c: Likewise.
* gcc.target/i386/pr80969-4.h: New header common to pr80969-4*.c


Thanks,
Daniel


[PATCH 1/4] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at

2017-08-22 Thread Daniel Santos
When we realign the stack frame (without DRAP), there may be a range of
CFA offsets that should never be touched because they are alignment
padding and any reference to them is almost certainly an error.
Previously, only the offset of where the realigned stack frame starts
was recorded and checked in sp_valid_at and fp_valid_at.

This change adds sp_realigned_fp_last to struct machine_frame_state to
record the last valid offset from which the frame pointer can be used
when the stack pointer is realigned and modifies sp_valid_at and
fp_valid_at to fail an assertion when passed an offset in the "no-man's
land" between these two values.

Comments for struct machine_frame_state incorrectly stated that a
realigned stack pointer could be used to access offsets equal to or
greater than sp_realigned_offset, but it is only valid for offsets that
are greater.  This was the (incorrect) behaviour of sp_valid_at and
fp_valid_at prior to r250587 and this change now corrects the
documentation and adds clarification of the CFA-relative calculation.
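
As a rough illustration of the three regions described above, here is a
small self-contained C model.  It is a sketch, not the GCC code: it
covers only the realigned case, assumes both registers are otherwise
usable, and the names base_reg and classify_realigned_offset are
hypothetical.

```c
/* Which base register may address a given CFA offset once the stack
   pointer has been realigned.  */
enum base_reg { BASE_FP, BASE_SP, BASE_NONE };

/* FP_LAST models sp_realigned_fp_last and REALIGNED_OFFSET models
   sp_realigned_offset, with FP_LAST <= REALIGNED_OFFSET.  Offsets up to
   FP_LAST belong to the frame pointer, offsets strictly greater than
   REALIGNED_OFFSET belong to the stack pointer, and anything between is
   alignment padding that should never be referenced.  */
static enum base_reg
classify_realigned_offset (long fp_last, long realigned_offset,
                           long cfa_offset)
{
  if (cfa_offset <= fp_last)
    return BASE_FP;
  if (cfa_offset > realigned_offset)
    return BASE_SP;
  return BASE_NONE;  /* the "no-man's land" */
}
```

For example, with fp_last = 56 and realigned_offset = 64, offset 56 is
reached through the frame pointer, offset 65 through the stack pointer,
and offsets 57 through 64 are the padding that the new assertions are
meant to catch.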

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 45 ++---
 gcc/config/i386/i386.h | 18 +-
 2 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c08ad55fcd9..601e3ef47f6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13177,26 +13177,36 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset)
   return len;
 }
 
-/* Determine if the stack pointer is valid for accessing the cfa_offset.
-   The register is saved at CFA - CFA_OFFSET.  */
+/* Determine if the stack pointer is valid for accessing the CFA_OFFSET in
+   the frame save area.  The register is saved at CFA - CFA_OFFSET.  */
 
-static inline bool
+static bool
 sp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
-  return fs.sp_valid && !(fs.sp_realigned
- && cfa_offset <= fs.sp_realigned_offset);
+  if (fs.sp_realigned && cfa_offset <= fs.sp_realigned_offset)
+{
+  /* Validate that the cfa_offset isn't in a "no-man's land".  */
+  gcc_assert (cfa_offset <= fs.sp_realigned_fp_last);
+  return false;
+}
+  return fs.sp_valid;
 }
 
-/* Determine if the frame pointer is valid for accessing the cfa_offset.
-   The register is saved at CFA - CFA_OFFSET.  */
+/* Determine if the frame pointer is valid for accessing the CFA_OFFSET in
+   the frame save area.  The register is saved at CFA - CFA_OFFSET.  */
 
 static inline bool
 fp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
-  return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
- && cfa_offset > fs.sp_realigned_offset);
+  if (fs.sp_realigned && cfa_offset > fs.sp_realigned_fp_last)
+{
+  /* Validate that the cfa_offset isn't in a "no-man's land".  */
+  gcc_assert (cfa_offset >= fs.sp_realigned_offset);
+  return false;
+}
+  return fs.fp_valid;
 }
 
 /* Choose a base register based upon alignment requested, speed and/or
@@ -14675,6 +14685,9 @@ ix86_expand_prologue (void)
   int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT;
   gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
 
+  /* Record last valid frame pointer offset.  */
+  m->fs.sp_realigned_fp_last = m->fs.sp_offset;
+
   /* The computation of the size of the re-aligned stack frame means
 that we must allocate the size of the register save area before
 performing the actual alignment.  Otherwise we cannot guarantee
@@ -14688,13 +14701,15 @@ ix86_expand_prologue (void)
   insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
stack_pointer_rtx,
GEN_INT (-align_bytes)));
-  /* For the purposes of register save area addressing, the stack
-pointer can no longer be used to access anything in the frame
-below m->fs.sp_realigned_offset and the frame pointer cannot be
-used for anything at or above.  */
   m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
   m->fs.sp_realigned = true;
   m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+  /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset.
+Beyond this point, stack access should be done via choose_baseaddr or
+by using sp_valid_at and fp_valid_at to determine the correct base
+register.  Henceforth, any CFA offset should be thought of as logical
+and not physical.  */
+  gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last);
   gcc_assert (m->fs.

[PATCH 2/4] [i386] Modify ix86_compute_frame_layout

2017-08-22 Thread Daniel Santos
These changes affect how the stack frame is calculated from the region
starting at frame.reg_save_offset until frame.frame_pointer_offset,
which includes either the stub save area or the (inline) SSE register
save area and the va_args register save area.

The calculation used when not realigning the stack pointer is the same,
but when realigning we calculate the 16-byte aligned space needed
in reverse so that the stack realignment boundary at
frame.stack_realign_offset may not necessarily be a multiple of
stack_alignment_needed, but the value of frame.frame_pointer_offset
will. This results in a properly aligned stack for the function body and
avoids wasting stack space.
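
As a rough illustration of the difference, the sketch below models the two
orderings with plain integers.  It is a simplified model under stated
assumptions, not the code in the patch: round_up, struct realign_layout,
layout_forward and layout_reverse are hypothetical names, and the only
inputs are the offset reached after the GP register saves, the 16-byte
aligned space needed, and the required alignment.

```c
#include <assert.h>

/* Round x up to the next multiple of align (align must be a power of two).  */
static long round_up(long x, long align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical result type: both values are offsets from the CFA.  */
struct realign_layout {
    long realign_offset;  /* where the realigned region begins */
    long frame_start;     /* where the function body's frame begins */
};

/* Forward ordering: realign first, append the save area, then pad again
   so that the function body's frame is aligned.  */
static struct realign_layout layout_forward(long offset, long space_needed,
                                            long align)
{
    struct realign_layout l;
    l.realign_offset = round_up(offset, align);
    l.frame_start = round_up(l.realign_offset + space_needed, align);
    return l;
}

/* Reverse ordering: align the end of the save area and work backwards
   to find the realignment boundary.  */
static struct realign_layout layout_reverse(long offset, long space_needed,
                                            long align)
{
    struct realign_layout l;
    l.frame_start = round_up(offset + space_needed, align);
    l.realign_offset = l.frame_start - space_needed;
    return l;
}
```

With an offset of 24, 160 bytes of save area and a 64-byte alignment
requirement, the forward ordering starts the function body's frame at 256
while the reverse ordering starts it at 192, with the realignment boundary
at 32 (a multiple of 16 but not of 64).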

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 116 +
 gcc/config/i386/i386.h |   2 +-
 2 files changed, 80 insertions(+), 38 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 601e3ef47f6..30e84dd5303 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12960,6 +12960,14 @@ ix86_compute_frame_layout (void)
   gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT);
   gcc_assert (preferred_alignment <= stack_alignment_needed);
 
+  /* The only ABI saving SSE regs should be 64-bit ms_abi.  */
+  gcc_assert (TARGET_64BIT || !frame->nsseregs);
+  if (TARGET_64BIT && m->call_ms2sysv)
+{
+  gcc_assert (stack_alignment_needed >= 16);
+  gcc_assert (!frame->nsseregs);
+}
+
   /* For SEH we have to limit the amount of code movement into the prologue.
  At present we do this via a BLOCKAGE, at which point there's very little
  scheduling that can be done, which means that there's very little point
@@ -13022,54 +13030,88 @@ ix86_compute_frame_layout (void)
   if (TARGET_SEH)
 frame->hard_frame_pointer_offset = offset;
 
-  /* When re-aligning the stack frame, but not saving SSE registers, this
- is the offset we want adjust the stack pointer to.  */
-  frame->stack_realign_allocate_offset = offset;
+  /* Calculate the size of the va-arg area (not including padding, if any).  */
+  frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
 
-  /* The re-aligned stack starts here.  Values before this point are not
- directly comparable with values below this point.  Use sp_valid_at
- to determine if the stack pointer is valid for a given offset and
- fp_valid_at for the frame pointer.  */
   if (stack_realign_fp)
-offset = ROUND_UP (offset, stack_alignment_needed);
-  frame->stack_realign_offset = offset;
-
-  if (TARGET_64BIT && m->call_ms2sysv)
 {
-  gcc_assert (stack_alignment_needed >= 16);
-  gcc_assert (!frame->nsseregs);
+  /* We may need a 16-byte aligned stack for the remainder of the
+register save area, but the stack frame for the local function
+may require a greater alignment if using AVX/2/512.  In order
+to avoid wasting space, we first calculate the space needed for
+the rest of the register saves, add that to the stack pointer,
+and then realign the stack to the boundary of the start of the
+frame for the local function.  */
+  HOST_WIDE_INT space_needed = 0;
+  HOST_WIDE_INT sse_reg_space_needed = 0;
 
-  m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
-  offset += xlogue_layout::get_instance ().get_stack_space_used ();
-}
+  if (TARGET_64BIT)
+   {
+ if (m->call_ms2sysv)
+   {
+ m->call_ms2sysv_pad_in = 0;
+ space_needed = xlogue_layout::get_instance ().get_stack_space_used ();
+   }
 
-  /* Align and set SSE register save area.  */
-  else if (frame->nsseregs)
-{
-  /* The only ABI that has saved SSE registers (Win64) also has a
-16-byte aligned default stack.  However, many programs violate
-the ABI, and Wine64 forces stack realignment to compensate.
+ else if (frame->nsseregs)
+   /* The only ABI that has saved SSE registers (Win64) also has a
+  16-byte aligned default stack.  However, many programs violate
+  the ABI, and Wine64 forces stack realignment to compensate.  */
+   space_needed = frame->nsseregs * 16;
+
+ sse_reg_space_needed = space_needed = ROUND_UP (space_needed, 16);
+
+ /* 64-bit frame->va_arg_size should always be a multiple of 16, but
+rounding to be pedantic.  */
+ space_needed = ROUND_UP (space_needed + frame->va_arg_size, 16);
+   }
+  else
+   space_needed = frame->va_arg_size;
+
+  /* Record the allocation size required prior to the realignment AND.  */
+  frame->stack_realign_allocate = space_needed;
+
+  /* The re-aligned stack starts at frame->stack_realign_offset.  Values
+before this point are not directly comparable with values below

[PATCH 3/4] [i386] Modify SP realignment in ix86_expand_prologue, et al.

2017-08-22 Thread Daniel Santos
My first version of this patch initialized m->fs.sp_realigned_fp_last with
the value of m->fs.sp_offset prior to performing the stack realignment.
I had forgotten, however, that when we save GP regs using MOV we delay SP
modification as long as possible, so the value of m->fs.sp_offset at this
point is correct when we've used push, but incorrect when we've used mov.
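
For context, the check that sp_realigned_fp_last supports can be modeled
outside of GCC.  The sketch below mirrors the sp_valid_at and fp_valid_at
logic from the earlier version of this patch; struct frame_state is a
hypothetical stand-in for the relevant fields of machine_frame_state, and
plain long replaces HOST_WIDE_INT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant machine_frame_state fields.  */
struct frame_state {
    bool sp_valid;
    bool fp_valid;
    bool sp_realigned;
    long sp_realigned_fp_last;  /* last CFA offset addressable via FP */
    long sp_realigned_offset;   /* offsets at or below this are not
                                   addressable via SP after realignment */
};

/* Can the stack pointer be used to address CFA - cfa_offset?  */
static bool sp_valid_at(const struct frame_state *fs, long cfa_offset)
{
    if (fs->sp_realigned && cfa_offset <= fs->sp_realigned_offset) {
        /* The offset must not fall in the gap between the two regions.  */
        assert(cfa_offset <= fs->sp_realigned_fp_last);
        return false;
    }
    return fs->sp_valid;
}

/* Can the frame pointer be used to address CFA - cfa_offset?  */
static bool fp_valid_at(const struct frame_state *fs, long cfa_offset)
{
    if (fs->sp_realigned && cfa_offset > fs->sp_realigned_fp_last) {
        /* The offset must not fall in the gap between the two regions.  */
        assert(cfa_offset >= fs->sp_realigned_offset);
        return false;
    }
    return fs->fp_valid;
}
```

If sp_realigned_fp_last is recorded from a stale sp_offset (the mov case
described above), the boundary between the two regions is wrong and these
checks pick the wrong base register or trip the assertion.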

This has been tested on both x86_64-pc-linux-gnu{,x32} with
--target_board=unix/\{-m64,-mx32,-m32\}.

Original patch description:

The SP allocation calculation is now done in ix86_compute_frame_layout
and the result stored in ix86_frame::stack_realign_allocate.  This
change also updates comments for choose_baseaddr to clarify that the
alignment returned doesn't necessarily reflect the alignment of the
cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an
alignment of 64 bytes).
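
To make that distinction concrete, here is a small sketch that computes
the alignment of the resulting address from the base register's alignment
and the offset.  address_alignment is a hypothetical helper that works in
bytes (choose_baseaddr itself reports bits); it is not part of GCC.

```c
#include <assert.h>

/* Alignment (in bytes) of base - cfa_offset, given that base is aligned
   to base_align bytes.  base_align must be a power of two.  */
static unsigned long address_alignment(unsigned long base_align,
                                       long cfa_offset)
{
    assert(base_align != 0 && (base_align & (base_align - 1)) == 0);

    unsigned long off = cfa_offset < 0 ? -(unsigned long) cfa_offset
                                       : (unsigned long) cfa_offset;
    if (off == 0)
        return base_align;

    /* Lowest set bit of the offset is the largest power of two dividing it.  */
    unsigned long off_align = off & (~off + 1);
    return off_align < base_align ? off_align : base_align;
}
```

With a base register aligned to 64 bytes and a cfa_offset of 48, the
address itself is only 16-byte aligned, even though the alignment reported
for the base register is 64.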

Since the alignment required may be more than 16 bytes, we cannot defer
SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so
that function needs to be updated as well.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 58 --
 1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 30e84dd5303..dbc771da8aa 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13359,10 +13359,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
 }
 
 /* Return an RTX that points to CFA_OFFSET within the stack frame and
-   the alignment of address.  If align is non-null, it should point to
+   the alignment of address.  If ALIGN is non-null, it should point to
an alignment value (in bits) that is preferred or zero and will
-   recieve the alignment of the base register that was selected.  The
-   valid base registers are taken from CFUN->MACHINE->FS.  */
+   receive the alignment of the base register that was selected,
+   irrespective of whether or not CFA_OFFSET is a multiple of that
+   alignment value.
+
+   The valid base registers are taken from CFUN->MACHINE->FS.  */
 
 static rtx
 choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
@@ -14445,35 +14448,35 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame)
   rtx sym, addr;
   rtx rax = gen_rtx_REG (word_mode, AX_REG);
   const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
-  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset;
-  HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset;
-  HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in ();
+  HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset;
+
+  /* AL should only be live with sysv_abi.  */
+  gcc_assert (!ix86_eax_live_at_start_p ());
+
+  /* Setup RAX as the stub's base pointer.  We use stack_realign_offset whether
+ we've actually realigned the stack or not.  */
+  align = GET_MODE_ALIGNMENT (V4SFmode);
+  addr = choose_baseaddr (frame.stack_realign_offset
+ + xlogue.get_stub_ptr_offset (), &align);
+  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
+  emit_insn (gen_rtx_SET (rax, addr));
 
-  /* Verify that the incoming stack 16-byte alignment offset matches the
- layout we're using.  */
-  gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD));
+  /* Allocate stack if not already done.  */
+  if (allocate > 0)
+  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+   GEN_INT (-allocate), -1, false);
 
   /* Get the stub symbol.  */
   sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
  : XLOGUE_STUB_SAVE);
   RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym);
 
-  /* Setup RAX as the stub's base pointer.  */
-  align = GET_MODE_ALIGNMENT (V4SFmode);
-  addr = choose_baseaddr (rax_offset, &align);
-  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
-  insn = emit_insn (gen_rtx_SET (rax, addr));
-
-  gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ());
-  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-GEN_INT (-stack_alloc_size), -1,
-m->fs.cfa_reg == stack_pointer_rtx);
   for (i = 0; i < ncregs; ++i)
 {
   const xlogue_layout::reginfo &r = xlogue.get_reginfo (i);
   rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode),
 r.regno);
-  RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);;
+  RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);
 }
 
   gcc_assert (vi == (unsigned)GET_NUM_ELEM (v));
@@ -14728,14 +14731,15 @@ ix86_expand_prologue (void)
   gcc_assert (align_bytes > MIN

[PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available

2017-08-22 Thread Daniel Santos
Changes to lib/target-supports.exp and documentation:
* Add effective-targets avx512f and avx512f_runtime (needed for new
  tests).
* Correct bug in check_avx2_hw_available.
* Add documentation for effective-targets avx2, avx2_runtime (both
  missing), avx512f and avx512f_runtime.

The following tests are added.  The testcase in the PR is used as a base
and relevant variants are added to test other factors affected by the
patch set.

pr80969-1.c   Base test case.
pr80969-2.c   With ms to sysv call.
pr80969-2a.c  With ms to sysv call using stubs.
pr80969-3.c   With alloca (for DRAP test).
pr80969-4.c   With va_args passed via va_list
pr80969-4a.c  With va_args passed via va_list and ms to sysv call.
pr80969-4b.c  With va_args passed via va_list and ms to sysv call using
  stubs.
pr80969-4.h   Common header for pr80969-4*.c.

Signed-off-by: Daniel Santos 
---
 gcc/doc/sourcebuild.texi   |  12 +++
 gcc/testsuite/gcc.target/i386/pr80969-1.c  |  16 
 gcc/testsuite/gcc.target/i386/pr80969-2.c  |  27 +++
 gcc/testsuite/gcc.target/i386/pr80969-2a.c |   8 ++
 gcc/testsuite/gcc.target/i386/pr80969-3.c  |  32 
 gcc/testsuite/gcc.target/i386/pr80969-4.c  |   9 +++
 gcc/testsuite/gcc.target/i386/pr80969-4.h  | 119 +
 gcc/testsuite/gcc.target/i386/pr80969-4a.c |   9 +++
 gcc/testsuite/gcc.target/i386/pr80969-4b.c |   9 +++
 gcc/testsuite/lib/target-supports.exp  |  66 
 10 files changed, 307 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.h
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index e6313dc031e..0bf4d6afeb6 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1855,6 +1855,18 @@ Target supports compiling @code{avx} instructions.
 @item avx_runtime
 Target supports the execution of @code{avx} instructions.
 
+@item avx2
+Target supports compiling @code{avx2} instructions.
+
+@item avx2_runtime
+Target supports the execution of @code{avx2} instructions.
+
+@item avx512f
+Target supports compiling @code{avx512f} instructions.
+
+@item avx512f_runtime
+Target supports the execution of @code{avx512f} instructions.
+
 @item cell_hw
 Test system can execute AltiVec and Cell PPU instructions.
 
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c
new file mode 100644
index 000..e0520b45c40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target { ! x32 } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+int a[56];
+int b;
+int main (int argc, char *argv[]) {
+  int c;
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c
new file mode 100644
index 000..f885dee6512
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */
+/* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
new file mode 100644
index 000..baea0796d24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
@@ -0,0 +1,8 @@
+/* { dg-do run { target { lp64 && avx512f_runtime } } } */
+/* { dg-do compile { target { lp64 && { ! avx512f_runtime } } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func using save/restore stubs.  */
+
+#include "pr80969-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c
new file mode 100644
index 000..d902a771cc8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c
@@ -0,0 +1,32 @@
+/* { dg-do run { targe

Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW

2017-08-22 Thread Daniel Santos
On 08/22/2017 03:00 PM, Uros Bizjak wrote:
> On Tue, Aug 22, 2017 at 9:47 PM, Daniel Santos wrote:
>>> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to
>>> UNKNOWN_ABI.
>> It would seem to me that UNSPECIFIED_ABI would be a better value name.
>>
>> Also, I don't really understand what opts_set and opts are, except that I had
>> guessed opts_set is what the user asked for (or didn't ask for) and opts is
>> what we're going to actually use.  Am I close?
> Yes. opts_set is a flag that user specified an option at the command line.
>
> However, I fail to see what is the problem. If nothing was specified,
> then opts->x_ix86_abi is set to DEFAULT_ABI.

That is not what is happening.  If -mabi=sysv is specified, then the
test (!opts_set->x_ix86_abi) is true since the value of SYSV_ABI is
zero.  When that is evaluated as true, then the abi is set to
DEFAULT_ABI, which on Windows is MS_ABI, thus ignoring the command line
option.

> Probably we don't need
> Init(SYSV_ABI) in mabi= declaration at all.

I'm guessing that if we don't specify an Init() option then it will
default to zero?  We just need a valid way to differentiate when
-mabi=sysv has been passed from when nothing has been passed.
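
The idea behind the quoted patch can be sketched in isolation: reserve the
zero value for "nothing requested" so that an explicit -mabi=sysv can be
told apart from no option at all.  resolve_abi is a hypothetical helper,
and the sketch assumes, as this message does, that the check inspects the
option's value.

```c
#include <assert.h>

/* Zero is reserved for "no ABI requested" so that it cannot be confused
   with a real choice.  */
enum calling_abi {
    UNSPECIFIED_ABI = 0,
    SYSV_ABI = 1,
    MS_ABI = 2
};

/* Hypothetical helper: fall back to the target default only when the
   user requested nothing.  */
static enum calling_abi resolve_abi(enum calling_abi requested,
                                    enum calling_abi target_default)
{
    assert(target_default != UNSPECIFIED_ABI);
    return requested == UNSPECIFIED_ABI ? target_default : requested;
}
```

With MS_ABI as the target default, resolve_abi (SYSV_ABI, MS_ABI) keeps
SYSV_ABI, while resolve_abi (UNSPECIFIED_ABI, MS_ABI) yields MS_ABI.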

Daniel

>
> Uros.
>
>> I'm re-running tests, so if they pass is this OK?
>>
>> Thanks,
>> Daniel
>> ---
>>  gcc/config/i386/i386-opts.h | 5 +++--
>>  gcc/config/i386/i386.c  | 3 +--
>>  gcc/config/i386/i386.opt| 2 +-
>>  3 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
>> index 542cd0f3d67..a1d1552a3c6 100644
>> --- a/gcc/config/i386/i386-opts.h
>> +++ b/gcc/config/i386/i386-opts.h
>> @@ -44,8 +44,9 @@ last_alg
>>  /* Available call abi.  */
>>  enum calling_abi
>>  {
>> -  SYSV_ABI = 0,
>> -  MS_ABI = 1
>> +  UNSPECIFIED_ABI = 0,
>> +  SYSV_ABI = 1,
>> +  MS_ABI = 2
>>  };
>>
>>  enum fpmath_unit
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 650bcbc65ae..c08ad55fcd9 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -5681,12 +5681,11 @@ ix86_option_override_internal (bool main_args_p,
>>  opts->x_ix86_pmode = TARGET_LP64_P (opts->x_ix86_isa_flags)
>>  ? PMODE_DI : PMODE_SI;
>>
>> -  if (!opts_set->x_ix86_abi)
>> +  if (opts_set->x_ix86_abi == UNSPECIFIED_ABI)
>>  opts->x_ix86_abi = DEFAULT_ABI;
>>
>>if (opts->x_ix86_abi == MS_ABI && TARGET_X32_P (opts->x_ix86_isa_flags))
>>  error ("-mabi=ms not supported with X32 ABI");
>> -  gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
>>
>>/* For targets using ms ABI enable ms-extensions, if not
>>   explicit turned off.  For non-ms ABI we turn off this
>> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
>> index cd564315f04..f7b9f9707f7 100644
>> --- a/gcc/config/i386/i386.opt
>> +++ b/gcc/config/i386/i386.opt
>> @@ -525,7 +525,7 @@ Target Report Mask(IAMCU)
>>  Generate code that conforms to Intel MCU psABI.
>>
>>  mabi=
>> -Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(SYSV_ABI)
>> +Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(UNSPECIFIED_ABI)
>>  Generate code that conforms to the given ABI.
>>
>>  Enum
>> --
>> 2.13.3
>>



Re: [PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available

2017-08-23 Thread Daniel Santos

On 08/23/2017 08:26 AM, Uros Bizjak wrote:

>> @@ -1822,6 +1845,7 @@ proc check_avx2_hw_available { } {
>> expr 0
>> } else {
>> check_runtime_nocache avx2_hw_available {
>> +   #include 
> Why is the above include needed?

It is only needed to #define NULL.  Without the include, I've had this
function fail due to NULL being undefined.

Daniel



Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW

2017-08-23 Thread Daniel Santos
On 08/23/2017 01:12 AM, Uros Bizjak wrote:
> On Wed, Aug 23, 2017 at 7:23 AM, Daniel Santos wrote:
>> On 08/22/2017 03:00 PM, Uros Bizjak wrote:
>>> On Tue, Aug 22, 2017 at 9:47 PM, Daniel Santos wrote:
>>>>> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to
>>>>> UNKNOWN_ABI.
>>>> It would seem to me that UNSPECIFIED_ABI would be a better value name.
>>>>
>>>> Also, I don't really understand what opts_set and opts are, except that I had
>>>> guessed opts_set is what the user asked for (or didn't ask for) and opts is
>>>> what we're going to actually use.  Am I close?
>>> Yes. opts_set is a flag that user specified an option at the command line.
>>>
>>> However, I fail to see what is the problem. If nothing was specified,
>>> then opts->x_ix86_abi is set to DEFAULT_ABI.
>> That is not what is happening.  If -mabi=sysv is specified, then the
>> test (!opts_set->x_ix86_abi) is true since the value of SYSV_ABI is
>> zero.  When that is evaluated as true, then the abi is set to
>> DEFAULT_ABI, which on Windows is MS_ABI, thus ignoring the command line
>> option.
> Let's use the following patch:
>
> --cut here--
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 3c82ae64f4f2..f8590f663285 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5682,7 +5682,7 @@ ix86_option_override_internal (bool main_args_p,
>  ? PMODE_DI : PMODE_SI;
>
>if (!opts_set->x_ix86_abi)
> -opts->x_ix86_abi = DEFAULT_ABI;
> +printf ("Using default ABI\n"), opts->x_ix86_abi = DEFAULT_ABI;
>
>/* For targets using ms ABI enable ms-extensions, if not
>   explicit turned off.  For non-ms ABI we turn off this
> --cut here--
>
> $ ./cc1 -O2 -quiet hello.c
> Using default ABI
> $ ./cc1 -O2 -mabi=sysv -quiet hello.c
> $
> $ ./cc1 -O2 -mabi=sysv -quiet hello.c
> $
>
> Again, opts_set is set to true when the option is specified on the
> command line, it has nothing to do with the value of the option.

Interesting, I get the same result and in fact I can't reproduce the bug
anymore.  Either I made a mistake somewhere (likely) or something else
fixed the problem (less likely).  I'll try again from where the trunk
was when I filed the bug and close it as either invalid or fixed, depending
upon which it is.
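
The distinction Uros describes can be modeled as two parallel structures:
one holding option values and one recording only whether each option was
given.  This is a minimal sketch with hypothetical types (struct abi_opts,
struct abi_opts_set, apply_default), not GCC's actual option structures;
it shows why the zero value of SYSV_ABI does not matter when the test
looks at the flag rather than the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum calling_abi { SYSV_ABI = 0, MS_ABI = 1 };

/* Hypothetical model: option values live in one structure...  */
struct abi_opts {
    enum calling_abi abi;
};

/* ...and "was this given on the command line" flags live in another.  */
struct abi_opts_set {
    bool abi;
};

/* Apply the target default only when the option was not given at all.  */
static void apply_default(struct abi_opts *opts,
                          const struct abi_opts_set *opts_set,
                          enum calling_abi target_default)
{
    assert(opts != NULL && opts_set != NULL);
    if (!opts_set->abi)
        opts->abi = target_default;
}
```

An explicit request for SYSV_ABI survives apply_default even though its
value is zero, because only the flag in abi_opts_set is consulted.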

Thanks!
Daniel

>> I'm guessing that if we don't specify an Init() option then it will
>> default to zero?  We just need a valid way to differentiate when
>> -mabi=sysv has been passed from when nothing has been passed.
> Yes, it defaults to zero, but since we live in c++ world nowadays, we
> can't initialize enum with integer zero...
>
> Uros.




Re: [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues

2017-05-06 Thread Daniel Santos

On 05/05/2017 03:56 AM, Daniel Santos wrote:
> On 05/02/2017 05:40 AM, Kai Tietz wrote:
>> Right, and Wine people will tell, if something doesn't work for them.
>> So ok for me too.
>>
>> Kai
>
> Well, I haven't re-run these tests in a few months, but I got 272
> failed wine tests with gcc 7.1 and 234 with my patch set rebased onto
> 7.1.  So it looks like I'll be trying to diagnose these failures this
> weekend.


Those are bad numbers.  I had forgotten to filter out the testlist.o 
files.  Below are my most recent numbers running Wine 2.7:


gcc-5.4.0 CFLAGS="-march=native -O2 -g": 74
gcc-7.1.0 CFLAGS="-march=native -O2 -g": 74
gcc-7.1.0 CFLAGS="-march=nocona -mtune=generic -O2 -g": 79
gcc-7.1.0 CFLAGS="-march=native -O2 -g -mcall-ms2sysv-xlogues" (patched): 31

I'm building out a clean test environment on another machine to try to 
rule out clutter issues (and video driver issues) on my workstation.


Daniel


Re: [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues

2017-05-08 Thread Daniel Santos

On 05/06/2017 03:22 PM, Daniel Santos wrote:
> gcc-5.4.0 CFLAGS="-march=native -O2 -g": 74
> gcc-7.1.0 CFLAGS="-march=native -O2 -g": 74
> gcc-7.1.0 CFLAGS="-march=nocona -mtune=generic -O2 -g": 79
> gcc-7.1.0 CFLAGS="-march=native -O2 -g -mcall-ms2sysv-xlogues" (patched): 31
>
> I'm building out a clean test environment on another machine to try to
> rule out clutter issues (and video driver issues) on my workstation.
>
> Daniel



I've re-run Wine's tests with a new clean VM environment and some 
changes to include more tests and similar results:


Compiler                                          Failures
gcc-4.9.4                                         39
gcc-7.1.0                                         78
gcc-7.1.0-patched (with -mcall-ms2sysv-xlogues)   40


The first error not present in the gcc-4.9.4 tests that I examined 
looked like a run-of-the-mill race condition in Wine that just happened 
to not crash when built with 4.9.4.  So I'm going to guess that the 
disappearance of these failures with -mcall-ms2sysv-xlogues is just 
incidental.  I think we're in good condition with this patch set.


Daniel


[PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues

2017-05-12 Thread Daniel Santos

Ping?  I have posted revisions of the following in the patch set:

05/12 - https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01442.html
09/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00348.html
11/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00350.html

I have retested them on Linux x86-64, in addition to a Wine testsuite 
comparison resulting in fewer failed tests (31) than when using 
unpatched 7.1.0 (78) and 5.4.0 (78).  A cursory examination of the 
failures with 7.1.0 that now pass suggested that they were due to race 
conditions in Wine that are incidentally hidden after the patches.


Is there anything else needed before we can commit these?  They still 
rebase cleanly onto the HEAD, but I can repost as "v5" if you prefer.


Thanks,
Daniel



Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues

2017-05-13 Thread Daniel Santos

On 05/13/2017 11:52 AM, Uros Bizjak wrote:
> On Sat, May 13, 2017 at 1:01 AM, Daniel Santos wrote:
>> Ping?  I have posted revisions of the following in patch set:
>>
>> 05/12 - https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01442.html
>> 09/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00348.html
>> 11/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00350.html
>>
>> I have retested them on Linux x86-64 in addition a Wine testsuite comparison
>> resulting in fewer failed tests (31) than when using unpatched 7.1.0 (78)
>> and 5.4.0 (78).  A cursory examination of the now working failures with
>> 7.1.0 seemed to be to be due to race conditions in Wine that are
>> incidentally hidden after the patches.
>>
>> Is there anything else needed before we can commit these?  They still rebase
>> cleanly onto the HEAD, but I can repost as "v5" if you prefer.
>
> Please go ahead and commit the patches.
>
> However, please stay around to fix possible fallout. As said - you are
> touching quite complex part of the compiler ...
>
> Thanks,
> Uros.


Thanks!  I'll definitely be around, I have a lot more that I'm working 
on with C generics/pseudo-templates (all middle-end stuff). I also want 
to examine more ways that SSE saves/restores can be omitted in these ms 
to sysv calls through static analysis and such.


Anyway, I don't yet have SVN write access, will you sponsor my request?

Thanks,
Daniel



Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-14 Thread Daniel Santos

On 05/14/2017 02:42 AM, Bernd Edlinger wrote:
> Hi,
>
> this patch uses the new TARGET_COMPUTE_FRAME_LAYOUT hook in the i386
> backend to avoid re-computing the frame layout when not really
> necessary.
>
> It simplifies the logic in ix86_compute_frame_layout by removing
> the use_fast_prologue_epilogue_nregs, which is no longer necessary,
> because the frame layout can no longer change spontaneously.
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
>
> Thanks
> Bernd.


I think Uros is about to commit my improvements to ms to sysv abi calls, 
which is a large change and will conflict with your patch. I've added 
several new fields to struct ix86_frame that will need to be merged (and 
moved to i386.h).  I believe that my only explicit check of 
crtl->stack_realign_finalized is during pro/epilogue expand, and not in 
ix86_compute_frame_layout.  A former incarnation of my patches needed 
ix86_compute_frame_layout to be called *after* it was set, but I believe 
that is no longer the case, and so shouldn't conflict, but retesting 
should certainly be done.


https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01338.html

Thanks,
Daniel


Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-14 Thread Daniel Santos

On 05/14/2017 11:31 AM, Bernd Edlinger wrote:
> Hi Daniel,
>
> there is one thing I don't understand in your patch:
> That is, it introduces a static value:
>
> /* Registers who's save & restore will be managed by stubs called from
>  pro/epilogue.  */
> static HARD_REG_SET GTY(()) stub_managed_regs;
>
> This seems to be set as a side effect of ix86_compute_frame_layout,
> and depends on the register usage of the current function.
> But values that depend on the current function need usually be
> attached to cfun->machine, because the passes can run in parallel
> unless I am completely mistaken, and the stub_managed_regs may
> therefore be computed from a different function.
>
> Bernd.


I'm relatively new to GCC and still learning.  However, there are quite 
a lot of static TU variables in i386.c like this.  I am not aware of gcc 
having parallelism support, but if it were to be added then all of these 
TU variables should probably be moved to some class or struct (like 
cfun->machine) to reduce the number of TLS lookups required (which I 
presume is a little more expensive than a this/offset calculation).  
Having this (as well as other variables) in such a struct is better 
design IMO, but as I said, I'm still learning GCC's architecture, idioms 
and patterns.  (I should add that I don't really understand the GTY 
memory management either. :)
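
The concern Bernd raises can be shown with a toy model.  The sketch below
contrasts a file-scope variable with state attached to a per-function
structure; struct machine_state and struct function_ctx are hypothetical
stand-ins for cfun->machine and the function it belongs to, and a plain
unsigned bitmask replaces HARD_REG_SET.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-function state, standing in for cfun->machine.  */
struct machine_state {
    unsigned stub_managed_regs;
};

struct function_ctx {
    struct machine_state machine;
};

/* File-scope variable: a single copy shared by every function compiled.  */
static unsigned shared_stub_managed_regs;

/* Layout computation that records its result in the shared variable.  */
static void compute_layout_shared(unsigned regs_used)
{
    shared_stub_managed_regs = regs_used;
}

/* Layout computation that records its result in the function's own state.  */
static void compute_layout_per_function(struct function_ctx *fn,
                                        unsigned regs_used)
{
    assert(fn != NULL);
    fn->machine.stub_managed_regs = regs_used;
}
```

If the layout of a second function is computed before the first function
has finished using its result, the shared variable holds the second
function's registers, whereas the per-function copies stay independent.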


To be clear on class xlogue_layout, the only instances of this class are 
const and could be shared across multiple threads.  It is dependent upon 
cfun->machine as well as the global struct rtl_data crtl, but is not 
so entangled that, were these proper C++ classes (with private data), 
it would need to be a friend -- it only needs read-access to their data 
members.


To be honest, it's a strange feeling programming in a mixture of C and 
C++ idioms, but I know it was only recently converted to C++ so I think 
it's better to try to use only one or the other in a given function.  
But if I were going to do this all OO, then ix86_compute_frame_layout 
would be a member function of ix86_frame (which would be a 
specialization of some generic "frame" class), machine_function would be 
class ix86_machine_function with its own compute_frame_layout that 
called ix86_frame::compute_frame_layout, etc.  If I really wanted to go 
nuts, I would consider making class function, et al. template classes 
with machine_function and machine_function_state part of the object 
instead of pointers to separate objects to reduce accesses down to a 
single this/offset, but now I'm *really* digressing...


Please feel free to move it.

Thanks,
Daniel


Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-14 Thread Daniel Santos

On 05/14/2017 11:31 AM, Bernd Edlinger wrote:
> Hi Daniel,
>
> there is one thing I don't understand in your patch:
> That is, it introduces a static value:
>
> /* Registers who's save & restore will be managed by stubs called from
>  pro/epilogue.  */
> static HARD_REG_SET GTY(()) stub_managed_regs;
>
> This seems to be set as a side effect of ix86_compute_frame_layout,
> and depends on the register usage of the current function.
> But values that depend on the current function need usually be
> attached to cfun->machine, because the passes can run in parallel
> unless I am completely mistaken, and the stub_managed_regs may
> therefore be computed from a different function.
>
> Bernd.


I should add that if you want to run faster tests just on the ms to sysv 
abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if 
that succeeds run the full testsuite.


Daniel


Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-15 Thread Daniel Santos

On 05/15/2017 03:39 PM, Bernd Edlinger wrote:

On 05/15/17 03:39, Daniel Santos wrote:

On 05/14/2017 11:31 AM, Bernd Edlinger wrote:

Hi Daniel,

there is one thing I don't understand in your patch:
That is, it introduces a static value:

/* Registers who's save & restore will be managed by stubs called from
  pro/epilogue.  */
static HARD_REG_SET GTY(()) stub_managed_regs;

This seems to be set as a side effect of ix86_compute_frame_layout,
and depends on the register usage of the current function.
But values that depend on the current function need usually be
attached to cfun->machine, because the passes can run in parallel
unless I am completely mistaken, and the stub_managed_regs may
therefore be computed from a different function.


Bernd.

I should add that if you want to run faster tests just on the ms to sysv
abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if
that succeeds run the full testsuite.

Daniel

Unfortunately I encounter a serious problem when my patch is used
ontop of your patch, Yes, the test suite ran without error, but then
I tried to trigger the warning and that tripped an ICE.
The reason is that cfun->machine->call_ms2sysv can be set to true
*after* reload_completed, which can be seen using the following
patch:

Index: i386.c
===
--- i386.c  (revision 248031)
+++ i386.c  (working copy)
@@ -29320,7 +29320,10 @@

 /* Set here, but it may get cleared later.  */
 if (TARGET_CALL_MS2SYSV_XLOGUES)
+  {
+   gcc_assert(!reload_completed);
cfun->machine->call_ms2sysv = true;
+  }
   }

 if (vec_len > 1)


That assertion is triggered in this test case:

cat test.c
int test()
{
__builtin_printf("test\n");
return 0;
}

gcc -mabi=ms -mcall-ms2sysv-xlogues -fsplit-stack -c test.c
test.c: In function 'test':
test.c:5:1: internal compiler error: in ix86_expand_call, at
config/i386/i386.c:29324
   }
   ^
0x13390a4 ix86_expand_call(rtx_def*, rtx_def*, rtx_def*, rtx_def*,
rtx_def*, bool)
../../gcc-trunk/gcc/config/i386/i386.c:29324
0x1317494 ix86_expand_split_stack_prologue()
../../gcc-trunk/gcc/config/i386/i386.c:15920
0x162ba21 gen_split_stack_prologue()
../../gcc-trunk/gcc/config/i386/i386.md:12556
0x12f3f30 target_gen_split_stack_prologue
../../gcc-trunk/gcc/config/i386/i386.md:12325
0xb237b3 make_split_prologue_seq
../../gcc-trunk/gcc/function.c:5822
0xb23a08 thread_prologue_and_epilogue_insns()
../../gcc-trunk/gcc/function.c:5958
0xb24840 rest_of_handle_thread_prologue_and_epilogue
../../gcc-trunk/gcc/function.c:6428
0xb248c0 execute
../../gcc-trunk/gcc/function.c:6470
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


so, in ix86_expand_split_stack_prologue
we first call:
ix86_finalize_stack_realign_flags ();
ix86_compute_frame_layout (&frame);

and later:
call_insn = ix86_expand_call (NULL_RTX, gen_rtx_MEM (QImode, fn),
  GEN_INT (UNITS_PER_WORD), constm1_rtx,
  pop, false);

which changes a flag with a huge impact on the frame layout, but there
is absolutely no way the frame layout can change once it is
finalized.
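
The invariant at stake can be sketched in a few lines of standalone C.
This is only a model under assumed names (frame_state, finalize_layout and
request_stubs are hypothetical, and the sizes are invented); it is not how
i386.c is structured. The point is that an input to the layout decision
must refuse to change once the layout has been frozen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of "inputs are frozen once the layout is final".  */
struct frame_state {
    bool use_stubs;    /* input to the layout decision */
    bool finalized;    /* set once the layout has been computed for good */
    int  frame_size;   /* derived from use_stubs */
};

/* Compute the layout from the current inputs and freeze it.  */
void finalize_layout(struct frame_state *f)
{
    assert(f != NULL);
    f->frame_size = f->use_stubs ? 160 : 64;   /* made-up sizes */
    f->finalized = true;
}

/* Ask for stub-based save/restore.  Returns false and leaves the state
   untouched if the layout is already final.  */
bool request_stubs(struct frame_state *f)
{
    assert(f != NULL);
    if (f->finalized)
        return false;
    f->use_stubs = true;
    return true;
}
```

A stricter variant would assert instead of returning false, which is what
the gcc_assert in the diagnostic patch above does.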


Any Thoughts?


Bernd.


Well, my intention was actually to punt on those cases, but I hadn't 
actually tested with -fsplit-stack.  It looks like 
ix86_expand_split_stack_prologue calls ix86_expand_call, and I hadn't 
anticipated it getting called after the last call to 
ix86_compute_frame_layout(), which your patch has probably eliminated.  
In the case of -fsplit-stack, I'm testing the macro flag_split_stack 
which (currently) just expands to a check of the global flag, so this 
could be done in ix86_option_override_internal () instead, but I think 
it highlights a somewhat deeper problem.


Whether or not m->call_ms2sysv is set determines which stack layout is 
used when ix86_compute_frame_layout() runs.  But if we can run 
expand_call after the final call to ix86_compute_frame_layout(), then 
we have a problem.  It looks like ix86_expand_split_stack_prologue is 
the only function that manually calls ix86_expand_call, but maybe it 
would be better to modify the test to something like this:


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a78819d6b3f..c36383f6962 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -29325,7 +29325,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
}
 
   /* Set here, but it may get cleared later.  */

-  if (TARGET_CALL_MS2SYSV_XLOGUES)
+  if (TARGET_CALL_MS2SYSV_XLOGUES && !reload_completed)
cfun->machine->call_ms2sysv = true;
 }
 

Or eve

Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-15 Thread Daniel Santos
Ian, would you mind looking at this please?  A combination of my 
-mcall-ms2sysv-xlogues patch with Bernd's patch is causing problems when 
ix86_expand_split_stack_prologue() calls ix86_expand_call().


On 05/15/2017 06:46 PM, Daniel Santos wrote:
Whether or not m->call_ms2sysv is set determines which stack layout is 
used when ix86_compute_frame_layout() runs.  But if we can run 
expand_call after the final call to ix86_compute_frame_layout(), then 
we have a problem.  It looks like ix86_expand_split_stack_prologue is 
the only function that manually calls ix86_expand_call, but maybe it 
would be better to modify the test to something like this:


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a78819d6b3f..c36383f6962 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -29325,7 +29325,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,

}

   /* Set here, but it may get cleared later.  */
-  if (TARGET_CALL_MS2SYSV_XLOGUES)
+  if (TARGET_CALL_MS2SYSV_XLOGUES && !reload_completed)
cfun->machine->call_ms2sysv = true;
 }




Actually, I think this is wrong.  I happened to recall looking at the 
morestack code last year and remembered that it was all assembly.  I 
looked at it again and I don't see that it calls anything outside of 
its implementation file (libgcc/config/i386/morestack.S) except for 
_Unwind_Resume and the calling function itself (I think it calls its 
caller).  It saves and restores rsi and rdi and doesn't use any SSE 
registers, so it doesn't need to clobber all of the regs in the 
x86_64_ms_sysv_extra_clobbered_registers array.  I'm guessing that this 
should have its own pattern instead of calling ix86_expand_call in the 
first place.


Of course, I'm the new guy here, so please enlighten me if I'm wrong.

Thanks,
Daniel


Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-16 Thread Daniel Santos

On 05/16/2017 03:34 AM, Bernd Edlinger wrote:

It would be good to have test cases for each of the not-supported warnings that
can happen, so far I only managed to get a test case for -fsplit-stack.


Yes, I'm inclined to agree.  I'll try to get this done today or 
tomorrow.  I've also put in a limiter of one warning per TU.  One 
problem is that there isn't a way to disable the warning, so I may want 
to add that.
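
For reference, a one-warning-per-TU limiter can be as small as the
following standalone sketch. It only mirrors the general shape of such a
helper; the function name, the message text and the use of stderr are
assumptions for illustration, not the GCC diagnostic API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Emit the incompatibility warning for FEATURE the first time only.
   Returns true if a warning was printed by this call.  */
bool warn_once_incompatible(const char *feature)
{
    static bool warned_once = false;   /* persists across calls */

    assert(feature != NULL);
    if (warned_once)
        return false;

    fprintf(stderr, "warning: option is not compatible with %s\n", feature);
    warned_once = true;
    return true;
}
```

Because the flag is a function-local static, the limit is per process (so
per TU for a compiler invocation), and only the first offending feature is
ever named.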


Daniel


Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-16 Thread Daniel Santos

On 05/16/2017 12:19 PM, Ian Lance Taylor wrote:

On Mon, May 15, 2017 at 10:00 PM, Daniel Santos  wrote:

Ian, would you mind looking at this please?  A combination of my
-mcall-ms2sysv-xlogues patch with Bernd's patch is causing problems when
ix86_expand_split_stack_prologue() calls ix86_expand_call().

I don't have a lot of context here.  I assume that ms2sysv is going to
be used on Windows systems, where -fsplit-stack isn't really going to
work anyhow, so I think it would probably be OK to reject that
combination if it causes trouble.


Sorry I wasn't more specific.  This -mcall-ms2sysv-xlogues actually 
targets Wine, although they don't use -fsplit-stack.  As it stands, my 
patch set disables the feature when -fsplit-stack is used, but only 
during ix86_compute_frame_layout, which is too late in the case of 
-fsplit-stack.  I think I should just change this to a sorry() in 
ix86_option_override_internal.



Also, it's overkill for ix86_expand_split_stack_prologue to call
ix86_expand_call.  The call is always to __morestack, and __morestack
is written in assembler, so we could use a simpler version of
ix86_expand_call if that helps.  In particular we can decide that
__morestack doesn't clobber any unusual registers, if that is what is
causing the problem.

Ian


Well, aside from the conflict of the two patches, it just looks like it 
has the potential to generate clobbers where none are needed, but I'm 
having trouble actually *proving* that, so maybe I'm just wrong.


Daniel



Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-16 Thread Daniel Santos

On 05/16/2017 02:52 PM, Bernd Edlinger wrote:

I think I solved the problem with -fsplit-stack, I am not sure
if ix86_static_chain_on_stack might change after reload due to
final.c possibly calling targetm.calls.static_chain, but if that
is the case, that is an already pre-existing problem.

The goal of this patch is to make all decisions regarding the
frame layout before the reload pass, and to make sure that
the frame layout does not change unexpectedly it asserts
that the data that goes into the decision does not change
after reload_completed.

With the attached patch, -fsplit-stack and the attribute ms_hook_prologue
are handled directly in ix86_expand_call, because that data is
already known before expansion.

The calls_eh_return and ix86_static_chain_on_stack may become
known at a later time, but after reload it should not change any more.
To be sure, I added an assertion at ix86_static_chain, which the
regression test did not trigger, neither with -m64 nor with -m32.

I have bootstrapped the patch several times, and a few times I
encountered a segfault in the garbage collection, but it did not
happen every time.  Currently I think that is unrelated to this patch.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu with -m64/-m32.
Is it OK for trunk?


Thanks
Bernd.


With as many formatting errors as I seem to have had, I would like to 
fix those first and then have you patch on top of that, if you wouldn't 
mind terribly.  While gcc uses subversion, git-blame is still very 
helpful (then again, since Uros committed it for me, I guess that's 
already off).




Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c(revision 248031)
+++ gcc/config/i386/i386.c(working copy)
@@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] =

 /* Additional registers that are clobbered by SYSV calls.  */

-unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
+#define NUM_X86_64_MS_CLOBBERED_REGS 12
+static int const x86_64_ms_sysv_extra_clobbered_registers
+ [NUM_X86_64_MS_CLOBBERED_REGS] =


Is there a reason you're changing this unsigned to signed int? While 
AX_REG and such are just preprocessor macros, everywhere else it seems 
that register numbers are dealt with as unsigned ints.



@@ -2484,13 +2486,13 @@ class xlogue_layout {
   needs to store registers based upon data in the machine_function.  */

   HOST_WIDE_INT get_stack_space_used () const
   {
-const struct machine_function &m = *cfun->machine;
-unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1;
+const struct machine_function *m = cfun->machine;
+unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;


What is the reason for this change?



-gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
+gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
 return m_regs[last_reg].offset
-+ (m.call_ms2sysv_pad_out ? 8 : 0)
-+ STUB_INDEX_OFFSET;
+   + (m->call_ms2sysv_pad_out ? 8 : 0)
+   + STUB_INDEX_OFFSET;
   }

   /* Returns the offset for the base pointer used by the stub. */
@@ -2532,7 +2534,7 @@ class xlogue_layout {
   /* Lazy-inited cache of symbol names for stubs.  */
   char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN];


-  static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT];
+  static const struct GTY(()) xlogue_layout s_instances[XLOGUE_SET_COUNT];


Hmm, during development I originally had C-style xlogue_layout as a 
struct and later decided to make it a class and apparently forgot to 
remove the "struct" here.  Nonetheless, it's bizarre that the GTY() 
would go in between the "struct" and the "xlogue_layout."  As I said 
before, I don't fully understand how this GTY works.  Can we just remove 
the "struct" keyword?


Also, if the way I had it was wrong (and resulted in garbage collection 
not working right) then perhaps it was the cause of a problem I had with 
caching symbol rtx objects.  I could not get this to work because my 
cached objects would somehow become stale and I've since removed that 
code (from xlogue_layout::get_stub_rtx).  (i.e., does GTY affect the 
lifespan of globals, TU statics and static C++ data members?)



 /* Constructor for xlogue_layout.  */
@@ -2639,11 +2643,11 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_

   : m_hfp (hfp) , m_nregs (hfp ? 17 : 18),
 m_stack_align_off_in (stack_align_off_in)
 {
+  HOST_WIDE_INT offset = stack_align_off_in;
+  unsigned i, j;
+
   memset (m_regs, 0, sizeof (m_regs));
   memset (m_stub_names, 0, sizeof (m_stub_names));
-
-  HOST_WIDE_INT offset = stack_align_off_in;
-  unsigned i, j;
   for (i = j = 0; i < MAX_REGS; ++i)
 {
   unsigned regno = REG_ORDER[i];
@@ -2662,11 +2666,12 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_

   m_regs[j].regno= regno;
   m_regs[j++].offset = offset - STUB_INDEX_OFFSET;
 }
-gcc_assert (

Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-17 Thread Daniel Santos

On 05/17/2017 12:41 PM, Bernd Edlinger wrote:

Apologies if I ruined your patch...


As I said before, I'm the new guy here. :) So when this is done I'll 
rebase my changes.  I have some test stuff to fix and some refactoring 
and refinements to xlogue_layout::compute_stub_managed_regs(). And then 
I'll find a solution to the stub_managed_regs after that.



Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c(revision 248031)
+++ gcc/config/i386/i386.c(working copy)
@@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] =

  /* Additional registers that are clobbered by SYSV calls.  */

-unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
+#define NUM_X86_64_MS_CLOBBERED_REGS 12
+static int const x86_64_ms_sysv_extra_clobbered_registers
+ [NUM_X86_64_MS_CLOBBERED_REGS] =

Is there a reason you're changing this unsigned to signed int? While
AX_REG and such are just preprocessor macros, everywhere else it seems
that register numbers are dealt with as unsigned ints.


Actually there seems to be confusion about "int" vs. "unsigned int"
for regno; the advantage of int is that it can contain -1 as an
exceptional value.  Furthermore there are 3 similar arrays just
above that also use int:

static int const x86_64_int_parameter_registers[6] =
{
DI_REG, SI_REG, DX_REG, CX_REG, R8_REG, R9_REG
};

static int const x86_64_ms_abi_int_parameter_registers[4] =
{
CX_REG, DX_REG, R8_REG, R9_REG
};

static int const x86_64_int_return_registers[4] =
{
AX_REG, DX_REG, DI_REG, SI_REG
};

/* Additional registers that are clobbered by SYSV calls.  */

#define NUM_X86_64_MS_CLOBBERED_REGS 12
static int const x86_64_ms_sysv_extra_clobbered_registers
   [NUM_X86_64_MS_CLOBBERED_REGS] =
{
SI_REG, DI_REG,
XMM6_REG, XMM7_REG,
XMM8_REG, XMM9_REG, XMM10_REG, XMM11_REG,
XMM12_REG, XMM13_REG, XMM14_REG, XMM15_REG
};

So IMHO it looked odd to have one array use a different type in the
first place.
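
A small standalone sketch of the trade-off being discussed, using invented
register numbers and hypothetical function names (nothing here is taken
from i386.c): a signed type allows -1 as a "not found" result, but that
sentinel misbehaves as soon as it is compared as unsigned.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up register numbers for the demo; not GCC's values.  */
enum { DEMO_SI = 4, DEMO_DI = 5, DEMO_XMM6 = 26 };

static const int demo_clobbered[] = { DEMO_SI, DEMO_DI, DEMO_XMM6 };

/* Return the index of REGNO in the table, or -1 if it is absent.
   Callers must pass a real register number; -1 is reserved for the
   "not found" result.  */
int find_clobbered(int regno)
{
    size_t n = sizeof demo_clobbered / sizeof demo_clobbered[0];

    assert(regno >= 0);
    for (size_t i = 0; i < n; ++i)
        if (demo_clobbered[i] == regno)
            return (int) i;
    return -1;
}

/* The pitfall when mixing types: -1 converted to unsigned becomes
   UINT_MAX, so it is NOT below any realistic limit.  Returns 0.  */
int minus_one_below_limit_as_unsigned(void)
{
    int missing = -1;
    unsigned limit = 64u;
    return (unsigned) missing < limit;
}
```

This is the kind of comparison that produces the signed/unsigned warning
when a table of one type is compared against values of the other.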


OK.  I think that when I originally started this I was using elements of 
this array in comparisons and got the signed/unsigned warning and 
changed them.  None of the code gives that warning now however.



@@ -2484,13 +2486,13 @@ class xlogue_layout {
   needs to store registers based upon data in the machine_function.  */
HOST_WIDE_INT get_stack_space_used () const
{
-const struct machine_function &m = *cfun->machine;
-unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1;
+const struct machine_function *m = cfun->machine;
+unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;

What is the reason for this change?


Because a mixture of C and C++ (C wants "struct" machine_function)
looks ugly, and everywhere else in this module, "m" is a pointer and not
a reference.


I see, consistency with the rest of the file.


-gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
+gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
  return m_regs[last_reg].offset
-+ (m.call_ms2sysv_pad_out ? 8 : 0)
-+ STUB_INDEX_OFFSET;
+   + (m->call_ms2sysv_pad_out ? 8 : 0)
+   + STUB_INDEX_OFFSET;
}

/* Returns the offset for the base pointer used by the stub. */
@@ -2532,7 +2534,7 @@ class xlogue_layout {
/* Lazy-inited cache of symbol names for stubs.  */
char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN];

-  static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT];
+  static const struct GTY(()) xlogue_layout s_instances[XLOGUE_SET_COUNT];

Hmm, during development I originally had C-style xlogue_layout as a
struct and later decided to make it a class and apparently forgot to
remove the "struct" here.  Nonetheless, it's bizarre that the GTY()
would go in between the "struct" and the "xlogue_layout."  As I said
before, I don't fully understand how this GTY works.  Can we just remove
the "struct" keyword?

Also, if the way I had it was wrong (and resulted in garbage collection
not working right) then perhaps it was the cause of a problem I had with
caching symbol rtx objects.  I could not get this to work because my
cached objects would somehow become stale and I've since removed that
code (from xlogue_layout::get_stub_rtx).  (i.e., does GTY affect the
lifespan of globals, TU statics and static C++ data members?)


Yes, I had not noticed the "struct", and agree to remove it.

I just saw that in every other place where GTY is used it is directly
after "struct" or "static", so my impulse was just to follow those examples.


Yeah, and not understanding how it worked I was just trying to follow suit.


But neither version actually makes the class GC-able.  Apparently
this class construct is too complicated for the gengtype machinery.
So I am inclined to remove the GTY keyword completely as it gives
you only false security in GC's ability to garbage collect anything
in this class.


Th

Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-18 Thread Daniel Santos


On 05/17/2017 01:39 PM, Bernd Edlinger wrote:

On 05/15/17 03:39, Daniel Santos wrote:


I should add that if you want to run faster tests just on the ms to sysv
abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if
that succeeds run the full testsuite.

Daniel

Hmm, that's funny...

If I use "make check-c RUNTESTFLAGS="ms-sysv.exp" -j8" it seems to work,
but if I omit the -j8 it fails:

make check-c RUNTESTFLAGS="ms-sysv.exp"
...Test Run By ed on Wed May 17 20:38:24 2017
Native configuration is x86_64-pc-linux-gnu

=== gcc tests ===

Schedule of variations:
  unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file
for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for
target.
Using /home/ed/gnu/gcc-trunk/gcc/testsuite/config/default.exp as
tool-and-target-specific interface file.
Running
/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp
...
ERROR: tcl error sourcing
/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp.
ERROR: no such variable
  (read trace on "env(GCC_RUNTEST_PARALLELIZE_DIR)")
  invoked from within
"set parallel_dir "$env(GCC_RUNTEST_PARALLELIZE_DIR)/abi-ms-sysv""
  (file
"/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp"
line 154)
  invoked from within
"source
/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp"
  ("uplevel" body line 1)
  invoked from within
"uplevel #0 source
/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp"
  invoked from within
"catch "uplevel #0 source $test_file_name""

=== gcc Summary ===

/home/ed/gnu/gcc-build/gcc/xgcc  version 8.0.0 20170514 (experimental)
(GCC)

make[2]: Leaving directory `/home/ed/gnu/gcc-build/gcc'
make[1]: Leaving directory `/home/ed/gnu/gcc-build/gcc'


Hmm, that might be something I hadn't actually tried.  And if I run it 
in a directory where I had previously run a multi-job check it doesn't 
blow up (maybe because the directory is already there?)  Due to the 
nature of my test program, I had to break with tradition and implement 
something akin to the test that generates random structs (I forgot what 
that one is called).  It ended up breaking the bastardized 
parallelization scheme, so I had to implement my own re-bastardized 
scheme.  Looks like I can just skip parallelization if 
GCC_RUNTEST_PARALLELIZE_DIR isn't defined.
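
The claim-by-exclusive-create idea, together with the fallback for an unset
directory variable, can be sketched in standalone POSIX C. This is an
analogue of the Tcl logic rather than a translation of it: should_run_test
is a hypothetical helper, the environment variable name is passed in by the
caller, and the "one file per test number" layout is an assumption.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if this process should run test number TEST_ID, 0 if another
   job has already claimed it, and -1 on any other error.  DIR_ENV names
   the environment variable holding the shared claim directory; when it
   is unset there is no parallelization and every test runs.  */
int should_run_test(const char *dir_env, int test_id)
{
    assert(dir_env != NULL);

    const char *dir = getenv(dir_env);
    if (dir == NULL)
        return 1;                      /* serial run: nothing to claim */

    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%d", dir, test_id);
    if (n < 0 || (size_t) n >= sizeof path)
        return -1;                     /* path did not fit */

    /* O_CREAT | O_EXCL fails with EEXIST if the file already exists,
       so exactly one job wins the claim.  */
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == EEXIST ? 0 : -1;
}
```

Checking for the unset variable before building any path is what avoids
the "no such variable" failure shown in the log above.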


I have another Solaris test issue on PR 80759 so I'll fix that along 
with it.


Thanks,
Daniel

PS: Oh! it might be due to the difference between -j1 and no -j argument.


Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-18 Thread Daniel Santos

On 05/18/2017 08:37 AM, Bernd Edlinger wrote:

On 05/17/17 04:01, Daniel Santos wrote:

-  if (ignore_outlined && cfun->machine->call_ms2sysv
-  && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
-return false;
+  if (ignore_outlined && cfun->machine->call_ms2sysv)
+{
+  /* Registers who's save & restore will be managed by stubs called from
+ pro/epilogue.  */
+  HARD_REG_SET stub_managed_regs;
+  xlogue_layout::compute_stub_managed_regs (stub_managed_regs);

+  if (in_hard_reg_set_p (stub_managed_regs, DImode, regno))
+return false;
+}
+
if (crtl->drap_reg
&& regno == REGNO (crtl->drap_reg)
&& !cfun->machine->no_drap_save_restore)

This makes no sense.  The entire purpose of stub_managed_regs is to
cache the result of xlogue_layout::compute_stub_managed_regs() and this
would unnecessarily repeat that calculation each time
ix86_save_reg() is called.  Since
xlogue_layout::compute_stub_managed_regs() calls ix86_save_reg many
times, this makes it even worse.  Which registers are being saved
out-of-line and inline MUST be known at the time the stack layout is
determined.  So stub_managed_regs should either be left a TU static or
just moved to struct machine_function.

As an aside, I've noticed that xlogue_layout::compute_stub_managed_regs
is calling ix86_save_reg (which isn't trivial) more often than it really
has to, so I've refactored it.


Well, meanwhile I think the stub_managed_regs contain zero information
and need not be saved at all, because it can easily be reconstructed
from  m->call_ms2sysv_extra_regs.
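
As a rough standalone illustration of that idea, assuming the stub-managed
registers are simply the first MIN_REGS + extra_regs entries of a fixed
register order (the order, the counts and the function name below are all
invented for the demo and are not the values in i386.c):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ordering and counts, for illustration only.  */
enum { DEMO_MIN_REGS = 3, DEMO_MAX_REGS = 6 };

static const unsigned demo_reg_order[DEMO_MAX_REGS] = { 40, 41, 42, 5, 4, 3 };

/* True if REGNO is among the first DEMO_MIN_REGS + EXTRA_REGS entries of
   the fixed order, i.e. membership is recomputed from the count alone
   instead of being read from a stored register set.  */
bool is_stub_managed(unsigned regno, unsigned extra_regs)
{
    assert(extra_regs <= DEMO_MAX_REGS - DEMO_MIN_REGS);

    unsigned count = DEMO_MIN_REGS + extra_regs;
    for (unsigned i = 0; i < count; ++i)
        if (demo_reg_order[i] == regno)
            return true;
    return false;
}
```

The answer depends only on the stored count, so it cannot drift from
another cached copy, but it is only as stable as the count itself.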

See the attached new version.  Daniel does it work for you?


No, I'm not at all comfortable with you making so many seemingly 
unnecessary changes to this code.  (Although I wish I got this much 
feedback during my RFCs! :)  I can accept the changes to 
is/count_stub_managed_reg (with some caveats), but I do not at all see 
the rationale for changing m_stub_names to a static and adding another 
dimension for the object instance -- from an OO standpoint, that's just 
bad design.  Can you please share your rationale for that?


Incidentally, half of the space in that array is wasted and can be 
trimmed since a given instance of xlogue_layout either supports hard 
frame pointers or doesn't, I just never got around to splitting that 
apart.  (The first three enum xlogue_stub values are for without HFP and 
the last three for with.)  Also, if we wanted to further reduce the 
memory footprint of xlogue_layout objects, the offset field of struct 
reginfo could be changed to int, and if we really wanted to go nuts then 
16 bits would work for both of its fields.
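
To put rough numbers on that, here is a standalone sketch comparing three
hypothetical layouts for such a record. The struct names are invented, and
a 64-bit offset is assumed as a stand-in for HOST_WIDE_INT; actual sizes
depend on the ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Three candidate layouts for a (register number, offset) pair.  */
struct reginfo_wide   { unsigned regno; int64_t offset; };  /* assumed current shape */
struct reginfo_int    { unsigned regno; int     offset; };
struct reginfo_packed { uint16_t regno; int16_t offset; };

/* Narrow OFFSET to 16 bits, asserting that nothing is lost.  */
int16_t narrow_offset(int64_t offset)
{
    assert(offset >= INT16_MIN && offset <= INT16_MAX);
    return (int16_t) offset;
}
```

On a typical x86_64 ABI the three structs occupy 16, 8 and 4 bytes
respectively, so the packed form is a quarter of the wide one, at the cost
of a range check wherever an offset is stored.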


So for is/count_stub_managed_reg(s), you are obviously much more 
familiar with gcc, its passes and the i386 backend, but my knowledge 
level makes me very uncomfortable having the result of 
xlogue_layout::is_stub_managed_reg() determined in a way that has the 
(apparent) potential to differ from the state at the time the last 
call to ix86_compute_frame_layout() was made; I just don't understand 
well enough what all can change in between the last call to 
ix86_compute_frame_layout() and the last call to 
xlogue_layout::is_stub_managed_reg().  I like your 
count_stub_managed_regs() and is_stub_managed_regs() from a 
*performance* standpoint (and I know I get too uptight about that kind 
of thing, so I appreciate that), but as to the change in scheme, I would 
have to trust you if you assert that this will always behave 
consistently.


I also want to give you a little background on some of these seemingly 
repetitive computations.  One of my design goals was for the code to be 
relatively easy to adapt to the management of out-of-line 
pro/epilogue stubs for other possible scenarios where there are a lot of 
clobbers and it could be useful.  Granted, I don't have enough knowledge 
of x86 architectures to identify situations other than this one (in 
64-bit Wine) where it could be helpful and I know that x86 push/pops are 
really small.  So theoretically, struct machine_function's 
"call_ms2sysv" could be changed to something like "outline_savres" and 
any combination of clobbered registers for which there is a decent stub 
could be used if it was a good choice.  I also realize that nobody likes 
complexity that isn't being used, and I respect that.  So if you are 
comfortable with this change and you believe you understand how it works 
then I will agree to it, but I'll be trusting you well beyond my 
knowledge level and ability to confidently predict the outcome (probably 
what a programmer hates the most).


Thanks,
Daniel


Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-18 Thread Daniel Santos



PS: Oh! it might be due to the difference between -j1 and no -j argument.


Yes, that's how I missed it.  This flaw isn't exposed with make -j1, but 
is exposed with just make.  Thanks for finding this!


Daniel



[PATCH 0/2] [testsuite] PR80759 Fix test breakages on i386-pc-solaris2.*

2017-05-18 Thread Daniel Santos

There are a few issues with my ms-sysv.exp tests:

1. Use of gas extensions in do-test.S causes breakages on Solaris,
2. Parallelization breaks when no make -j flag is passed,
3. Builds aren't adding TEST_ALWAYS_FLAGS, so log files filled with
   color escape codes, and
4. The "test unsupported" message is being spammed once for each -j

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80759

I've broken this apart into two patches because I don't know if you'll 
agree with the first one.  I fixed the make -j issue and moved the 
parallelization code into a new gcc/testsuite/lib/parallelize.exp in the 
first patch and fixed all of the other issues in the second.  I've 
removed all usage of gas .struct in my assembly file, hard-coded the 
offsets into the code and added asserts to main() in ms-sysv.c to make 
sure they don't change.
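
The same guard can also be expressed at compile time. The following
standalone C11 sketch uses a hypothetical struct that merely reproduces the
offsets quoted in the patch (0, 224, 448, 672 and 680); the real test_data
definition is not shown here, and the two trailing fields are modelled as
64-bit integers so the numbers do not depend on pointer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 224-byte, 16-byte-aligned register set.  */
struct demo_regs {
    _Alignas(16) unsigned char bytes[224];
};

/* Hypothetical stand-in for the test data block.  */
struct demo_test_data {
    struct demo_regs regdata[3];   /* save, input, output */
    uint64_t fn;                   /* address of the function under test */
    uint64_t retaddr;              /* saved return address */
};

/* Offsets that hand-written assembly would hard-code.  */
static_assert(offsetof(struct demo_test_data, regdata) == 0, "save set");
static_assert(offsetof(struct demo_test_data, regdata)
              + 1 * sizeof(struct demo_regs) == 224, "input set");
static_assert(offsetof(struct demo_test_data, regdata)
              + 2 * sizeof(struct demo_regs) == 448, "output set");
static_assert(offsetof(struct demo_test_data, fn) == 672, "fn");
static_assert(offsetof(struct demo_test_data, retaddr) == 680, "retaddr");
```

With static_assert a layout change breaks the build instead of failing at
run time; the run-time assert form works equally well when the compiler
under test cannot be assumed to support C11.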


I've bootstrapped and retested on x86_64 Linux and have asked Rainer to 
retest on Solaris.  Presuming that succeeds, are you OK with this 
change?  (I have SVN write privs now, so I can even commit it myself).


Thanks,
Daniel



[PATCH 2/2] [testsuite] PR 80759 Remove gas extensions from do-test.S, fix other problems

2017-05-18 Thread Daniel Santos
Use of .struct in do-test.S causes breakages when gas isn't the
assembler (e.g., Solaris).  I also wasn't including TEST_ALWAYS_FLAGS in
my CFLAGS resulting in super-ugly log files.  Finally, this patch
eliminates spam of "test unsupported" (limiting it to one printing).

Signed-off-by: Daniel Santos 
---
 .../gcc.target/x86_64/abi/ms-sysv/do-test.S| 26 +-
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c|  7 ++
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp  | 24 
 3 files changed, 27 insertions(+), 30 deletions(-)

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S
index 1395235fd1e..967eb959fbc 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S
+++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S
@@ -46,22 +46,6 @@ fn:
 #  define MOVAPS movaps
 # endif
 
-/* TODO: Is there a cleaner way to provide these offsets?  */
-   .struct 0
-test_data_save:
-
-   .struct test_data_save + 224
-test_data_input:
-
-   .struct test_data_save + 448
-test_data_output:
-
-   .struct test_data_save + 672
-test_data_fn:
-
-   .struct test_data_save + 680
-test_data_retaddr:
-
.text
 
 regs_to_mem:
@@ -132,23 +116,23 @@ L0:
    call    regs_to_mem
 
    # Load register with random data
-   lea     test_data + test_data_input(%rip), %rax
+   lea     test_data + 224(%rip), %rax
    call    mem_to_regs
 
    # Save original return address
    pop     %rax
-   movq    %rax, test_data + test_data_retaddr(%rip)
+   movq    %rax, test_data + 680(%rip)
 
    # Call the test function
-   call    *test_data + test_data_fn(%rip)
+   call    *test_data + 672(%rip)
 
    # Restore the original return address
-   movq    test_data + test_data_retaddr(%rip), %rcx
+   movq    test_data + 680(%rip), %rcx
    push    %rcx
 
    # Save test function return value and store resulting register values
    push    %rax
-   lea     test_data + test_data_output(%rip), %rax
+   lea     test_data + 448(%rip), %rax
    call    regs_to_mem
 
# Restore registers
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c
index 2a011f5103d..7cec312c386 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c
+++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c
@@ -346,6 +346,13 @@ int main (int argc, char *argv[])
   assert (!((long)&test_data.regdata[REG_SET_INPUT] & 15));
   assert (!((long)&test_data.regdata[REG_SET_OUTPUT] & 15));
 
+  /* Verify offsets hard-coded into assembly.  */
+  assert (__builtin_offsetof (struct test_data, regdata[REG_SET_SAVE]) == 0);
+  assert (__builtin_offsetof (struct test_data, regdata[REG_SET_INPUT]) == 224);
+  assert (__builtin_offsetof (struct test_data, regdata[REG_SET_OUTPUT]) == 448);
+  assert (__builtin_offsetof (struct test_data, fn) == 672);
+  assert (__builtin_offsetof (struct test_data, retaddr) == 680);
+
   while ((c = getopt (argc, argv, "s:f")) != -1)
 {
   switch (c)
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp
index 77c40dbf349..a9571f194b1 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp
+++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp
@@ -23,18 +23,12 @@
 # see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # <http://www.gnu.org/licenses/>.
 
-# Exit immediately if this isn't a native x86_64 target.
-if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
- || ![is-effective-target lp64] || ![isnative] } then {
-unsupported "$subdir"
-return
-}
-
 load_lib gcc-dg.exp
 load_lib parallelize.exp
 
 proc runtest_ms_sysv { cflags generator_args } {
-global GCC_UNDER_TEST HOSTCXX HOSTCXXFLAGS tmpdir srcdir subdir
+global GCC_UNDER_TEST HOSTCXX HOSTCXXFLAGS tmpdir srcdir subdir \
+  TEST_ALWAYS_FLAGS
 
 set objdir "$tmpdir/ms-sysv"
 set generator "$tmpdir/ms-sysv-generate.exe"
@@ -93,7 +87,7 @@ proc runtest_ms_sysv { cflags generator_args } {
}
 }
 
-set cc "$GCC_UNDER_TEST -I$objdir -I$srcdir/$subdir $cflags $warn_flags"
+set cc "$GCC_UNDER_TEST -I$objdir -I$srcdir/$subdir $TEST_ALWAYS_FLAGS $cflags $warn_flags"
 
 # Assemble do-test.S
 set src "$srcdir/$subdir/do-test.S"
@@ -142,6 +136,18 @@ if { [parallel-init "ms2sysv"] != 0 } then {
 return;
 }
 
+# Exit if this isn't a native x86_64 target.
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+ || ![is-effective-target lp64] || ![isnative] } then {
+
+# The first call to parallel-should-run-test is used so we only print the
+# 

[PATCH 1/2] [testsuite] Move non-standard parallelization support into new lib and fix flaw

2017-05-18 Thread Daniel Santos
This fixes a flaw in my parallelization code that caused it to fail when
GCC_RUNTEST_PARALLELIZE_DIR wasn't set.  It worked fine with make -j1,
but failed with just make.

As there could be other tests that might need to do their own
parallelization, I'm moving that code into its own file under
gcc/testsuite/lib.

Signed-off-by: Daniel Santos 
---
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp  | 48 
 gcc/testsuite/lib/parallelize.exp  | 88 ++
 2 files changed, 103 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/lib/parallelize.exp

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp
index e317af9bd85..77c40dbf349 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp
+++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp
@@ -30,13 +30,11 @@ if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
 return
 }
 
-global GCC_RUNTEST_PARALLELIZE_DIR
-
 load_lib gcc-dg.exp
+load_lib parallelize.exp
 
 proc runtest_ms_sysv { cflags generator_args } {
-global GCC_UNDER_TEST HOSTCXX HOSTCXXFLAGS tmpdir srcdir subdir \
-  parallel_dir next_test
+global GCC_UNDER_TEST HOSTCXX HOSTCXXFLAGS tmpdir srcdir subdir
 
 set objdir "$tmpdir/ms-sysv"
 set generator "$tmpdir/ms-sysv-generate.exe"
@@ -46,22 +44,6 @@ proc runtest_ms_sysv { cflags generator_args } {
 set ms_sysv_exe "$objdir/ms-sysv.exe"
 set status 0
 set warn_flags "-Wall"
-set this_test $next_test
-incr next_test
-
-# Do parallelization here
-if [catch {set fd [open "$parallel_dir/$this_test" \
-   [list RDWR CREAT EXCL]]} ] {
-   if { [lindex $::errorCode 1] eq "EEXIST" } then {
-   # Another job is running this test
-   return
-   } else {
-   error "Failed to open $parallel_dir/$this_test: $::errorCode"
-   set status 1
-   }
-} else {
-  close $fd
-}
 
 # Detect when hard frame pointers are enabled (or required) so we know not
 # to generate bp clobbers.
@@ -73,9 +55,17 @@ proc runtest_ms_sysv { cflags generator_args } {
 set descr "$subdir CFLAGS=\"$cflags\" generator_args=\"$generator_args\""
 verbose "$tmpdir: Running test $descr" 1
 
-# Cleanup any previous test in objdir
-file delete -force $objdir
-file mkdir $objdir
+set status [parallel-should-run-test]
+
+if { $status == 1 } then {
+   return
+}
+
+if { $status == 0 } then {
+   # Cleanup any previous test in objdir
+   file delete -force $objdir
+   file mkdir $objdir
+}
 
 # Build the generator (only needs to be done once).
 set src "$srcdir/$subdir/gen.cc"
@@ -148,16 +138,8 @@ proc runtest_ms_sysv { cflags generator_args } {
 }
 
 dg-init
-
-# Setup parallelization
-set next_test 0
-set parallel_dir "$env(GCC_RUNTEST_PARALLELIZE_DIR)/abi-ms-sysv"
-file mkdir "$env(GCC_RUNTEST_PARALLELIZE_DIR)"
-file mkdir "$parallel_dir"
-
-if { ![file isdirectory "$parallel_dir"] } then {
-error "Failed to create directory $parallel_dir: $::errorCode"
-return
+if { [parallel-init "ms2sysv"] != 0 } then {
+return;
 }
 
 set gen_opts "-p0-5"
diff --git a/gcc/testsuite/lib/parallelize.exp b/gcc/testsuite/lib/parallelize.exp
new file mode 100644
index 000..346a06f0fa0
--- /dev/null
+++ b/gcc/testsuite/lib/parallelize.exp
@@ -0,0 +1,88 @@
+# Functions for parallelizing tests that cannot use the standard dg-run,
+# dg-runtest or gcc-dg-runtest for some reason.
+#
+# Copyright (C) 2017 Free Software Foundation, Inc.
+# Contributed by Daniel Santos 
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+#
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+set is_parallel_build 0
+set parallel_next_test 0
+set parallel_dir ""
+
+# Setup parallelization directory and variabl

Re: [PATCH 2/2] [testsuite] PR 80759 Remove gas extensions from do-test.S, fix other problems

2017-05-19 Thread Daniel Santos

Thank you for your assistance, Rainer!

On 05/19/2017 04:03 AM, Rainer Orth wrote:

unfortunately, it still doesn't, as explained in the PR.  The multilib
support is still wrong/non-existent.


I guess I thought for some reason that would magically appear in 
TEST_ALWAYS_FLAGS. :)  I've explicitly added it for now, but I haven't 
yet found where -m64 gets fed in the normal flow of things and I would 
rather know I'm doing things as closely as possible to how the rest of 
the test harness does it.



(I have SVN write privs now, so I can even commit it myself).

Please always include ChangeLog entries with your patch submissions so
one can easily see what you've changed
(cf. https://gcc.gnu.org/contribute.html).

Thanks.
 Rainer


I hate when I forget that!  I'll be sure to remember when I resubmit.


Use of .struct in do-test.S causes breakages when gas isn't the
assembler (e.g., Solaris).  I also wasn't including TEST_ALWAYS_FLAGS in
my CFLAGS resulting in super-ugly log files.  Finally, this patch
eliminates spam of "test unsupported" (limiting it to one printing).

Signed-off-by: Daniel Santos 
---
  .../gcc.target/x86_64/abi/ms-sysv/do-test.S| 26 +-
  .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c|  7 ++
  .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp  | 24 
  3 files changed, 27 insertions(+), 30 deletions(-)

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S
index 1395235fd1e..967eb959fbc 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S
+++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S
@@ -46,22 +46,6 @@ fn:
 #  define MOVAPS movaps
 # endif
 
-/* TODO: Is there a cleaner way to provide these offsets?  */
-	.struct 0
-test_data_save:
-
-	.struct test_data_save + 224
-test_data_input:
-
-	.struct test_data_save + 448
-test_data_output:
-
-	.struct test_data_save + 672
-test_data_fn:
-
-	.struct test_data_save + 680
-test_data_retaddr:
-
 	.text
 
 regs_to_mem:
@@ -132,23 +116,23 @@ L0:
 	call	regs_to_mem
 
 	# Load register with random data
-	lea	test_data + test_data_input(%rip), %rax
+	lea	test_data + 224(%rip), %rax
 	call	mem_to_regs
 
 	# Save original return address
 	pop	%rax
-	movq	%rax, test_data + test_data_retaddr(%rip)
+	movq	%rax, test_data + 680(%rip)
 
 	# Call the test function
-	call	*test_data + test_data_fn(%rip)
+	call	*test_data + 672(%rip)
 
 	# Restore the original return address
-	movq	test_data + test_data_retaddr(%rip), %rcx
+	movq	test_data + 680(%rip), %rcx
 	push	%rcx
 
 	# Save test function return value and store resulting register values
 	push	%rax
-	lea	test_data + test_data_output(%rip), %rax
+	lea	test_data + 448(%rip), %rax
 	call	regs_to_mem
 
 	# Restore registers

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c
index 2a011f5103d..7cec312c386 100644
--- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c
+++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c
@@ -346,6 +346,13 @@ int main (int argc, char *argv[])
   assert (!((long)&test_data.regdata[REG_SET_INPUT] & 15));
   assert (!((long)&test_data.regdata[REG_SET_OUTPUT] & 15));
 
+  /* Verify offsets hard-coded into assembly.  */
+  assert (__builtin_offsetof (struct test_data, regdata[REG_SET_SAVE]) == 0);
+  assert (__builtin_offsetof (struct test_data, regdata[REG_SET_INPUT]) == 224);
+  assert (__builtin_offsetof (struct test_data, regdata[REG_SET_OUTPUT]) == 448);
+  assert (__builtin_offsetof (struct test_data, fn) == 672);
+  assert (__builtin_offsetof (struct test_data, retaddr) == 680);
+
   while ((c = getopt (argc, argv, "s:f")) != -1)
     {
       switch (c)

while .struct is a gas extension and doesn't work with the Solaris/x86
/bin/as, having the same (mostly unexplained) constants hardcoded in two
places isn't exactly helpful.  I'd suggest moving them to (say)
ms-sysv.h and include that from both do-test.S (which is preprocessed
assembler after all) and ms-sysv.c.

Rainer
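
To illustrate the suggestion, here is a rough C++ sketch of what sharing
the constants could look like: the offsets are plain preprocessor macros
(so preprocessed assembly could use them too) and the C side checks them
against the actual struct layout.  The macro names, the header name in the
comment, and the stand-in struct test_data_sketch are all assumptions; the
real struct test_data is not shown in full in this thread, and the layout
below assumes an LP64 target.

```cpp
#include <cassert>
#include <cstddef>

// Constants that would live in the shared header (hypothetically
// ms-sysv.h), usable from both preprocessed assembly and C/C++.
#define TEST_DATA_OFFSET_SAVE     0
#define TEST_DATA_OFFSET_INPUT    224
#define TEST_DATA_OFFSET_OUTPUT   448
#define TEST_DATA_OFFSET_FN       672
#define TEST_DATA_OFFSET_RETADDR  680

// Stand-in layout that merely reproduces the offsets asserted in the patch.
struct alignas(16) reg_set { unsigned char bytes[224]; };
enum { REG_SET_SAVE, REG_SET_INPUT, REG_SET_OUTPUT, REG_SET_COUNT };

struct test_data_sketch
{
  reg_set regdata[REG_SET_COUNT];
  void (*fn)(void);
  void *retaddr;
};

// Compile-time checks: a layout change breaks the build, not the test run.
static_assert(offsetof(test_data_sketch, fn) == TEST_DATA_OFFSET_FN,
              "fn offset must match the assembly");
static_assert(offsetof(test_data_sketch, retaddr) == TEST_DATA_OFFSET_RETADDR,
              "retaddr offset must match the assembly");

// Runtime counterpart, mirroring the asserts the patch adds to main().
inline void verify_offsets(const test_data_sketch &td)
{
  const char *base = reinterpret_cast<const char *>(&td);
  auto off = [base](const void *p) {
    return static_cast<const char *>(p) - base;
  };
  assert(off(&td.regdata[REG_SET_SAVE]) == TEST_DATA_OFFSET_SAVE);
  assert(off(&td.regdata[REG_SET_INPUT]) == TEST_DATA_OFFSET_INPUT);
  assert(off(&td.regdata[REG_SET_OUTPUT]) == TEST_DATA_OFFSET_OUTPUT);
  assert(off(&td.fn) == TEST_DATA_OFFSET_FN);
  assert(off(&td.retaddr) == TEST_DATA_OFFSET_RETADDR);
}
```

With the constants defined once, the assembly would refer to the macros
instead of bare numbers such as 224 or 680.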


Well, I don't have an ms-sysv.h, but I suppose I can add one.

I'm starting to lean more towards the idea of plucking out the portion 
of asm that uses these offsets, moving that to an inline asm function 
and having the code in do-test.S just jmp to it.  I wish there was some 
sort of "naked" attribute for x86 since I'm not well versed in every way 
that the compiler can change it in a way that wouldn't be friendly.


void __attribute__((optimize ("-O0 -fno-split-stack")))
do_test_body (void)
{
  __asm__ __volatile__ (

Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-23 Thread Daniel Santos

On 05/22/2017 01:32 PM, Bernd Edlinger wrote:

On 05/19/17 05:17, Daniel Santos wrote:


No, I'm not at all comfortable with you making so many seemingly
unnecessary changes to this code.  (Although I wish I got this much
feedback during my RFCs! :)  I can accept the changes to
is/count_stub_managed_reg (with some caveats), but I do not at all see
the rationale for changing m_stub_names to a static and adding another
dimension for the object instance -- from an OO standpoint, that's just
bad design.  Can you please share your rationale for that?


Hmm, sorry about that ...
I just thought it would be nice to avoid the const-cast here.


Well, remember that const-correctness isn't about an object's internal 
(bitwise) state, but its externally visible (logical) state.  So a 
const member function need not avoid altering its internal state if the 
externally visible state remains unchanged, such as when caching some 
result or lazily initializing.  I have tended to prefer using const_cast 
for this, isolating its use to a single const accessor function (or if () 
block) to leave less room for the data members to be accidentally 
altered in another const member function.  But mutable is generally 
preferred over const_cast, which opens up the danger of accidentally 
modifying an object's logical state (especially by a subsequent edit), 
so using mutable is probably a better practice anyway.
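
As a small illustration of the mutable approach described above, here is a
C++ sketch of a lazily initialized cache inside a const accessor.  The
class stub_namer and the name format are hypothetical; this is not GCC's
xlogue_layout, just the general pattern.

```cpp
#include <cassert>
#include <string>

// Hypothetical class used only to illustrate logical constness.
class stub_namer
{
public:
  explicit stub_namer(int index) : m_index(index) { assert(index >= 0); }

  // Logically const: the returned name never changes for a given object.
  // The cache is filled on first use, which alters only internal state.
  const std::string &name() const
  {
    if (m_cached_name.empty())
      m_cached_name = "__stub_" + std::to_string(m_index);
    return m_cached_name;
  }

private:
  int m_index;
  // mutable limits the exemption from const to this one member, whereas
  // const_cast would open up the whole object.
  mutable std::string m_cached_name;
};
```

Only m_cached_name can be written from a const member function here, so a
later edit cannot accidentally change m_index through the same loophole.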


However, ...


This moved the m_stub_names from all 4 instances to one static
array s_stub_names.  But looking at it again, I think the extra
dimension is not even necessary, because all instances share the
same data, so removing that extra dimension again will be fine.


You are correct!  And I see that your new patch has already changed 
get_stub_name to a static member function, so great!



Incidentally, half of the space in that array is wasted and can be
trimmed since a given instance of xlogue_layout either supports hard
frame pointers or doesn't, I just never got around to splitting that
apart.  (The first three enum xlogue_stub values are for without HFP and
the last three for with.)  Also, if we wanted to further reduce the
memory footprint of xlogue_layout objects, the offset field of struct
reginfo could be changed to int, and if we really wanted to go nuts then
16 bits would work for both of its fields.

So for is/count_stub_managed_reg(s), you are obviously much more
familiar with gcc, its passes and the i386 backend, but my knowledge
level makes me very uncomfortable having the result of
xlogue_layout::is_stub_managed_reg() determined in a way that has the
(apparent) potential to differ from the state at the time the last
call to ix86_compute_frame_layout() was made; I just don't understand

I found it technically difficult to add a HARD_REG_SET to
struct machine_function, and tried to avoid the extra overhead of
calling ix86_save_reg so often, which you also felt uncomfortable with.

So, if you look at compute_stub_managed_regs I first saw that the
first loop can never terminate thru the "return 0", because the
registers in x86_64_ms_sysv_extra_clobbered_registers are guaranteed
to be clobbered.  Then I saw that the bits in stub_managed_regs
are always added in the same sequence, thus the result depends only
on the number call_ms2sysv_extra_regs and hfp so everything is already
available in struct machine_function.


Thanks
Bernd.


Yes, I agree with how you have refactored compute_stub_managed_regs 
given your rationale of not adding another header dependency to i386.h.  
It is only the overall scheme of calculating this outside of 
ix86_compute_frame_layout that I cannot validate due to my not having a 
good understanding of what can and cannot change in between the time 
that ix86_compute_frame_layout is last called and the last call to 
is_stub_managed_regs().


As Uros said, my patch set touches a "delicate part of the compiler, 
where lots of code-paths cross each other (and we have had quite some 
hard-to-fix bugs in this area)" 
(https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01924.html).  I wrote it 
the way I did with my understanding of what was safe to do and your 
alterations move its functionality outside of that understanding.  So 
if you say that this won't break it, then I will have to trust you (and 
the testsuite) on that.


On that note, the tests are undergoing some change and bug fixes and I'm 
planning on adding more tests to validate non-breakage with various 
other stack frame-related options and probably additional tests (and 
test options) triggered by GCC_TEST_RUN_EXPENSIVE or some such.


Daniel





Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-23 Thread Daniel Santos

On 05/23/2017 09:31 AM, Bernd Edlinger wrote:

Hi,

this is the latest version of my patch.

As already said, it attempts to compute
the frame layout only when relevant data have
changed.

Apologies for doing more clean-up on Daniel's
patch than absolutely necessary, but ...

Bootstrap and reg-tested successfully on
x86_64-pc-linux-gnu with unix\{,-m32\}.
Is it OK for trunk?


Thanks
Bernd.


OK with me.

Thanks,
Daniel


Use aligned SSE movs for re-aligned MS ABI pro/epilogues

2016-12-22 Thread Daniel Santos
According to the Microsoft 64-bit ABI specification, registers RDI, RSI 
and XMM6-15 are non-volatile and the stack alignment is 16 bytes.  In 
practice, the Windows implementation appears to not be so picky about 
the 16-byte alignment requirement, probably because it never has to save 
SSE registers and instead just never uses them.  This led to a large list 
(https://bugs.winehq.org/show_bug.cgi?id=27680) of Win64 programs 
violating the ABI with impunity, but crashing in Wine until 
force_align_arg_pointer was added to gcc and used in Wine.


Stack re-alignment was originally done prior to int register saves, but 
was moved to after SSE saves in 2010 to better facilitate 
parallelization, and for simplicity's sake, the stack pointer was 
considered invalid after stack re-alignment and SSE movs were emitted 
unaligned relative to the frame pointer.  But now that forced stack 
re-alignment is the new normal for Wine64, it means that it always gets 
the unaligned movs in Wine. This patch set fixes the problem while 
preserving the improved parallelization of int register saves of Richard 
Henderson's patch in 2010.


This patchset is a prerequisite to another I'm still refining that 
out-of-lines these pro/epilogues. I'm still pretty new to this project, 
so I hope I haven't missed anything. (No additional failures in tests.)


Daniel Santos

2016-12-21  Daniel Santos  

* config/i386/i386.h (struct machine_frame_state): New fields
sp_realigned and sp_realigned_offset.

* config/i386/i386.c
(struct ix86_frame): New fields stack_realign_allocate_offset and
stack_realign_offset.
(ix86_compute_frame_layout): Modify re-alignment calculations.
(sp_valid_at, fp_valid_at): New inline functions.
(choose_basereg): New function.
(choose_baseaddr): Add align parameter, use choose_basereg and modify
all callers.
(ix86_emit_save_reg_using_mov, ix86_emit_restore_sse_regs_using_mov):
Use align parameter of choose_baseaddr to generate aligned SSE movs
when possible.
(pro_epilogue_adjust_stack): Modify to track
machine_frame_state::sp_realigned.
(ix86_expand_prologue): Modify stack re-alignment code.
(ix86_emit_leave): Clear machine_frame_state::sp_realigned.
(ix86_expand_epilogue): Modify validity checks of frame and stack
pointers.




[PATCH 2/3] [i386] Keep stack pointer valid after re-alignment.

2016-12-22 Thread Daniel Santos
This stage adds the fields sp_realigned and sp_realigned_offset to
struct machine_frame_state and adds the concept of the stack pointer
being re-aligned rather than invalid.  The inline functions sp_valid_at
and fp_valid_at are added to test if a given location relative to the
CFA can be accessed with the stack or frame pointer, respectively.
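
A standalone C++ sketch of the two predicates may make the split easier to
see.  It copies the conditions from the diff below but takes the frame
state as a parameter instead of reading cfun->machine->fs; the struct
frame_state_sketch and the helper check_partition are illustrative
assumptions, not GCC code.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the relevant fields of machine_frame_state.
struct frame_state_sketch
{
  bool sp_valid;
  bool fp_valid;
  bool sp_realigned;
  std::int64_t sp_realigned_offset;
};

// SP may be used unless it was re-aligned and the location lies before
// the re-alignment point (smaller CFA offset).
inline bool sp_valid_at(const frame_state_sketch &fs, std::int64_t cfa_offset)
{
  return fs.sp_valid
         && !(fs.sp_realigned && cfa_offset < fs.sp_realigned_offset);
}

// FP may be used unless SP was re-aligned and the location lies at or
// after the re-alignment point.
inline bool fp_valid_at(const frame_state_sketch &fs, std::int64_t cfa_offset)
{
  return fs.fp_valid
         && !(fs.sp_valid && fs.sp_realigned
              && cfa_offset >= fs.sp_realigned_offset);
}

// With both pointers valid and SP re-aligned, the two predicates partition
// the frame: exactly one base register is usable at any given offset.
inline void check_partition(const frame_state_sketch &fs, std::int64_t off)
{
  if (fs.sp_valid && fs.fp_valid && fs.sp_realigned)
    assert(sp_valid_at(fs, off) != fp_valid_at(fs, off));
}
```

For example, with sp_realigned_offset of 64, offset 48 is reachable only
through the frame pointer and offset 64 only through the stack pointer.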

Stack allocation prior to re-alignment is modified so that we allocate
what is needed, but don't allocate unneeded space in the event that no
SSE registers are saved, but frame.sse_reg_save_offset is increased for
alignment.

As this change only alters how SSE registers are saved, moving the
re-alignment AND should not hinder parallelization of int register saves.
---
 gcc/config/i386/i386.c | 69 --
 gcc/config/i386/i386.h | 12 +
 2 files changed, 62 insertions(+), 19 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7f7389cbe31..b5f9f36094f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12604,6 +12604,24 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset)
   return len;
 }
 
+/* Determine if the stack pointer is valid for accessing the cfa_offset.  */
+
+static inline bool sp_valid_at (HOST_WIDE_INT cfa_offset)
+{
+  const struct machine_frame_state &fs = cfun->machine->fs;
+  return fs.sp_valid && !(fs.sp_realigned
+ && cfa_offset < fs.sp_realigned_offset);
+}
+
+/* Determine if the frame pointer is valid for accessing the cfa_offset.  */
+
+static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset)
+{
+  const struct machine_frame_state &fs = cfun->machine->fs;
+  return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
+ && cfa_offset >= fs.sp_realigned_offset);
+}
+
 /* Return an RTX that points to CFA_OFFSET within the stack frame.
The valid base registers are taken from CFUN->MACHINE->FS.  */
 
@@ -12902,15 +12920,18 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
 {
   HOST_WIDE_INT ooffset = m->fs.sp_offset;
   bool valid = m->fs.sp_valid;
+  bool realigned = m->fs.sp_realigned;
 
   if (src == hard_frame_pointer_rtx)
{
  valid = m->fs.fp_valid;
+ realigned = false;
  ooffset = m->fs.fp_offset;
}
   else if (src == crtl->drap_reg)
{
  valid = m->fs.drap_valid;
+ realigned = false;
  ooffset = 0;
}
   else
@@ -12924,6 +12945,7 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
 
   m->fs.sp_offset = ooffset - INTVAL (offset);
   m->fs.sp_valid = valid;
+  m->fs.sp_realigned = realigned;
 }
 }
 
@@ -13673,6 +13695,7 @@ ix86_expand_prologue (void)
  this is fudged; we're interested to offsets within the local frame.  */
   m->fs.sp_offset = INCOMING_FRAME_SP_OFFSET;
   m->fs.sp_valid = true;
+  m->fs.sp_realigned = false;
 
   ix86_compute_frame_layout (&frame);
 
@@ -13889,11 +13912,10 @@ ix86_expand_prologue (void)
 that we must allocate the size of the register save area before
 performing the actual alignment.  Otherwise we cannot guarantee
 that there's enough storage above the realignment point.  */
-  if (m->fs.sp_offset != frame.sse_reg_save_offset)
+  allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
+  if (allocate)
 pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-  GEN_INT (m->fs.sp_offset
-   - frame.sse_reg_save_offset),
-  -1, false);
+  GEN_INT (-allocate), -1, false);
 
   /* Align the stack.  */
   insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
@@ -13901,11 +13923,14 @@ ix86_expand_prologue (void)
GEN_INT (-align_bytes)));
 
   /* For the purposes of register save area addressing, the stack
- pointer is no longer valid.  As for the value of sp_offset,
-see ix86_compute_frame_layout, which we need to match in order
-to pass verification of stack_pointer_offset at the end.  */
+pointer can no longer be used to access anything in the frame
+below m->fs.sp_realigned_offset and the frame pointer cannot be
+used for anything at or above.  */
+  gcc_assert (m->fs.sp_offset == frame.stack_realign_allocate_offset);
   m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
-  m->fs.sp_valid = false;
+  m->fs.sp_realigned = true;
+  m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+  gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset);
 }
 
   allocate = frame.stack_pointer_offset - m->fs.sp_offset;
@@ -14244,6 +14269,7 @@ ix86_emit_leave (void)
 
   gcc_assert (m->fs.fp_valid);
   m->fs.sp_valid = true;
+  m->fs.sp_realigned = false;
   m->fs.sp_offset = 

[PATCH 1/3] [i386] Move stack frame re-alignment to before SSE saves.

2016-12-22 Thread Daniel Santos
This step adds new fields to struct ix86_frame to track where we started
the stack re-alignment and what we need to allocate prior to
re-alignment.  In ix86_compute_frame_layout, we do the stack frame
re-alignment computation prior to computing the SSE save area so that
we have an aligned SSE save area.

This also ensures that the SSE save area is properly aligned when
DRAP is used.
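
The offset computation can be sketched in isolation as follows.  This C++
fragment follows the order of operations in the diff below, but it is only
an approximation: the function compute_realign_layout, its parameters and
the struct realign_layout are hypothetical, alignment_needed_bytes is
assumed to be in bytes, and incoming_boundary_bits in bits.

```cpp
#include <cassert>
#include <cstdint>

// The three offsets this patch computes (names follow struct ix86_frame).
struct realign_layout
{
  std::int64_t stack_realign_allocate_offset;
  std::int64_t stack_realign_offset;
  std::int64_t sse_reg_save_offset;
};

// Round v up to a multiple of align, which must be a power of two.
constexpr std::int64_t round_up(std::int64_t v, std::int64_t align)
{
  return (v + align - 1) & ~(align - 1);
}

// offset: CFA offset just past the integer register save area.
realign_layout compute_realign_layout(std::int64_t offset, int nsseregs,
                                      bool realign_fp, bool realign_drap,
                                      std::int64_t alignment_needed_bytes,
                                      int incoming_boundary_bits)
{
  assert(alignment_needed_bytes > 0
         && (alignment_needed_bytes & (alignment_needed_bytes - 1)) == 0);

  realign_layout l{};

  // Without SSE saves, this is all that must be allocated before the AND.
  l.stack_realign_allocate_offset = offset;

  // The re-aligned part of the frame starts here.
  if (realign_fp)
    offset = round_up(offset, alignment_needed_bytes);
  l.stack_realign_offset = offset;

  // Align and reserve the SSE register save area.
  if (nsseregs > 0)
    {
      if (incoming_boundary_bits >= 128
          || (realign_drap && alignment_needed_bytes >= 16))
        offset = round_up(offset, 16);
      offset += nsseregs * 16;
      l.stack_realign_allocate_offset = offset;
    }
  l.sse_reg_save_offset = offset;
  return l;
}
```

For instance, starting at offset 40 with two SSE registers and a 32-byte
frame-pointer re-alignment, the re-aligned region begins at 64 and the SSE
save area ends at 96.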
---
 gcc/config/i386/i386.c | 40 +---
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 792e8ec232d..7f7389cbe31 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2453,7 +2453,7 @@ struct GTY(()) stack_local_entry {
[saved regs]
<- regs_save_offset
[padding0]
-
+   <- stack_realign_offset
[saved SSE regs]
<- sse_regs_save_offset
[padding1]  |
@@ -2479,6 +2479,8 @@ struct ix86_frame
   HOST_WIDE_INT stack_pointer_offset;
   HOST_WIDE_INT hfp_save_offset;
   HOST_WIDE_INT reg_save_offset;
+  HOST_WIDE_INT stack_realign_allocate_offset;
+  HOST_WIDE_INT stack_realign_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
   /* When save_regs_using_mov is set, emit prologue using
@@ -12457,28 +12459,36 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   if (TARGET_SEH)
 frame->hard_frame_pointer_offset = offset;
 
+  /* When re-aligning the stack frame, but not saving SSE registers, this
+ is the offset we want to allocate memory for.  */
+  frame->stack_realign_allocate_offset = offset;
+
+  /* The re-aligned stack starts here.  Values before this point are not
+ directly comparable with values below this point.  Use sp_valid_at
+ to determine if the stack pointer is valid for a given offset and
+ fp_valid_at for the frame pointer.  */
+  if (stack_realign_fp)
+offset = ROUND_UP (offset, stack_alignment_needed);
+  frame->stack_realign_offset = offset;
+
   /* Align and set SSE register save area.  */
   if (frame->nsseregs)
 {
   /* The only ABI that has saved SSE registers (Win64) also has a
-16-byte aligned default stack, and thus we don't need to be
-within the re-aligned local stack frame to save them.  In case
-incoming stack boundary is aligned to less than 16 bytes,
-unaligned move of SSE register will be emitted, so there is
-no point to round up the SSE register save area outside the
-re-aligned local stack frame to 16 bytes.  */
-  if (ix86_incoming_stack_boundary >= 128)
+16-byte aligned default stack.  However, many programs violate
+the ABI, and Wine64 forces stack realignment to compensate.
+
+If the incoming stack boundary is at least 16 bytes, or DRAP is
+required and the DRAP re-alignment boundary is at least 16 bytes,
+then we want the SSE register save area properly aligned.  */
+  if (ix86_incoming_stack_boundary >= 128
+  || (stack_realign_drap && stack_alignment_needed >= 16))
offset = ROUND_UP (offset, 16);
   offset += frame->nsseregs * 16;
+  frame->stack_realign_allocate_offset = offset;
 }
-  frame->sse_reg_save_offset = offset;
 
-  /* The re-aligned stack starts here.  Values before this point are not
- directly comparable with values below this point.  In order to make
- sure that no value happens to be the same before and after, force
- the alignment computation below to add a non-zero value.  */
-  if (stack_realign_fp)
-offset = ROUND_UP (offset, stack_alignment_needed);
+  frame->sse_reg_save_offset = offset;
 
   /* Va-arg area */
   frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
-- 
2.11.0



[PATCH 3/3] [i386] Use re-aligned stack pointer for aligned SSE movs

2016-12-22 Thread Daniel Santos
This adds an optional `align' parameter to choose_baseaddr allowing the
caller to request an address that is aligned to some boundary.  Then
ix86_emit_save_regs_using_mov and ix86_emit_restore_regs_using_mov are
modified so that optimally aligned memory is used when such a base
register is available.
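
The selection logic can be sketched roughly as below.  This C++ fragment
models only the fast-prologue preference order (FP, then DRAP, then SP)
with an alignment filter.  The fallback to an unaligned base register is an
assumption on my part, since the tail of choose_baseaddr is cut off in this
posting, and all type and function names here are illustrative.

```cpp
#include <array>
#include <cassert>

enum class base_reg { none, hfp, drap, sp };

// One candidate base register: whether it can address the location at all
// and what alignment (in bits) it is known to have.
struct candidate
{
  base_reg reg;
  bool ok;
  unsigned align_bits;
};

struct choice
{
  base_reg reg;
  unsigned align_bits;
};

// Candidates are expected in preference order.  A request of 0 means
// "no alignment requirement".
inline choice choose_base(const std::array<candidate, 3> &cands,
                          unsigned align_requested)
{
  for (const candidate &c : cands)
    if (c.ok && c.align_bits >= align_requested)
      return {c.reg, c.align_bits};
  return {base_reg::none, 0};
}

// Try for an aligned base register first; if none qualifies, assume we
// fall back to any usable register and report its actual alignment.
inline choice choose_base_with_fallback(const std::array<candidate, 3> &cands,
                                        unsigned align_requested)
{
  choice r = choose_base(cands, align_requested);
  if (r.reg == base_reg::none && align_requested != 0)
    r = choose_base(cands, 0);
  assert(r.reg != base_reg::none);  // Some base register must be usable.
  return r;
}
```

A caller saving an SSE register would request 128 bits and emit an aligned
mov only if the returned alignment is at least that large.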
---
 gcc/config/i386/i386.c | 110 ++---
 1 file changed, 87 insertions(+), 23 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b5f9f36094f..e60267a903d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12622,15 +12622,40 @@ static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset)
  && cfa_offset >= fs.sp_realigned_offset);
 }
 
-/* Return an RTX that points to CFA_OFFSET within the stack frame.
-   The valid base registers are taken from CFUN->MACHINE->FS.  */
+/* Choose a base register based upon alignment requested, speed and/or
+   size.  */
 
-static rtx
-choose_baseaddr (HOST_WIDE_INT cfa_offset)
+static void choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
+   HOST_WIDE_INT &base_offset,
+   unsigned int align_reqested, unsigned int *align)
 {
   const struct machine_function *m = cfun->machine;
-  rtx base_reg = NULL;
-  HOST_WIDE_INT base_offset = 0;
+  unsigned int hfp_align;
+  unsigned int drap_align;
+  unsigned int sp_align;
+  bool hfp_ok  = fp_valid_at (cfa_offset);
+  bool drap_ok = m->fs.drap_valid;
+  bool sp_ok   = sp_valid_at (cfa_offset);
+
+  hfp_align = drap_align = sp_align = INCOMING_STACK_BOUNDARY;
+
+  /* Filter out any registers that don't meet the requested alignment
+ criteria.  */
+  if (align_reqested)
+{
+  /* Make sure we weren't given a cfa_offset incongruent with the
+align_reqested.  */
+  gcc_assert (!(cfa_offset & (align_reqested / BITS_PER_UNIT - 1)));
+
+  if (m->fs.realigned)
+   hfp_align = drap_align = sp_align = crtl->stack_alignment_needed;
+  else if (m->fs.sp_realigned)
+   sp_align = crtl->stack_alignment_needed;
+
+  hfp_ok = hfp_ok && hfp_align >= align_reqested;
+  drap_ok = drap_ok && drap_align >= align_reqested;
+  sp_ok = sp_ok && sp_align >= align_reqested;
+}
 
   if (m->use_fast_prologue_epilogue)
 {
@@ -12639,17 +12664,17 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
  while DRAP must be reloaded within the epilogue.  But choose either
  over the SP due to increased encoding size.  */
 
-  if (m->fs.fp_valid)
+  if (hfp_ok)
{
  base_reg = hard_frame_pointer_rtx;
  base_offset = m->fs.fp_offset - cfa_offset;
}
-  else if (m->fs.drap_valid)
+  else if (drap_ok)
{
  base_reg = crtl->drap_reg;
  base_offset = 0 - cfa_offset;
}
-  else if (m->fs.sp_valid)
+  else if (sp_ok)
{
  base_reg = stack_pointer_rtx;
  base_offset = m->fs.sp_offset - cfa_offset;
@@ -12662,13 +12687,13 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
 
   /* Choose the base register with the smallest address encoding.
  With a tie, choose FP > DRAP > SP.  */
-  if (m->fs.sp_valid)
+  if (sp_ok)
{
  base_reg = stack_pointer_rtx;
  base_offset = m->fs.sp_offset - cfa_offset;
   len = choose_baseaddr_len (STACK_POINTER_REGNUM, base_offset);
}
-  if (m->fs.drap_valid)
+  if (drap_ok)
{
  toffset = 0 - cfa_offset;
  tlen = choose_baseaddr_len (REGNO (crtl->drap_reg), toffset);
@@ -12679,7 +12704,7 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
  len = tlen;
}
}
-  if (m->fs.fp_valid)
+  if (hfp_ok)
{
  toffset = m->fs.fp_offset - cfa_offset;
  tlen = choose_baseaddr_len (HARD_FRAME_POINTER_REGNUM, toffset);
@@ -12691,8 +12716,40 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
}
}
 }
-  gcc_assert (base_reg != NULL);
 
+/* Set the align return value.  */
+if (align)
+  {
+   if (base_reg == stack_pointer_rtx)
+ *align = sp_align;
+   else if (base_reg == crtl->drap_reg)
+ *align = drap_align;
+   else if (base_reg == hard_frame_pointer_rtx)
+ *align = hfp_align;
+  }
+}
+
+/* Return an RTX that points to CFA_OFFSET within the stack frame and
+   the alignment of address.  If align is non-null, it should point to
+   an alignment value (in bits) that is preferred or zero and will
+   recieve the alignment of the base register that was selected.  The
+   valid base registers are taken from CFUN->MACHINE->FS.  */
+
+static rtx
+choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
+{
+  rtx base_reg = NULL;
+  HOST_WIDE_INT base_offset = 0;
+
+  /* If a specific alignment is requested, try to get a base register
+ with that alignment first.  */
+  if (align && *align)
+choose_basereg (cfa_offset

Re: Use aligned SSE movs for re-aligned MS ABI pro/epilogues

2016-12-27 Thread Daniel Santos

On 12/27/2016 07:56 AM, Uros Bizjak wrote:

Hello!


According to the Microsoft 64-bit ABI specification, registers RDI, RSI
and XMM6-15 are non-volatile and the stack alignment is 16 bytes. In
practice, the Windows implementation appears to not be so picky about
the 16-byte alignment requirement, probably because it never has to save
SSE registers and instead just never uses them. This led to a large list
(https://bugs.winehq.org/show_bug.cgi?id=27680) of Win64 programs
violating the ABI with impunity, but crashing in Wine until
force_align_arg_pointer was added to gcc and used in Wine.

Stack re-alignment was originally done prior to int register saves, but
was moved to after SSE saves in 2010 to better facilitate
parallelization, and for simplicity's sake, the stack pointer was
considered invalid after stack re-alignment and SSE movs were emitted
unaligned relative to the frame pointer. But now that forced stack
re-alignment is the new normal for Wine64, it means that it always gets
the unaligned movs in Wine. This patch set fixes the problem while
preserving the improved parallelization of int register saves of Richard
Henderson's patch in 2010.

I have looked briefly through the patchset, and the approach looks good to me.

However, this patch is touching a somewhat delicate part of the compiler,
where lots of code-paths cross each other (and we have had quite some
hard-to-fix bugs in this area).

IMO, the patch is not appropriate for inclusion at the current stage
of the compiler development, and should wait for early stage 1. Please
resubmit it later for inclusion.

Thanks,
Uros.

Thank you for the review. Yes, this is a very delicate code path indeed. 
For the purposes of the 64-bit Microsoft ABI function calling a System V 
function, I've written a fairly exhaustive test program (although 
probably not fully comprehensive) for testing pro/epilogues under 
various conditions. I'm completely new to dejagnu however, so I still 
need to figure out how to properly integrate it. Maybe when I re-submit 
this patch set I can submit the new tests with it.


Thanks,
Daniel


Re: [RFC] [PATCH v3 0/8] [i386] Use out-of-line stubs for ms_abi pro/epilogues

2017-03-30 Thread Daniel Santos
I have finally completed all tests for Cygwin and MinGW both 32- & 
64-bit with no additional test failures.  There are still 567 tests 
failing both pre- and post-patch with error "error while loading shared 
libraries: cyggfortran-4.dll: cannot open shared object file: No such 
file or directory" in all 32-bit tests even after my (fairly crude) 
patch to address that problem.  So as a separate issue, I don't yet have 
a clean patch set to resolve the windows dll search path issue 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79867).


I had to change the test program, as I was dependent upon XSI extensions 
which aren't available on Cygwin, so I'll need to repost that.  Also, I 
had to make one small change in the "aligned SSE MOVs" patch, disabling 
it on SEH targets since gcc/config/i386/winnt.c does not currently 
support the REG_CFA_EXPRESSION note in its unwind emit.  This 
optimization primarily targets 64-bit Wine anyway, where stack 
realignment is now required.


Daniel


Re: [PATCH 2/8] [i386] Add option -moutline-msabi-xlogues

2017-04-01 Thread Daniel Santos
Uros, can I please get your opinion on this?  I have no objections to 
this, but I want to check with you first.


On 02/10/2017 10:54 AM, Sandra Loosemore wrote:
I'd like to re-iterate my previous request that the option be renamed 
-mno-inline-msabi-xlogues.  No other option that controls inlining 
uses "outline" for the negative (disabling inlining).  We have way too 
many options and the least we can do is try to make them use 
consistent conventions.


-Sandra


So the default would be -minline-msabi-xlogues and 
-mno-inline-msabi-xlogues would enable this optimization.


Thanks,
Daniel


Re: [RFC] [PATCH] [i386] Test program for ms_abi to sysv_abi function calls

2017-04-01 Thread Daniel Santos
I've had to make changes to the test program, as I was using XSI 
extensions which aren't implemented on Cygwin.  But before I post the 
new patch, I noticed that it may be in the wrong directory. There is a 
gcc/testsuite/gcc.target/x86_64/abi directory and even a callabi 
subdirectory of that.  For taxonomic accuracy, I would say it probably 
belongs in a subdirectory of .../abi or .../abi/callabi and should be renamed from 
"msabi" to "ms_sysv".  Any objections?  (It is currently in 
gcc/testsuite/gcc.target/i386/msabi.)


Thanks,
Daniel


[testsuite] Fix loading wrong DLLs on Windows, merge duplicate target-libpath.exp

2017-04-03 Thread Daniel Santos
We currently have two copies of target-libpath.exp in the tree under
gcc/testsuite/lib and libffi/testsuite/lib.  It was originally pulled
into the libffi project from downstream gcc in 2009
(https://github.com/libffi/libffi/commit/5cbe2058c128e848446ae79fe15ee54260a90559).
Then in 2012, Anthony Green (from libffi) modified it to correct this
Windows problem (thank you!
https://github.com/libffi/libffi/commit/bd78c9c3311244dd5f877c915b0dff91621dd253).
In 2015, this file got pulled from upstream libffi back into gcc, thus
beginning two separate development paths
(https://github.com/gcc-mirror/gcc/commit/89d8a412de548b218cf7c967e65ad98bceb1ed4e).

This patch merges the changes from libffi upstream which correctly solve
the Windows DLL load path problem and removes the duplicate from
libffi/testsuite/lib.  This fixes most of bug #79867, implementing
correct behaviour for set_ld_library_path_env_vars and
restore_ld_library_path_env_vars.  However, there is still incorrect
behaviour in DejaGNU's unix_load that should eventually be addressed,
although I cannot yet point to a specific failure that it is causing.
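
To illustrate the PATH handling that the merged change performs for
Cygwin and MinGW targets, here is a rough sketch in C.  The real code is
Tcl in target-libpath.exp; the helper name build_dll_search_path and its
buffer interface are hypothetical and exist only for this example.  It
only shows the rule "test library directories first, then the saved
PATH if there was one":

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GCC or DejaGnu.  Builds the PATH value
   that the patched set_ld_library_path_env_vars exports on Cygwin/MinGW:
   the test library directories first, followed by the saved original
   PATH when one existed.  Returns 0 on success, -1 if OUT is too small.  */
static int
build_dll_search_path (char *out, size_t out_size,
                       const char *ld_library_path,
                       const char *orig_path /* NULL if PATH was unset */)
{
  int n;

  assert (out != NULL && ld_library_path != NULL);

  if (orig_path != NULL)
    n = snprintf (out, out_size, "%s:%s", ld_library_path, orig_path);
  else
    n = snprintf (out, out_size, "%s", ld_library_path);

  /* snprintf reports the length it wanted to write; treat truncation
     as an error.  */
  return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
}
```

Prepending matters here because Windows resolves DLLs through PATH, so
an installed DLL earlier in PATH would otherwise shadow the freshly
built one.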

gcc/ChangeLog:
2017-04-03  Daniel Santos 

PR testsuite/79867
* testsuite/lib/target-libpath.exp (set_ld_library_path_env_vars,
restore_ld_library_path_env_vars): Merge changes from libffi upstream,
correcting DLL load path problems on Windows.

libffi/ChangeLog:
2017-04-03  Daniel Santos 

PR testsuite/79867
* testsuite/lib/target-libpath.exp: Remove.
* testsuite/Makefile.am: Remove target-libpath.exp.
* testsuite/Makefile.in: Regenerate.

Signed-off-by: Daniel Santos 
---
 gcc/testsuite/lib/target-libpath.exp|  21 +++
 libffi/testsuite/Makefile.am|   2 +-
 libffi/testsuite/Makefile.in|   2 +-
 libffi/testsuite/lib/target-libpath.exp | 283 
 4 files changed, 23 insertions(+), 285 deletions(-)
 delete mode 100644 libffi/testsuite/lib/target-libpath.exp

diff --git a/gcc/testsuite/lib/target-libpath.exp b/gcc/testsuite/lib/target-libpath.exp
index 9b3e201ed68..b6d01b31016 100644
--- a/gcc/testsuite/lib/target-libpath.exp
+++ b/gcc/testsuite/lib/target-libpath.exp
@@ -23,6 +23,7 @@ set orig_shlib_path_saved 0
 set orig_ld_library_path_32_saved 0
 set orig_ld_library_path_64_saved 0
 set orig_dyld_library_path_saved 0
+set orig_path_saved 0
 set orig_gcc_exec_prefix_saved 0
 set orig_gcc_exec_prefix_checked 0
 
@@ -55,6 +56,7 @@ proc set_ld_library_path_env_vars { } {
   global orig_ld_library_path_32_saved
   global orig_ld_library_path_64_saved
   global orig_dyld_library_path_saved
+  global orig_path_saved
   global orig_gcc_exec_prefix_saved
   global orig_gcc_exec_prefix_checked
   global orig_ld_library_path
@@ -63,6 +65,7 @@ proc set_ld_library_path_env_vars { } {
   global orig_ld_library_path_32
   global orig_ld_library_path_64
   global orig_dyld_library_path
+  global orig_path
   global orig_gcc_exec_prefix
   global env
 
@@ -110,6 +113,10 @@ proc set_ld_library_path_env_vars { } {
   set orig_dyld_library_path "$env(DYLD_LIBRARY_PATH)"
   set orig_dyld_library_path_saved 1
 }
+if [info exists env(PATH)] {
+  set orig_path "$env(PATH)"
+  set orig_path_saved 1
+}
   }
 
   # We need to set ld library path in the environment.  Currently,
@@ -164,6 +171,13 @@ proc set_ld_library_path_env_vars { } {
   } else {
 setenv DYLD_LIBRARY_PATH "$ld_library_path"
   }
+  if { [istarget *-*-cygwin*] || [istarget *-*-mingw*] } {
+if { $orig_path_saved } {
+  setenv PATH "$ld_library_path:$orig_path"
+} else {
+  setenv PATH "$ld_library_path"
+}
+  }
 
   verbose -log "LD_LIBRARY_PATH=[getenv LD_LIBRARY_PATH]"
   verbose -log "LD_RUN_PATH=[getenv LD_RUN_PATH]"
@@ -201,12 +215,14 @@ proc restore_ld_library_path_env_vars { } {
   global orig_ld_library_path_32_saved
   global orig_ld_library_path_64_saved
   global orig_dyld_library_path_saved
+  global orig_path_saved
   global orig_ld_library_path
   global orig_ld_run_path
   global orig_shlib_path
   global orig_ld_library_path_32
   global orig_ld_library_path_64
   global orig_dyld_library_path
+  global orig_path
   global env
 
   restore_gcc_exec_prefix_env_var
@@ -245,6 +261,11 @@ proc restore_ld_library_path_env_vars { } {
   } elseif [info exists env(DYLD_LIBRARY_PATH)] {
 unsetenv DYLD_LIBRARY_PATH
   }
+  if { $orig_path_saved } {
+setenv PATH "$orig_path"
+  } elseif [info exists env(PATH)] {
+unsetenv PATH
+  }
 }
 
 ###
diff --git a/libffi/testsuite/Makefile.am b/libffi/testsuite/Makefile.am
index 209e8976635..b4eb7c2bce9 100644
--- a/libffi/testsuite/Makefile.am
+++ b/libffi/testsuite/Makefile.am
@@ -82,7 +82,7 @@ libffi.call/cls_align_uint64.c libffi.call/cls_4byte.c
\
 libffi.call/cls_6

Re: [PATCH, testsuite] PR79867: Fix loading wrong DLLs on Windows, merge duplicate target-libpath.exp

2017-04-03 Thread Daniel Santos
I forgot to include PATCH and the PR in the subject line, sorry about 
that.  Also, I have run a full bootstrap and testsuite to verify that I 
haven't missed any references to the extraneous copy of 
target-libpath.exp in libffi.




Re: [testsuite] Fix loading wrong DLLs on Windows, merge duplicate target-libpath.exp

2017-04-05 Thread Daniel Santos

On 04/05/2017 12:35 PM, Mike Stump wrote:

libffi/ChangeLog:
2017-04-03  Daniel Santos 

PR testsuite/79867
* testsuite/lib/target-libpath.exp: Remove.
* testsuite/Makefile.in: Remove target-libpath.exp.
* testsuite/Makefile.am: Regenerated.

I don't think the libffi project wants to remove that file.  There is little 
point being different from them in this regard.  The dup should not hurt.


Hmm.  There have been many changes to target-libpath.exp under 
gcc/testsuite/lib since libffi copied it.  I have attached a diff of 
them.  I'm not proposing that target-libpath.exp be removed from 
upstream libffi, only from the libffi copy in the gcc tree.  I'm having 
trouble seeing how two copies evolving independently can be a good thing.


Daniel
--- target-libpath.exp	2017-04-05 16:39:38.939768810 -0500
+++ gcc/testsuite/lib/target-libpath.exp	2017-04-05 16:39:49.350768260 -0500
@@ -1,4 +1,4 @@
-# Copyright (C) 2004, 2005, 2007 Free Software Foundation, Inc.
+# Copyright (C) 2004-2017 Free Software Foundation, Inc.
 
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -20,12 +20,28 @@
 set orig_ld_library_path_saved 0
 set orig_ld_run_path_saved 0
 set orig_shlib_path_saved 0
-set orig_ld_libraryn32_path_saved 0
-set orig_ld_library64_path_saved 0
 set orig_ld_library_path_32_saved 0
 set orig_ld_library_path_64_saved 0
 set orig_dyld_library_path_saved 0
 set orig_path_saved 0
+set orig_gcc_exec_prefix_saved 0
+set orig_gcc_exec_prefix_checked 0
+
+
+###
+# proc set_gcc_exec_prefix_env_var { }
+###
+
+proc set_gcc_exec_prefix_env_var { } {
+  global TEST_GCC_EXEC_PREFIX
+  global env
+
+  # Set GCC_EXEC_PREFIX for the compiler under test to pick up files not in
+  # the build tree from a specified location (normally the install tree).
+  if [info exists TEST_GCC_EXEC_PREFIX] {
+setenv GCC_EXEC_PREFIX "$TEST_GCC_EXEC_PREFIX"
+  }
+}
 
 ###
 # proc set_ld_library_path_env_vars { }
@@ -37,36 +53,39 @@
   global orig_ld_library_path_saved
   global orig_ld_run_path_saved
   global orig_shlib_path_saved
-  global orig_ld_libraryn32_path_saved
-  global orig_ld_library64_path_saved
   global orig_ld_library_path_32_saved
   global orig_ld_library_path_64_saved
   global orig_dyld_library_path_saved
   global orig_path_saved
+  global orig_gcc_exec_prefix_saved
+  global orig_gcc_exec_prefix_checked
   global orig_ld_library_path
   global orig_ld_run_path
   global orig_shlib_path
-  global orig_ld_libraryn32_path
-  global orig_ld_library64_path
   global orig_ld_library_path_32
   global orig_ld_library_path_64
   global orig_dyld_library_path
   global orig_path
-  global GCC_EXEC_PREFIX
+  global orig_gcc_exec_prefix
+  global env
 
-  # Set the relocated compiler prefix, but only if the user hasn't specified one.
-  if { [info exists GCC_EXEC_PREFIX] && ![info exists env(GCC_EXEC_PREFIX)] } {
-setenv GCC_EXEC_PREFIX "$GCC_EXEC_PREFIX"
+  # Save the original GCC_EXEC_PREFIX.
+  if { $orig_gcc_exec_prefix_checked == 0 } {
+if [info exists env(GCC_EXEC_PREFIX)] {
+  set orig_gcc_exec_prefix "$env(GCC_EXEC_PREFIX)"
+  set orig_gcc_exec_prefix_saved 1
+}
+set orig_gcc_exec_prefix_checked 1
   }
 
+  set_gcc_exec_prefix_env_var
+
   # Setting the ld library path causes trouble when testing cross-compilers.
   if { [is_remote target] } {
 return
   }
 
   if { $orig_environment_saved == 0 } {
-global env
-
 set orig_environment_saved 1
 
 # Save the original environment.
@@ -82,14 +101,6 @@
   set orig_shlib_path "$env(SHLIB_PATH)"
   set orig_shlib_path_saved 1
 }
-if [info exists env(LD_LIBRARYN32_PATH)] {
-  set orig_ld_libraryn32_path "$env(LD_LIBRARYN32_PATH)"
-  set orig_ld_libraryn32_path_saved 1
-}
-if [info exists env(LD_LIBRARY64_PATH)] {
-  set orig_ld_library64_path "$env(LD_LIBRARY64_PATH)"
-  set orig_ld_library64_path_saved 1
-}
 if [info exists env(LD_LIBRARY_PATH_32)] {
   set orig_ld_library_path_32 "$env(LD_LIBRARY_PATH_32)"
   set orig_ld_library_path_32_saved 1
@@ -113,12 +124,11 @@
   # It only sets SHLIB_PATH and LD_LIBRARY_PATH when it executes a
   # program.  We also need the environment set for compilations, etc.
   #
-  # On IRIX 6, we have to set variables akin to LD_LIBRARY_PATH, but
-  # called LD_LIBRARYN32_PATH (for the N32 ABI) and LD_LIBRARY64_PATH
-  # (for the 64-bit ABI).  The same applies to Darwin (DYLD_LIBRARY_PATH),
-  # Solaris 32 bit (LD_LIBRARY_PATH_32), Solaris 64 bit (LD_LIBRARY_PATH_64),
-  # and HP-UX (SHLIB_PATH).  In some cases, the variables are independent
-  # of LD_LIBRARY_PATH, and in other cases LD_LIBRARY_PATH is used if the
+  # On Darw

[PATCH v2,testsuite] PR79867: Merge fixes for windows DLL loading problem from libffi

2017-04-06 Thread Daniel Santos
We currently have two copies of target-libpath.exp in the tree under
gcc/testsuite/lib and libffi/testsuite/lib.  It was originally pulled
into the libffi project (from downstream gcc) in 2009
(https://github.com/libffi/libffi/commit/5cbe2058c128e848446ae79fe15ee54260a90559).
Then in 2012, Anthony Green (from libffi) modified it to correct this
Windows problem (and thank you:
https://github.com/libffi/libffi/commit/bd78c9c3311244dd5f877c915b0dff91621dd253).
In 2015, this file got pulled from upstream libffi back into gcc, thus
beginning two separate development paths
(https://github.com/gcc-mirror/gcc/commit/89d8a412de548b218cf7c967e65ad98bceb1ed4e).

This patch merges the changes from libffi upstream which correctly solve
the Windows DLL load path problem in set_ld_library_path_env_vars and
restore_ld_library_path_env_vars, thus fixing most of PR79867.  However,
there is still incorrect behaviour in DejaGNU's unix_load that should
eventually be addressed, although I cannot yet point to a specific
failure that it is causing.

Ultimately, I think that this functionality should be moved upstream to
DejaGNU where it can be managed more cleanly in board config files, but
we'll have to keep this code in gcc for when DejaGNU doesn't have
set/restore or push/pop libpath functionality.

Signed-off-by: Daniel Santos 
---
 gcc/testsuite/lib/target-libpath.exp | 21 +
 1 file changed, 21 insertions(+)

diff --git a/gcc/testsuite/lib/target-libpath.exp b/gcc/testsuite/lib/target-libpath.exp
index 9b3e201ed68..b6d01b31016 100644
--- a/gcc/testsuite/lib/target-libpath.exp
+++ b/gcc/testsuite/lib/target-libpath.exp
@@ -23,6 +23,7 @@ set orig_shlib_path_saved 0
 set orig_ld_library_path_32_saved 0
 set orig_ld_library_path_64_saved 0
 set orig_dyld_library_path_saved 0
+set orig_path_saved 0
 set orig_gcc_exec_prefix_saved 0
 set orig_gcc_exec_prefix_checked 0
 
@@ -55,6 +56,7 @@ proc set_ld_library_path_env_vars { } {
   global orig_ld_library_path_32_saved
   global orig_ld_library_path_64_saved
   global orig_dyld_library_path_saved
+  global orig_path_saved
   global orig_gcc_exec_prefix_saved
   global orig_gcc_exec_prefix_checked
   global orig_ld_library_path
@@ -63,6 +65,7 @@ proc set_ld_library_path_env_vars { } {
   global orig_ld_library_path_32
   global orig_ld_library_path_64
   global orig_dyld_library_path
+  global orig_path
   global orig_gcc_exec_prefix
   global env
 
@@ -110,6 +113,10 @@ proc set_ld_library_path_env_vars { } {
   set orig_dyld_library_path "$env(DYLD_LIBRARY_PATH)"
   set orig_dyld_library_path_saved 1
 }
+if [info exists env(PATH)] {
+  set orig_path "$env(PATH)"
+  set orig_path_saved 1
+}
   }
 
   # We need to set ld library path in the environment.  Currently,
@@ -164,6 +171,13 @@ proc set_ld_library_path_env_vars { } {
   } else {
 setenv DYLD_LIBRARY_PATH "$ld_library_path"
   }
+  if { [istarget *-*-cygwin*] || [istarget *-*-mingw*] } {
+if { $orig_path_saved } {
+  setenv PATH "$ld_library_path:$orig_path"
+} else {
+  setenv PATH "$ld_library_path"
+}
+  }
 
   verbose -log "LD_LIBRARY_PATH=[getenv LD_LIBRARY_PATH]"
   verbose -log "LD_RUN_PATH=[getenv LD_RUN_PATH]"
@@ -201,12 +215,14 @@ proc restore_ld_library_path_env_vars { } {
   global orig_ld_library_path_32_saved
   global orig_ld_library_path_64_saved
   global orig_dyld_library_path_saved
+  global orig_path_saved
   global orig_ld_library_path
   global orig_ld_run_path
   global orig_shlib_path
   global orig_ld_library_path_32
   global orig_ld_library_path_64
   global orig_dyld_library_path
+  global orig_path
   global env
 
   restore_gcc_exec_prefix_env_var
@@ -245,6 +261,11 @@ proc restore_ld_library_path_env_vars { } {
   } elseif [info exists env(DYLD_LIBRARY_PATH)] {
 unsetenv DYLD_LIBRARY_PATH
   }
+  if { $orig_path_saved } {
+setenv PATH "$orig_path"
+  } elseif [info exists env(PATH)] {
+unsetenv PATH
+  }
 }
 
 ###
-- 
2.11.0



[PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues

2017-04-27 Thread Daniel Santos
All of these patches concern 64-bit Microsoft ABI functions that call 
System V ABI functions, which clobber RSI, RDI and XMM6-15, and are 
aimed at improving the performance and .text size of 64-bit Wine. I had 
previously submitted these as separate patch sets, but have combined 
them for simplicity. (Does this make the ChangeLogs too big? Please let 
me know if you want me to break these back apart.) Below are the 
included patchsets and a summary of changes since the previous post(s):


1.) PR78962 Use aligned SSE movs for re-aligned MS ABI pro/epilogues. 
https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01859.html


Changes:

 * The SEH unwind emit code (in winnt.c) does not currently support
   REG_CFA_EXPRESSION, which is required to make this work, so I have
   disabled it on SEH targets.
 * Updated comments on REG_CFA_EXPRESSION in winnt.c.


2.) Add option to call out-of-line stubs instead of emitting inline 
saves and restores. https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00548.html


Changes:

 * Renamed option from -moutline-msabi-xlogues to -mcall-ms2sysv-xlogues
 * Since this patch set depends upon aligned SSE MOVs after stack
   realignment, I have disabled it on SEH targets with a sorry().
 * I was previously trying to cache the rtx for symbols to the libgcc
   stubs instead of creating new ones, but this caused problems in
   subsequent passes and it was disabled with a "TODO" comment. I have
   removed this code, as well as the rtx cache that was just wasting
   memory in class xlogue_layout.
 * Improved comment documentation.


3.) A comprehensive test program to validate correct behavior in these 
pro- and epilogues. https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00542.html


Changes:

 * The previous version repeated all tests for each -j instead of
   running in parallel. I have fixed this by implementing a primitive
   but effective file-based parallelization scheme.
 * I noticed that there is a gcc/testsuite/gcc.target/x86_64/abi
   directory for tests specific to 64-bit ABI issues, so I've
   moved my tests to an "ms-sysv" subdirectory of that (instead of
   gcc/testsuite/gcc.target/i386/msabi).
 * Fixed breakages on Cygwin.
 * Corrected a bad "_noinfo" optimization barrier (function call by
   volatile pointer).
 * Minor cleanup/improvements.


 gcc/Makefile.in|   2 +
 gcc/config/i386/i386.c | 916 +++--
 gcc/config/i386/i386.h |  33 +-
 gcc/config/i386/i386.opt   |   4 +
 gcc/config/i386/predicates.md  | 155 
 gcc/config/i386/sse.md |  37 +
 gcc/config/i386/winnt.c|   3 +-
 gcc/doc/invoke.texi|  13 +-
 .../gcc.target/x86_64/abi/ms-sysv/do-test.S| 163 
 gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc | 807 ++
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c| 373 +
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp  | 178 
 libgcc/config.host |   2 +-
 libgcc/config/i386/i386-asm.h  |  82 ++
 libgcc/config/i386/resms64.S   |  57 ++
 libgcc/config/i386/resms64f.S  |  55 ++
 libgcc/config/i386/resms64fx.S |  57 ++
 libgcc/config/i386/resms64x.S  |  59 ++
 libgcc/config/i386/savms64.S   |  57 ++
 libgcc/config/i386/savms64f.S  |  55 ++
 libgcc/config/i386/t-msabi |   7 +
 21 files changed, 3020 insertions(+), 95 deletions(-)  


gcc/ChangeLog:

2017-04-25  Daniel Santos

* config/i386/i386.opt: Add option -mcall-ms2sysv-xlogues.
* config/i386/i386.h
(x86_64_ms_sysv_extra_clobbered_registers): Change type to unsigned.
(NUM_X86_64_MS_CLOBBERED_REGS): New macro.
(struct machine_function): Add new members call_ms2sysv,
call_ms2sysv_pad_in, call_ms2sysv_pad_out and call_ms2sysv_extra_regs.
(struct machine_frame_state): New fields sp_realigned and
sp_realigned_offset.
* config/i386/i386.c
(enum xlogue_stub): New enum.
(enum xlogue_stub_sets): New enum.
(class xlogue_layout): New class.
(struct ix86_frame): New fields stack_realign_allocate_offset,
stack_realign_offset and outlined_save_offset.  Modify comments to
detail stack layout when using out-of-line stubs.
(ix86_target_string): Add -mcall-ms2sysv-xlogues option.
(ix86_option_override_internal): Add sorry() for TARGET_SEH and
-mcall-ms2sysv-xlogues.
(stub_managed_regs): New static variable.
(ix86_save_reg): Add new parameter ignore_outlined to optionally omit
registers managed by out-of-line stub.
(disable_call_ms2sysv_xlogues): New function.
(ix

[PATCH 02/12] [i386] Keep stack pointer valid after re-alignment.

2017-04-27 Thread Daniel Santos
Add the fields sp_realigned and sp_realigned_offset to struct
machine_frame_state.  We now have the concept of the stack pointer being
re-aligned rather than invalid.  The inline functions sp_valid_at and
fp_valid_at are added to test if a given location relative to the CFA
can be accessed with the stack or frame pointer, respectively.
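
A stand-alone model of the two predicates may make the idea easier to
follow.  This is only a sketch: struct frame_state_model and the
model_* function names are invented for illustration, and only the
field names and the comparisons are taken from the patch as posted
below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the relevant part of struct machine_frame_state.
   The field names follow the patch; the struct itself is illustrative.  */
struct frame_state_model
{
  bool sp_valid;
  bool fp_valid;
  bool sp_realigned;
  long sp_realigned_offset;  /* CFA offset where the re-aligned area begins.  */
};

/* True if the stack pointer may be used to address CFA_OFFSET.  After
   re-alignment the stack pointer only reaches offsets at or beyond
   sp_realigned_offset.  */
static bool
model_sp_valid_at (const struct frame_state_model *fs, long cfa_offset)
{
  assert (fs != NULL);
  return fs->sp_valid
         && !(fs->sp_realigned && cfa_offset < fs->sp_realigned_offset);
}

/* True if the frame pointer may be used to address CFA_OFFSET.  Once a
   valid, re-aligned stack pointer exists, the frame pointer is limited
   to offsets before sp_realigned_offset.  */
static bool
model_fp_valid_at (const struct frame_state_model *fs, long cfa_offset)
{
  assert (fs != NULL);
  return fs->fp_valid
         && !(fs->sp_valid && fs->sp_realigned
              && cfa_offset >= fs->sp_realigned_offset);
}
```

With both registers valid and the stack pointer re-aligned, exactly one
of the two predicates holds for any given offset, which is what allows
a caller to pick a base register per save slot.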

Stack allocation prior to re-alignment is modified so that we allocate
only what is needed: if no SSE registers are saved, we no longer
allocate the padding by which frame.sse_reg_save_offset is increased
for alignment.

As this change only alters how SSE registers are saved, moving the
re-alignment AND should not hinder parallelization of int register saves.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 74 +-
 gcc/config/i386/i386.h | 11 
 2 files changed, 66 insertions(+), 19 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 31f69c92968..7923486157d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12783,6 +12783,24 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset)
   return len;
 }
 
+/* Determine if the stack pointer is valid for accessing the cfa_offset.  */
+
+static inline bool sp_valid_at (HOST_WIDE_INT cfa_offset)
+{
+  const struct machine_frame_state &fs = cfun->machine->fs;
+  return fs.sp_valid && !(fs.sp_realigned
+ && cfa_offset < fs.sp_realigned_offset);
+}
+
+/* Determine if the frame pointer is valid for accessing the cfa_offset.  */
+
+static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset)
+{
+  const struct machine_frame_state &fs = cfun->machine->fs;
+  return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
+ && cfa_offset >= fs.sp_realigned_offset);
+}
+
 /* Return an RTX that points to CFA_OFFSET within the stack frame.
The valid base registers are taken from CFUN->MACHINE->FS.  */
 
@@ -13081,15 +13099,18 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
 {
   HOST_WIDE_INT ooffset = m->fs.sp_offset;
   bool valid = m->fs.sp_valid;
+  bool realigned = m->fs.sp_realigned;
 
   if (src == hard_frame_pointer_rtx)
{
  valid = m->fs.fp_valid;
+ realigned = false;
  ooffset = m->fs.fp_offset;
}
   else if (src == crtl->drap_reg)
{
  valid = m->fs.drap_valid;
+ realigned = false;
  ooffset = 0;
}
   else
@@ -13103,6 +13124,7 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
 
   m->fs.sp_offset = ooffset - INTVAL (offset);
   m->fs.sp_valid = valid;
+  m->fs.sp_realigned = realigned;
 }
 }
 
@@ -13852,6 +13874,7 @@ ix86_expand_prologue (void)
  this is fudged; we're interested to offsets within the local frame.  */
   m->fs.sp_offset = INCOMING_FRAME_SP_OFFSET;
   m->fs.sp_valid = true;
+  m->fs.sp_realigned = false;
 
   ix86_compute_frame_layout (&frame);
 
@@ -14068,11 +14091,10 @@ ix86_expand_prologue (void)
 that we must allocate the size of the register save area before
 performing the actual alignment.  Otherwise we cannot guarantee
 that there's enough storage above the realignment point.  */
-  if (m->fs.sp_offset != frame.sse_reg_save_offset)
+  allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
+  if (allocate)
 pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-  GEN_INT (m->fs.sp_offset
-   - frame.sse_reg_save_offset),
-  -1, false);
+  GEN_INT (-allocate), -1, false);
 
   /* Align the stack.  */
   insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
@@ -14080,11 +14102,19 @@ ix86_expand_prologue (void)
GEN_INT (-align_bytes)));
 
   /* For the purposes of register save area addressing, the stack
- pointer is no longer valid.  As for the value of sp_offset,
-see ix86_compute_frame_layout, which we need to match in order
-to pass verification of stack_pointer_offset at the end.  */
+pointer can no longer be used to access anything in the frame
+below m->fs.sp_realigned_offset and the frame pointer cannot be
+used for anything at or above.  */
   m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
-  m->fs.sp_valid = false;
+  m->fs.sp_realigned = true;
+  m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+  gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset);
+  /* SEH unwind emit doesn't currently support REG_CFA_EXPRESSION, which
+is needed to des

[PATCH 01/12] [i386] Re-align stack frame prior to SSE saves.

2017-04-27 Thread Daniel Santos
Add new fields to struct ix86_frame to track where we started the stack
re-alignment and what we need to allocate prior to re-alignment.  In
ix86_compute_frame_layout, we do the stack frame re-alignment
computation prior to computing the SSE save area so that we have an
aligned SSE save area.

This change also ensures that the SSE save area is properly aligned when
DRAP is used.
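
The offset arithmetic can be modelled outside of GCC roughly as
follows.  This is a sketch only: the struct and function names are made
up for the example, the input fields stand in for the corresponding GCC
globals, and the units (bytes for stack_alignment_needed, bits for
incoming_stack_boundary) are assumed to match their use in
ix86_compute_frame_layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round X up to a multiple of ALIGN, which must be a power of two.  */
#define MODEL_ROUND_UP(x, align) \
  (((x) + (align) - 1) & ~((long) (align) - 1))

/* Illustrative inputs; in GCC these come from globals and crtl.  */
struct layout_in
{
  long offset;                 /* CFA offset after the integer reg saves.  */
  bool stack_realign_fp;
  bool stack_realign_drap;
  long stack_alignment_needed; /* In bytes.  */
  int incoming_stack_boundary; /* In bits.  */
  int nsseregs;
};

struct layout_out
{
  long stack_realign_allocate_offset;
  long stack_realign_offset;
  long sse_reg_save_offset;
};

static struct layout_out
model_sse_layout (const struct layout_in *in)
{
  struct layout_out out;
  long offset;

  assert (in != NULL);
  offset = in->offset;

  /* With no SSE saves, this is all we allocate before re-aligning.  */
  out.stack_realign_allocate_offset = offset;

  /* The re-aligned part of the frame starts here.  */
  if (in->stack_realign_fp)
    offset = MODEL_ROUND_UP (offset, in->stack_alignment_needed);
  out.stack_realign_offset = offset;

  if (in->nsseregs)
    {
      if (in->incoming_stack_boundary >= 128
          || (in->stack_realign_drap && in->stack_alignment_needed >= 16))
        offset = MODEL_ROUND_UP (offset, 16);
      offset += in->nsseregs * 16;
      out.stack_realign_allocate_offset = offset;
    }

  out.sse_reg_save_offset = offset;
  return out;
}
```

For example, with offset 40, frame-pointer re-alignment to 16 bytes and
ten SSE registers, the re-aligned area starts at 48 and the SSE save
area ends at 208; with no SSE registers only 40 bytes are allocated
before the alignment AND.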

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 40 +---
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d9856573db7..31f69c92968 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2455,7 +2455,7 @@ struct GTY(()) stack_local_entry {
[saved regs]
<- regs_save_offset
[padding0]
-
+   <- stack_realign_offset
[saved SSE regs]
<- sse_regs_save_offset
[padding1]  |
@@ -2481,6 +2481,8 @@ struct ix86_frame
   HOST_WIDE_INT stack_pointer_offset;
   HOST_WIDE_INT hfp_save_offset;
   HOST_WIDE_INT reg_save_offset;
+  HOST_WIDE_INT stack_realign_allocate_offset;
+  HOST_WIDE_INT stack_realign_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
   /* When save_regs_using_mov is set, emit prologue using
@@ -12636,28 +12638,36 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   if (TARGET_SEH)
 frame->hard_frame_pointer_offset = offset;
 
+  /* When re-aligning the stack frame, but not saving SSE registers, this
+ is the offset we want to adjust the stack pointer to.  */
+  frame->stack_realign_allocate_offset = offset;
+
+  /* The re-aligned stack starts here.  Values before this point are not
+ directly comparable with values below this point.  Use sp_valid_at
+ to determine if the stack pointer is valid for a given offset and
+ fp_valid_at for the frame pointer.  */
+  if (stack_realign_fp)
+offset = ROUND_UP (offset, stack_alignment_needed);
+  frame->stack_realign_offset = offset;
+
   /* Align and set SSE register save area.  */
   if (frame->nsseregs)
 {
   /* The only ABI that has saved SSE registers (Win64) also has a
-16-byte aligned default stack, and thus we don't need to be
-within the re-aligned local stack frame to save them.  In case
-incoming stack boundary is aligned to less than 16 bytes,
-unaligned move of SSE register will be emitted, so there is
-no point to round up the SSE register save area outside the
-re-aligned local stack frame to 16 bytes.  */
-  if (ix86_incoming_stack_boundary >= 128)
+16-byte aligned default stack.  However, many programs violate
+the ABI, and Wine64 forces stack realignment to compensate.
+
+If the incoming stack boundary is at least 16 bytes, or DRAP is
+required and the DRAP re-alignment boundary is at least 16 bytes,
+then we want the SSE register save area properly aligned.  */
+  if (ix86_incoming_stack_boundary >= 128
+  || (stack_realign_drap && stack_alignment_needed >= 16))
offset = ROUND_UP (offset, 16);
   offset += frame->nsseregs * 16;
+  frame->stack_realign_allocate_offset = offset;
 }
-  frame->sse_reg_save_offset = offset;
 
-  /* The re-aligned stack starts here.  Values before this point are not
- directly comparable with values below this point.  In order to make
- sure that no value happens to be the same before and after, force
- the alignment computation below to add a non-zero value.  */
-  if (stack_realign_fp)
-offset = ROUND_UP (offset, stack_alignment_needed);
+  frame->sse_reg_save_offset = offset;
 
   /* Va-arg area */
   frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
-- 
2.11.0



[PATCH 04/12] [i386] Minor refactoring

2017-04-27 Thread Daniel Santos
For the sake of clarity, I've separated out these minor refactoring
changes from the remainder of this patch set.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 21 ++---
 gcc/config/i386/i386.h |  4 +++-
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e8a4ba6fe8d..113f83742c2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2424,7 +2424,7 @@ static int const x86_64_int_return_registers[4] =
 
 /* Additional registers that are clobbered by SYSV calls.  */
 
-int const x86_64_ms_sysv_extra_clobbered_registers[12] =
+unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
 {
   SI_REG, DI_REG,
   XMM6_REG, XMM7_REG,
@@ -12539,6 +12539,7 @@ ix86_builtin_setjmp_frame_value (void)
 static void
 ix86_compute_frame_layout (struct ix86_frame *frame)
 {
+  struct machine_function *m = cfun->machine;
   unsigned HOST_WIDE_INT stack_alignment_needed;
   HOST_WIDE_INT offset;
   unsigned HOST_WIDE_INT preferred_alignment;
@@ -12573,19 +12574,19 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
  scheduling that can be done, which means that there's very little point
  in doing anything except PUSHs.  */
   if (TARGET_SEH)
-cfun->machine->use_fast_prologue_epilogue = false;
+m->use_fast_prologue_epilogue = false;
 
   /* During reload iteration the amount of registers saved can change.
  Recompute the value as needed.  Do not recompute when amount of registers
  didn't change as reload does multiple calls to the function and does not
  expect the decision to change within single iteration.  */
   else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun))
-   && cfun->machine->use_fast_prologue_epilogue_nregs != frame->nregs)
+  && m->use_fast_prologue_epilogue_nregs != frame->nregs)
 {
   int count = frame->nregs;
   struct cgraph_node *node = cgraph_node::get (current_function_decl);
 
-  cfun->machine->use_fast_prologue_epilogue_nregs = count;
+  m->use_fast_prologue_epilogue_nregs = count;
 
   /* The fast prologue uses move instead of push to save registers.  This
  is significantly longer, but also executes faster as modern hardware
@@ -12602,14 +12603,14 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   if (node->frequency < NODE_FREQUENCY_NORMAL
  || (flag_branch_probabilities
  && node->frequency < NODE_FREQUENCY_HOT))
-cfun->machine->use_fast_prologue_epilogue = false;
+   m->use_fast_prologue_epilogue = false;
   else
-cfun->machine->use_fast_prologue_epilogue
+   m->use_fast_prologue_epilogue
   = !expensive_function_p (count);
 }
 
   frame->save_regs_using_mov
-= (TARGET_PROLOGUE_USING_MOVE && cfun->machine->use_fast_prologue_epilogue
+= (TARGET_PROLOGUE_USING_MOVE && m->use_fast_prologue_epilogue
/* If static stack checking is enabled and done with probes,
  the registers need to be saved before allocating the frame.  */
&& flag_stack_check != STATIC_BUILTIN_STACK_CHECK);
@@ -28683,11 +28684,9 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
   else if (TARGET_64BIT_MS_ABI
   && (!callarg2 || INTVAL (callarg2) != -2))
 {
-  int const cregs_size
-   = ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers);
-  int i;
+  unsigned i;
 
-  for (i = 0; i < cregs_size; i++)
+  for (i = 0; i < NUM_X86_64_MS_CLOBBERED_REGS; i++)
{
  int regno = x86_64_ms_sysv_extra_clobbered_registers[i];
  machine_mode mode = SSE_REGNO_P (regno) ? TImode : DImode;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 4e4cb7ca7e3..645b239a768 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2163,7 +2163,9 @@ extern int const dbx_register_map[FIRST_PSEUDO_REGISTER];
 extern int const dbx64_register_map[FIRST_PSEUDO_REGISTER];
 extern int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER];
 
-extern int const x86_64_ms_sysv_extra_clobbered_registers[12];
+extern unsigned const x86_64_ms_sysv_extra_clobbered_registers[12];
+#define NUM_X86_64_MS_CLOBBERED_REGS \
+  (ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers))
 
 /* Before the prologue, RA is at 0(%esp).  */
 #define INCOMING_RETURN_ADDR_RTX \
-- 
2.11.0



[PATCH 03/12] [i386] Use re-aligned stack pointer for aligned SSE movs

2017-04-27 Thread Daniel Santos
Add an optional `align' parameter to choose_baseaddr, allowing the
caller to request an address that is aligned to some boundary.  Modify
ix86_emit_save_regs_using_mov and ix86_emit_restore_regs_using_mov to use
optimally aligned memory when such a base register is available.
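
The filtering step can be sketched in isolation like this.  It is a
simplified model, not the code from the patch: the enum, struct and
function names are hypothetical, alignments are expressed in bits, and
the final choice just uses the fast-prologue preference order (frame
pointer, then DRAP, then stack pointer) without the encoding-size
heuristics of the real choose_baseaddr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum base_reg_model { BASE_NONE, BASE_HFP, BASE_DRAP, BASE_SP };

/* Illustrative snapshot of what the selection logic looks at.  */
struct base_state_model
{
  bool hfp_ok;                /* Frame pointer can reach the slot.  */
  bool drap_ok;               /* DRAP register is valid.  */
  bool sp_ok;                 /* Stack pointer can reach the slot.  */
  bool realigned;             /* Whole frame was re-aligned.  */
  bool sp_realigned;          /* Only the stack pointer was re-aligned.  */
  unsigned incoming_align;    /* Alignment on entry, in bits.  */
  unsigned needed_align;      /* Alignment after re-alignment, in bits.  */
};

/* Pick a base register whose known alignment is at least ALIGN_REQUESTED
   bits.  ALIGN_REQUESTED == 0 means "no alignment requirement".  */
static enum base_reg_model
model_choose_basereg (const struct base_state_model *s,
                      unsigned align_requested)
{
  unsigned hfp_align, drap_align, sp_align;
  bool hfp_ok, drap_ok, sp_ok;

  assert (s != NULL);
  hfp_ok = s->hfp_ok;
  drap_ok = s->drap_ok;
  sp_ok = s->sp_ok;
  hfp_align = drap_align = sp_align = s->incoming_align;

  if (align_requested)
    {
      if (s->realigned)
        hfp_align = drap_align = sp_align = s->needed_align;
      else if (s->sp_realigned)
        sp_align = s->needed_align;

      /* Drop any register that cannot guarantee the alignment.  */
      hfp_ok = hfp_ok && hfp_align >= align_requested;
      drap_ok = drap_ok && drap_align >= align_requested;
      sp_ok = sp_ok && sp_align >= align_requested;
    }

  if (hfp_ok)
    return BASE_HFP;
  if (drap_ok)
    return BASE_DRAP;
  if (sp_ok)
    return BASE_SP;
  return BASE_NONE;
}
```

A caller that gets BASE_NONE for an aligned request would presumably
retry without the alignment requirement and fall back to unaligned
moves.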

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c  | 111 ++--
 gcc/config/i386/winnt.c |   3 +-
 2 files changed, 90 insertions(+), 24 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7923486157d..e8a4ba6fe8d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12801,15 +12801,39 @@ static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset)
  && cfa_offset >= fs.sp_realigned_offset);
 }
 
-/* Return an RTX that points to CFA_OFFSET within the stack frame.
-   The valid base registers are taken from CFUN->MACHINE->FS.  */
+/* Choose a base register based upon alignment requested, speed and/or
+   size.  */
 
-static rtx
-choose_baseaddr (HOST_WIDE_INT cfa_offset)
+static void choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
+   HOST_WIDE_INT &base_offset,
+   unsigned int align_reqested, unsigned int *align)
 {
   const struct machine_function *m = cfun->machine;
-  rtx base_reg = NULL;
-  HOST_WIDE_INT base_offset = 0;
+  unsigned int hfp_align;
+  unsigned int drap_align;
+  unsigned int sp_align;
+  bool hfp_ok  = fp_valid_at (cfa_offset);
+  bool drap_ok = m->fs.drap_valid;
+  bool sp_ok   = sp_valid_at (cfa_offset);
+
+  hfp_align = drap_align = sp_align = INCOMING_STACK_BOUNDARY;
+
+  /* Filter out any registers that don't meet the requested alignment
+ criteria.  */
+  if (align_reqested)
+{
+  if (m->fs.realigned)
+   hfp_align = drap_align = sp_align = crtl->stack_alignment_needed;
+  /* SEH unwind code does not currently support REG_CFA_EXPRESSION
+notes (which we would need to use a realigned stack pointer),
+so disable on SEH targets.  */
+  else if (m->fs.sp_realigned)
+   sp_align = crtl->stack_alignment_needed;
+
+  hfp_ok = hfp_ok && hfp_align >= align_reqested;
+  drap_ok = drap_ok && drap_align >= align_reqested;
+  sp_ok = sp_ok && sp_align >= align_reqested;
+}
 
   if (m->use_fast_prologue_epilogue)
 {
@@ -12818,17 +12842,17 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
  while DRAP must be reloaded within the epilogue.  But choose either
  over the SP due to increased encoding size.  */
 
-  if (m->fs.fp_valid)
+  if (hfp_ok)
{
  base_reg = hard_frame_pointer_rtx;
  base_offset = m->fs.fp_offset - cfa_offset;
}
-  else if (m->fs.drap_valid)
+  else if (drap_ok)
{
  base_reg = crtl->drap_reg;
  base_offset = 0 - cfa_offset;
}
-  else if (m->fs.sp_valid)
+  else if (sp_ok)
{
  base_reg = stack_pointer_rtx;
  base_offset = m->fs.sp_offset - cfa_offset;
@@ -12841,13 +12865,13 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
 
   /* Choose the base register with the smallest address encoding.
  With a tie, choose FP > DRAP > SP.  */
-  if (m->fs.sp_valid)
+  if (sp_ok)
{
  base_reg = stack_pointer_rtx;
  base_offset = m->fs.sp_offset - cfa_offset;
   len = choose_baseaddr_len (STACK_POINTER_REGNUM, base_offset);
}
-  if (m->fs.drap_valid)
+  if (drap_ok)
{
  toffset = 0 - cfa_offset;
  tlen = choose_baseaddr_len (REGNO (crtl->drap_reg), toffset);
@@ -12858,7 +12882,7 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
  len = tlen;
}
}
-  if (m->fs.fp_valid)
+  if (hfp_ok)
{
  toffset = m->fs.fp_offset - cfa_offset;
  tlen = choose_baseaddr_len (HARD_FRAME_POINTER_REGNUM, toffset);
@@ -12870,8 +12894,40 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
}
}
 }
-  gcc_assert (base_reg != NULL);
 
+/* Set the align return value.  */
+if (align)
+  {
+   if (base_reg == stack_pointer_rtx)
+ *align = sp_align;
+   else if (base_reg == crtl->drap_reg)
+ *align = drap_align;
+   else if (base_reg == hard_frame_pointer_rtx)
+ *align = hfp_align;
+  }
+}
+
+/* Return an RTX that points to CFA_OFFSET within the stack frame and
+   the alignment of address.  If align is non-null, it should point to
+   an alignment value (in bits) that is preferred or zero and will
+   receive the alignment of the base register that was selected.  The
+   valid base registers are taken from CFUN->MACHINE->FS.  */
+
+static rtx
+choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
+{
+  rtx base_reg = NULL;
+  HOST_W

[PATCH 05/12] [i386] Add option -mcall-ms2sysv-xlogues

2017-04-27 Thread Daniel Santos
Add the option -mcall-ms2sysv-xlogues to i386.opt and i386.c, and
document it in invoke.texi.  Using -mcall-ms2sysv-xlogues on SEH
targets is currently unsupported and results in a sorry ().  SEH
targets can be supported, but that would require adding support for
REG_CFA_EXPRESSION to the SEH unwind emit code in
gcc/config/i386/winnt.c.  The same limitation applies to the use of
aligned SSE MOVs after a realigned stack pointer.
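
For illustration, the kind of function this option affects is a 64-bit
ms_abi function that calls a sysv_abi function.  The sketch below uses
the existing ms_abi and sysv_abi function attributes; the function
names and the MS_ABI, SYSV_ABI and NOINLINE macros are made up for the
example.  On targets other than x86-64 the macros expand to nothing so
the code still builds.

```cpp
#include <cassert>

// The ABI attributes only exist on x86 targets; fall back to the
// default calling convention elsewhere.
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
# define MS_ABI   __attribute__ ((ms_abi))
# define SYSV_ABI __attribute__ ((sysv_abi))
# define NOINLINE __attribute__ ((noinline))
#else
# define MS_ABI
# define SYSV_ABI
# define NOINLINE
#endif

// Callee using the System V ABI.  From the caller's point of view it
// may clobber RSI, RDI and XMM6-15.
SYSV_ABI NOINLINE long
sysv_add (long a, long b)
{
  return a + b;
}

// Microsoft ABI function calling a System V function.  Its prologue
// and epilogue must preserve the registers listed above, either
// inline or, with -mcall-ms2sysv-xlogues, through the libgcc stubs.
MS_ABI long
ms_entry (long a, long b)
{
  return sysv_add (a, b) * 2;
}
```

Compiling this with and without -mcall-ms2sysv-xlogues (on a compiler
that includes this series) and comparing the assembly of ms_entry
should show the difference in prologue and epilogue size.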

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c   |  6 +-
 gcc/config/i386/i386.opt |  4 
 gcc/doc/invoke.texi  | 13 -
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 113f83742c2..521116195cb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4508,7 +4508,8 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2,
 { "-mstv", MASK_STV },
 { "-mavx256-split-unaligned-load", MASK_AVX256_SPLIT_UNALIGNED_LOAD },
 { "-mavx256-split-unaligned-store",
MASK_AVX256_SPLIT_UNALIGNED_STORE },
-{ "-mprefer-avx128",   MASK_PREFER_AVX128 }
+{ "-mprefer-avx128",   MASK_PREFER_AVX128 },
+{ "-mcall-ms2sysv-xlogues",MASK_CALL_MS2SYSV_XLOGUES }
   };
 
   /* Additional flag options.  */
@@ -6319,6 +6320,9 @@ ix86_option_override_internal (bool main_args_p,
 #endif
}
 
+  if (TARGET_SEH && TARGET_CALL_MS2SYSV_XLOGUES)
+sorry ("-mcall-ms2sysv-xlogues isn%'t currently supported with SEH");
+
   if (!(opts_set->x_target_flags & MASK_VZEROUPPER))
 opts->x_target_flags |= MASK_VZEROUPPER;
   if (!(opts_set->x_target_flags & MASK_STV))
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 9384e29b1de..65b228544a5 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -538,6 +538,10 @@ Enum(calling_abi) String(sysv) Value(SYSV_ABI)
 EnumValue
 Enum(calling_abi) String(ms) Value(MS_ABI)
 
+mcall-ms2sysv-xlogues
+Target Report Mask(CALL_MS2SYSV_XLOGUES) Save
+Use libgcc stubs to save and restore registers clobbered by 64-bit Microsoft to System V ABI calls.
+
 mveclibabi=
 Target RejectNegative Joined Var(ix86_veclibabi_type) Enum(ix86_veclibabi) Init(ix86_veclibabi_type_none)
 Vector library ABI to use.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0eeea7b3b87..c9e565a9216 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1209,7 +1209,7 @@ See RS/6000 and PowerPC Options.
 -msse2avx  -mfentry  -mrecord-mcount  -mnop-mcount  -m8bit-idiv @gol
 -mavx256-split-unaligned-load  -mavx256-split-unaligned-store @gol
 -malign-data=@var{type}  -mstack-protector-guard=@var{guard} @gol
--mmitigate-rop  -mgeneral-regs-only}
+-mmitigate-rop  -mgeneral-regs-only  -mcall-ms2sysv-xlogues}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -25308,6 +25308,17 @@ You can control this behavior for specific functions by
 using the function attributes @code{ms_abi} and @code{sysv_abi}.
 @xref{Function Attributes}.
 
+@item -mcall-ms2sysv-xlogues
+@opindex mcall-ms2sysv-xlogues
+@opindex mno-call-ms2sysv-xlogues
+Due to differences in 64-bit ABIs, any Microsoft ABI function that calls a
+System V ABI function must consider RSI, RDI and XMM6-15 as clobbered.  By
+default, the code for saving and restoring these registers is emitted inline,
+resulting in fairly lengthy prologues and epilogues.  Using
+@option{-mcall-ms2sysv-xlogues} emits prologues and epilogues that
+use stubs in the static portion of libgcc to perform these saves & restores,
+thus reducing function size at the cost of a few extra instructions.
+
 @item -mtls-dialect=@var{type}
 @opindex mtls-dialect
 Generate code to access thread-local storage using the @samp{gnu} or
-- 
2.11.0



[PATCH 09/12] [i386] Add patterns and predicates foutline-msabi-xlouges

2017-04-27 Thread Daniel Santos
Add the predicates save_multiple and restore_multiple to predicates.md,
which are used by the following patterns in sse.md:

* save_multiple - insn that calls a save stub
* restore_multiple - call_insn that calls a restore stub and returns to
  the function to allow a sibling call (which should typically offer
  better optimization than the restore stub as the tail call)
* restore_multiple_and_return - a jump_insn that returns from the
  function as a tail-call.
* restore_multiple_leave_return - like the above, but restores the frame
  pointer before returning.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/predicates.md | 155 ++
 gcc/config/i386/sse.md|  37 ++
 2 files changed, 192 insertions(+)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 8f250a2e720..36fe8abc3f4 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1657,3 +1657,158 @@
   (ior (match_operand 0 "register_operand")
(and (match_code "const_int")
(match_test "op == constm1_rtx"
+
+;; Return true if:
+;; 1. first op is a symbol reference,
+;; 2. >= 13 operands, and
+;; 3. operands 2 to end is one of:
+;;   a. save a register to a memory location, or
+;;   b. restore stack pointer.
+(define_predicate "save_multiple"
+  (match_code "parallel")
+{
+  const unsigned nregs = XVECLEN (op, 0);
+  rtx head = XVECEXP (op, 0, 0);
+  unsigned i;
+
+  if (GET_CODE (head) != USE)
+return false;
+  else
+{
+  rtx op0 = XEXP (head, 0);
+  if (op0 == NULL_RTX || GET_CODE (op0) != SYMBOL_REF)
+   return false;
+}
+
+  if (nregs < 13)
+return false;
+
+  for (i = 2; i < nregs; i++)
+{
+  rtx e, src, dest;
+
+  e = XVECEXP (op, 0, i);
+
+  switch (GET_CODE (e))
+   {
+ case SET:
+   src  = SET_SRC (e);
+   dest = SET_DEST (e);
+
+   /* storing a register to memory.  */
+   if (GET_CODE (src) == REG && GET_CODE (dest) == MEM)
+ {
+   rtx addr = XEXP (dest, 0);
+
+   /* Good if dest address is in RAX.  */
+   if (GET_CODE (addr) == REG
+   && REGNO (addr) == AX_REG)
+ continue;
+
+   /* Good if dest address is offset of RAX.  */
+   if (GET_CODE (addr) == PLUS
+   && GET_CODE (XEXP (addr, 0)) == REG
+   && REGNO (XEXP (addr, 0)) == AX_REG)
+ continue;
+ }
+   break;
+
+ default:
+   break;
+   }
+   return false;
+}
+  return true;
+})
+
+;; Return true if:
+;; * first op is (return) or a use (symbol reference),
+;; * >= 14 operands, and
+;; * operands 2 to end are one of:
+;;   - restoring a register from a memory location that's an offset of RSI.
+;;   - clobbering a reg
+;;   - adjusting SP
+(define_predicate "restore_multiple"
+  (match_code "parallel")
+{
+  const unsigned nregs = XVECLEN (op, 0);
+  rtx head = XVECEXP (op, 0, 0);
+  unsigned i;
+
+  switch (GET_CODE (head))
+{
+  case RETURN:
+   i = 3;
+   break;
+
+  case USE:
+  {
+   rtx op0 = XEXP (head, 0);
+
+   if (op0 == NULL_RTX || GET_CODE (op0) != SYMBOL_REF)
+ return false;
+
+   i = 1;
+   break;
+  }
+
+  default:
+   return false;
+}
+
+  if (nregs < i + 12)
+return false;
+
+  for (; i < nregs; i++)
+{
+  rtx e, src, dest;
+
+  e = XVECEXP (op, 0, i);
+
+  switch (GET_CODE (e))
+   {
+ case CLOBBER:
+   continue;
+
+ case SET:
+   src  = SET_SRC (e);
+   dest = SET_DEST (e);
+
+   /* Restoring a register from memory.  */
+   if (GET_CODE (src) == MEM && GET_CODE (dest) == REG)
+ {
+   rtx addr = XEXP (src, 0);
+
+   /* Good if src address is in RSI.  */
+   if (GET_CODE (addr) == REG
+   && REGNO (addr) == SI_REG)
+ continue;
+
+   /* Good if src address is offset of RSI.  */
+   if (GET_CODE (addr) == PLUS
+   && GET_CODE (XEXP (addr, 0)) == REG
+   && REGNO (XEXP (addr, 0)) == SI_REG)
+ continue;
+
+   /* Good if adjusting stack pointer.  */
+   if (GET_CODE (dest) == REG
+   && REGNO (dest) == SP_REG
+   && GET_CODE (src) == PLUS
+   && GET_CODE (XEXP (src, 0)) == REG
+   && REGNO (XEXP (src, 0)) == SP_REG)
+ continue;
+ }
+
+   /* Restoring stack pointer from another register.  */
+   if (GET_CODE (dest) == REG && REGNO (dest) == SP_REG
+  

[PATCH 10/12] [i386] Add ms2sysv pro/epilogue stubs to libgcc

2017-04-27 Thread Daniel Santos
Add new header libgcc/config/i386/i386-asm.h to manage common cpp and
gas macros.  Add new stubs.  Stubs use the following naming convention:

  __<sav|res>ms64[f][x]_<nregs>

    <sav|res>  Save or restore
    ms64       Avoid possible name collisions with future stubs
               (specific to 64-bit msabi --> sysv scenario)
    [f]        Variant for hard frame pointer (and stack realignment)
    [x]        Tail-call variant (is the return from function)
    <nregs>    The number of registers to save.
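
As a rough illustration of the naming convention, the helper below
assembles a stub name from its parts.  It is a hypothetical sketch, not
code from the patch: the "sav" and "res" prefixes are inferred from the
file names in the diffstat (savms64.S, resms64.S and so on), and the
function name stub_name and its parameters are invented here.

```cpp
#include <cassert>
#include <string>

// Build a stub name of the form __<sav|res>ms64[f][x]_<nregs>.
// Assumption: "sav"/"res" are taken from the stub file names.
std::string
stub_name (bool restore, bool hfp, bool tail_call, unsigned nregs)
{
  // The tail-call variant only exists for restore stubs.
  assert (restore || !tail_call);

  std::string name = "__";
  name += restore ? "res" : "sav";
  name += "ms64";
  if (hfp)
    name += "f";   // hard frame pointer / stack realignment variant
  if (tail_call)
    name += "x";   // stub also performs the function return
  name += "_";
  name += std::to_string (nregs);
  return name;
}
```

Under these assumptions, a plain save stub for 12 registers is named
__savms64_12, and a frame-pointer, tail-call restore stub for 14
registers is named __resms64fx_14.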

Signed-off-by: Daniel Santos 
---
 libgcc/config.host |  2 +-
 libgcc/config/i386/i386-asm.h  | 82 ++
 libgcc/config/i386/resms64.S   | 57 +
 libgcc/config/i386/resms64f.S  | 55 
 libgcc/config/i386/resms64fx.S | 57 +
 libgcc/config/i386/resms64x.S  | 59 ++
 libgcc/config/i386/savms64.S   | 57 +
 libgcc/config/i386/savms64f.S  | 55 
 libgcc/config/i386/t-msabi |  7 
 9 files changed, 430 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/i386/i386-asm.h
 create mode 100644 libgcc/config/i386/resms64.S
 create mode 100644 libgcc/config/i386/resms64f.S
 create mode 100644 libgcc/config/i386/resms64fx.S
 create mode 100644 libgcc/config/i386/resms64x.S
 create mode 100644 libgcc/config/i386/savms64.S
 create mode 100644 libgcc/config/i386/savms64f.S
 create mode 100644 libgcc/config/i386/t-msabi

diff --git a/libgcc/config.host b/libgcc/config.host
index b279a6458f9..b6d10951f3f 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1351,7 +1351,7 @@ case ${host} in
 i[34567]86-*-linux* | x86_64-*-linux* | \
   i[34567]86-*-kfreebsd*-gnu | x86_64-*-kfreebsd*-gnu | \
   i[34567]86-*-gnu*)
-   tmake_file="${tmake_file} t-tls i386/t-linux t-slibgcc-libgcc"
+   tmake_file="${tmake_file} t-tls i386/t-linux i386/t-msabi t-slibgcc-libgcc"
if test "$libgcc_cv_cfi" = "yes"; then
tmake_file="${tmake_file} t-stack i386/t-stack-i386"
fi
diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h
new file mode 100644
index 000..c613e9fd83d
--- /dev/null
+++ b/libgcc/config/i386/i386-asm.h
@@ -0,0 +1,82 @@
+/* Defines common preprocessor and assembly macros for use by various stubs.
+   Copyright (C) 2016-2017 Free Software Foundation, Inc.
+   Contributed by Daniel Santos 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef I386_ASM_H
+#define I386_ASM_H
+
+#ifdef __ELF__
+# define ELFFN(fn) .type fn,@function
+#else
+# define ELFFN(fn)
+#endif
+
+#define FUNC_START(fn) \
+   .global fn; \
+   ELFFN (fn); \
+fn:
+
+#define HIDDEN_FUNC(fn)\
+   FUNC_START (fn) \
+   .hidden fn; \
+
+#define FUNC_END(fn) .size fn,.-fn
+
+#ifdef __SSE2__
+# ifdef __AVX__
+#  define MOVAPS vmovaps
+# else
+#  define MOVAPS movaps
+# endif
+
+/* Save SSE registers 6-15. off is the offset of rax to get to xmm6.  */
+.macro SSE_SAVE off=0
+   MOVAPS %xmm15,(\off - 0x90)(%rax)
+   MOVAPS %xmm14,(\off - 0x80)(%rax)
+   MOVAPS %xmm13,(\off - 0x70)(%rax)
+   MOVAPS %xmm12,(\off - 0x60)(%rax)
+   MOVAPS %xmm11,(\off - 0x50)(%rax)
+   MOVAPS %xmm10,(\off - 0x40)(%rax)
+   MOVAPS %xmm9, (\off - 0x30)(%rax)
+   MOVAPS %xmm8, (\off - 0x20)(%rax)
+   MOVAPS %xmm7, (\off - 0x10)(%rax)
+   MOVAPS %xmm6, \off(%rax)
+.endm
+
+/* Restore SSE registers 6-15. off is the offset of rsi to get to xmm6.  */
+.macro SSE_RESTORE off=0
+   MOVAPS (\off - 0x90)(%rsi), %xmm15
+   MOVAPS (\off - 0x80)(%rsi), %xmm14
+   MOVAPS (\off - 0x70)(%rsi), %xmm13
+   MOVAPS (\off - 0x60)(%rsi), %xmm12
+   MOVAPS (\off - 0x50)(%rsi), %xmm11
+   MOVAPS (\off - 0x40)(%rsi), %xmm10
+   MOVAPS (\off - 0x30)(%rsi), %xmm9
+   MOVAPS (\off - 0x20)(%rsi), %xmm8
+   MOVAPS (\off - 0x10)(%rsi), %xmm7
+   MOVAPS \off(%rsi), %xmm6
+.endm
+
+#endif /* __SSE2__ */
+#endif /* I386_

[PATCH 08/12] [i386] Modify ix86_compute_frame_layout for -mcall-ms2sysv-xlogues

2017-04-27 Thread Daniel Santos
ix86_compute_frame_layout will now populate fields added to structs
machine_function and ix86_frame and modify the frame layout specifics to
facilitate the use of save & restore stubs.  This is also where we init
stub_managed_regs to track which register saves & restores are being
managed by the out-of-line stub and which are being managed inline, as
it is possible to have registers being managed both inline and
out-of-line when inline asm explicitly clobbers a register.
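
The pruning step this patch adds to ix86_compute_frame_layout can be
summarised as a small decision function.  The sketch below follows the
order of the checks in the diff, but the FunctionTraits and
StubDecision types and the decide_stub_use function are hypothetical,
and the real code reads these properties from compiler state rather
than from a struct.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of the properties the frame layout code checks.
struct FunctionTraits
{
  bool has_sse;
  bool ms_hook_prologue;
  bool calls_eh_return;
  bool static_chain_on_stack;
  bool uses_red_zone;
  bool split_stack;
};

struct StubDecision
{
  bool use_stubs;
  std::string warning;  // feature named in the warning, empty if none
};

// Decide whether the out-of-line stubs stay enabled.  The first two
// cases disable them silently; the remaining ones also warn.
StubDecision
decide_stub_use (const FunctionTraits &f)
{
  if (!f.has_sse || f.ms_hook_prologue)
    return StubDecision {false, ""};
  if (f.calls_eh_return)
    return StubDecision {false, "__builtin_eh_return"};
  if (f.static_chain_on_stack)
    return StubDecision {false, "static call chains"};
  if (f.uses_red_zone)
    return StubDecision {false, "red zones"};
  if (f.split_stack)
    return StubDecision {false, "split stack"};
  return StubDecision {true, ""};
}
```

Only when none of the checks fire does the real code go on to compute
which registers the stub will manage.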

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 94 +++---
 1 file changed, 90 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4f0cb7dd6cc..debfe457d97 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2715,12 +2715,29 @@ struct GTY(()) stack_local_entry {
saved frame pointer if frame_pointer_needed
<- HARD_FRAME_POINTER
[saved regs]
-   <- regs_save_offset
+   <- reg_save_offset
[padding0]
<- stack_realign_offset
[saved SSE regs]
+   OR
+   [stub-saved registers for ms x64 --> sysv clobbers
+   <- Start of out-of-line, stub-saved/restored regs
+  (see libgcc/config/i386/(sav|res)ms64*.S)
+ [XMM6-15]
+ [RSI]
+ [RDI]
+ [?RBX]only if RBX is clobbered
+ [?RBP]only if RBP and RBX are clobbered
+ [?R12]only if R12 and all previous regs are clobbered
+ [?R13]only if R13 and all previous regs are clobbered
+ [?R14]only if R14 and all previous regs are clobbered
+ [?R15]only if R15 and all previous regs are clobbered
+   <- end of stub-saved/restored regs
+ [padding1]
+   ]
+   <- outlined_save_offset
<- sse_regs_save_offset
-   [padding1]  |
+   [padding2]
   |<- FRAME_POINTER
[va_arg registers]  |
   |
@@ -2745,6 +2762,7 @@ struct ix86_frame
   HOST_WIDE_INT reg_save_offset;
   HOST_WIDE_INT stack_realign_allocate_offset;
   HOST_WIDE_INT stack_realign_offset;
+  HOST_WIDE_INT outlined_save_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
   /* When save_regs_using_mov is set, emit prologue using
@@ -12802,6 +12820,15 @@ ix86_builtin_setjmp_frame_value (void)
   return stack_realign_fp ? hard_frame_pointer_rtx : virtual_stack_vars_rtx;
 }
 
+/* Disables out-of-lined msabi to sysv pro/epilogues and emits a warning if
+   warn_once is null, or *warn_once is zero.  */
+static void disable_call_ms2sysv_xlogues (const char *feature)
+{
+  cfun->machine->call_ms2sysv = false;
+  warning (OPT_mcall_ms2sysv_xlogues, "not currently compatible with %s.",
+  feature);
+}
+
 /* When using -fsplit-stack, the allocation routines set a field in
the TCB to the bottom of the stack plus this much space, measured
in bytes.  */
@@ -12820,9 +12847,50 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   HOST_WIDE_INT size = get_frame_size ();
   HOST_WIDE_INT to_allocate;
 
+  CLEAR_HARD_REG_SET (stub_managed_regs);
+
+  /* m->call_ms2sysv is initially enabled in ix86_expand_call for all 64-bit
+   * ms_abi functions that call a sysv function.  We now need to prune away
+   * cases where it should be disabled.  */
+  if (TARGET_64BIT && m->call_ms2sysv)
+  {
+gcc_assert (TARGET_64BIT_MS_ABI);
+gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
+gcc_assert (!TARGET_SEH);
+
+if (!TARGET_SSE)
+  m->call_ms2sysv = false;
+
+/* Don't break hot-patched functions.  */
+else if (ix86_function_ms_hook_prologue (current_function_decl))
+  m->call_ms2sysv = false;
+
+/* TODO: Cases not yet examined.  */
+else if (crtl->calls_eh_return)
+  disable_call_ms2sysv_xlogues ("__builtin_eh_return");
+
+else if (ix86_static_chain_on_stack)
+  disable_call_ms2sysv_xlogues ("static call chains");
+
+else if (ix86_using_red_zone ())
+  disable_call_ms2sysv_xlogues ("red zones");
+
+else if (flag_split_stack)
+  disable_call_ms2sysv_xlogues ("split stack");
+
+/* Finally, compute which registers the stub will manage.  */
+else
+  {
+   unsigned count = xlogue_layout
+::compute_stub_managed_regs (stub_managed_regs);
+   m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
+  }
+  }
+
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
-  CLEAR_HARD_REG_SET (stub_managed_regs);
+  m->call_ms2sysv_pad_in = 0;
+  m->call_ms2sysv_pad_out = 0;
 
   /* 64-bit MS ABI seem 

[PATCH 06/12] [i386] Add class xlogue_layout and new fields to struct machine_function

2017-04-27 Thread Daniel Santos
Of the new fields added to struct machine_function, call_ms2sysv is
initially set in ix86_expand_call, but may later be cleared when
ix86_compute_frame_layout is called (both of these are in subsequent
patches).  If it is not cleared, then the remaining new fields will be
set in ix86_compute_frame_layout (also a subsequent patch).

The new class xlogue_layout manages the layout of the stack area used by
the out-of-line save & restore stubs as well as any padding needed
before and after the save area.  It also provides the proper symbol rtx
for the requested stub based upon values of the new fields in struct
machine_function, which specify how many registers are being saved, what
padding is needed, etc.

xlogue_layout cannot be used until stack realign flags are finalized and
ix86_compute_frame_layout is called, at which point
xlogue_layout::get_instance may be used to retrieve the appropriate
(constant) instance of xlogue_layout.
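
To make the "how many registers are being saved" part concrete, here
is a minimal sketch of how the number of extra stub-managed registers
could be derived.  It assumes the rule stated in the frame layout
comment of the related patch: the optional registers come in the order
RBX, RBP, R12, R13, R14, R15, and each is included only if all previous
ones are also clobbered.  The function names are hypothetical and the
real compute_stub_managed_regs works on a HARD_REG_SET.

```cpp
#include <cassert>

// Constants mirroring those declared in xlogue_layout.
static const unsigned MIN_REGS = 12;  // RSI, RDI and XMM6-15
static const unsigned MAX_REGS = 18;
static const unsigned MAX_EXTRA_REGS = MAX_REGS - MIN_REGS;

// clobbered[i] says whether the i-th optional register (assumed order:
// RBX, RBP, R12, R13, R14, R15) needs saving.  Only a leading run of
// clobbered registers can be handled by the stub.
unsigned
count_extra_regs (const bool (&clobbered)[MAX_EXTRA_REGS])
{
  unsigned n = 0;
  while (n < MAX_EXTRA_REGS && clobbered[n])
    ++n;
  return n;
}

// Total number of registers the selected stub variant saves.
unsigned
stub_reg_count (const bool (&clobbered)[MAX_EXTRA_REGS])
{
  return MIN_REGS + count_extra_regs (clobbered);
}
```

In this model a function that clobbers RBX, RBP and R13 but not R12
gets a 14-register stub, and R13 has to be saved inline.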

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 262 +
 gcc/config/i386/i386.h |  18 
 2 files changed, 280 insertions(+)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 521116195cb..2da3da1f97a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -93,6 +93,7 @@ static rtx legitimize_dllimport_symbol (rtx, bool);
 static rtx legitimize_pe_coff_extern_decl (rtx, bool);
 static rtx legitimize_pe_coff_symbol (rtx, bool);
 static void ix86_print_operand_address_as (FILE *, rtx, addr_space_t, bool);
+static bool ix86_save_reg (unsigned int, bool, bool);
 
 #ifndef CHECK_STACK_LIMIT
 #define CHECK_STACK_LIMIT (-1)
@@ -2432,6 +2433,267 @@ unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
   XMM12_REG, XMM13_REG, XMM14_REG, XMM15_REG
 };
 
+enum xlogue_stub {
+  XLOGUE_STUB_SAVE,
+  XLOGUE_STUB_RESTORE,
+  XLOGUE_STUB_RESTORE_TAIL,
+  XLOGUE_STUB_SAVE_HFP,
+  XLOGUE_STUB_RESTORE_HFP,
+  XLOGUE_STUB_RESTORE_HFP_TAIL,
+
+  XLOGUE_STUB_COUNT
+};
+
+enum xlogue_stub_sets {
+  XLOGUE_SET_ALIGNED,
+  XLOGUE_SET_ALIGNED_PLUS_8,
+  XLOGUE_SET_HFP_ALIGNED_OR_REALIGN,
+  XLOGUE_SET_HFP_ALIGNED_PLUS_8,
+
+  XLOGUE_SET_COUNT
+};
+
+/* Register save/restore layout used by out-of-line stubs.  */
+class xlogue_layout {
+public:
+  struct reginfo
+  {
+unsigned regno;
+HOST_WIDE_INT offset;  /* Offset used by stub base pointer (rax or
+  rsi) to where each register is stored.  */
+  };
+
+  unsigned get_nregs () const  {return m_nregs;}
+  HOST_WIDE_INT get_stack_align_off_in () const {return m_stack_align_off_in;}
+
+  const reginfo &get_reginfo (unsigned reg) const
+  {
+gcc_assert (reg < m_nregs);
+return m_regs[reg];
+  }
+
+  const char *get_stub_name (enum xlogue_stub stub,
+unsigned n_extra_args) const;
+  /* Returns an rtx for the stub's symbol based upon
+   1.) the specified stub (save, restore or restore_ret) and
+   2.) the value of cfun->machine->call_ms2sysv_extra_regs and
+   3.) whether or not stack alignment is being performed.  */
+  rtx get_stub_rtx (enum xlogue_stub stub) const;
+
+  /* Returns the amount of stack space (including padding) that the stub
+ needs to store registers based upon data in the machine_function.  */
+  HOST_WIDE_INT get_stack_space_used () const
+  {
+const struct machine_function &m = *cfun->machine;
+unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1;
+
+gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
+return m_regs[last_reg].offset
+   + (m.call_ms2sysv_pad_out ? 8 : 0)
+   + STUB_INDEX_OFFSET;
+  }
+
+  /* Returns the offset for the base pointer used by the stub.  */
+  HOST_WIDE_INT get_stub_ptr_offset () const
+  {
+return STUB_INDEX_OFFSET + m_stack_align_off_in;
+  }
+
+  static const struct xlogue_layout &get_instance ();
+  static unsigned compute_stub_managed_regs (HARD_REG_SET &stub_managed_regs);
+
+  static const HOST_WIDE_INT STUB_INDEX_OFFSET = 0x70;
+  static const unsigned MIN_REGS = NUM_X86_64_MS_CLOBBERED_REGS;
+  static const unsigned MAX_REGS = 18;
+  static const unsigned MAX_EXTRA_REGS = MAX_REGS - MIN_REGS;
+  static const unsigned VARIANT_COUNT = MAX_EXTRA_REGS + 1;
+  static const unsigned STUB_NAME_MAX_LEN = 16;
+  static const char * const STUB_BASE_NAMES[XLOGUE_STUB_COUNT];
+  static const unsigned REG_ORDER[MAX_REGS];
+  static const unsigned REG_ORDER_REALIGN[MAX_REGS];
+
+private:
+  xlogue_layout ();
+  xlogue_layout (HOST_WIDE_INT stack_align_off_in, bool hfp);
+  xlogue_layout (const xlogue_layout &);
+
+  /* True if hard frame pointer is used.  */
+  bool m_hfp;
+
+  /* Max number of register this layout manages.  */
+  unsigned m_nregs;
+
+  /* Incoming offset from 16-byte alignment.  */
+  HOST_WIDE_INT m_stack_align_off_in;
+
+  /* Register order and offsets.  */
+  struct reginfo m_regs[MAX_REGS

[PATCH 07/12] [i386] Modify ix86_save_reg to optionally omit stub-managed registers

2017-04-27 Thread Daniel Santos
Add HARD_REG_SET stub_managed_regs to track which registers will be
managed by the pro/epilogue stubs for the function.

Add a third parameter bool ignore_outlined to ix86_save_reg to specify
whether registers marked in stub_managed_regs should be excluded, that
is, reported as not needing an inline save.  All call sites are modified.
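
The effect of the new parameter can be shown with a small model that
uses std::bitset in place of HARD_REG_SET.  The names RegSet,
save_reg_inline and count_inline_saves are hypothetical, and the
16-register file is an arbitrary choice for the example.

```cpp
#include <bitset>
#include <cassert>

static const unsigned NUM_REGS = 16;  // arbitrary size for the example
typedef std::bitset<NUM_REGS> RegSet;

// Return true if REGNO must be saved by inline prologue code.  When
// IGNORE_OUTLINED is set, registers handled by the stub are skipped.
bool
save_reg_inline (unsigned regno, const RegSet &needs_save,
                 const RegSet &stub_managed, bool ignore_outlined)
{
  assert (regno < NUM_REGS);
  if (!needs_save.test (regno))
    return false;
  if (ignore_outlined && stub_managed.test (regno))
    return false;
  return true;
}

// Count the registers for which save_reg_inline returns true.
unsigned
count_inline_saves (const RegSet &needs_save, const RegSet &stub_managed,
                    bool ignore_outlined)
{
  unsigned n = 0;
  for (unsigned regno = 0; regno < NUM_REGS; ++regno)
    if (save_reg_inline (regno, needs_save, stub_managed, ignore_outlined))
      ++n;
  return n;
}
```

With four registers needing a save and two of them managed by the
stub, the count is two when stub-managed registers are ignored and
four otherwise.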

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2da3da1f97a..4f0cb7dd6cc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12618,6 +12618,10 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
  && df_regs_ever_live_p (regno)));
 }
 
+/* Registers whose save & restore will be managed by stubs called from
+   pro/epilogue.  */
+static HARD_REG_SET GTY(()) stub_managed_regs;
+
 /* Return true if register class CL should be an additional allocno
class.  */
 
@@ -12630,7 +12634,7 @@ ix86_additional_allocno_class_p (reg_class_t cl)
 /* Return TRUE if we need to save REGNO.  */
 
 static bool
-ix86_save_reg (unsigned int regno, bool maybe_eh_return)
+ix86_save_reg (unsigned int regno, bool maybe_eh_return, bool ignore_outlined)
 {
   /* If there are no caller-saved registers, we preserve all registers,
  except for MMX and x87 registers which aren't supported when saving
@@ -12698,6 +12702,10 @@ ix86_save_reg (unsigned int regno, bool maybe_eh_return)
}
 }
 
+  if (ignore_outlined && cfun->machine->call_ms2sysv
+  && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
+return false;
+
   if (crtl->drap_reg
   && regno == REGNO (crtl->drap_reg)
   && !cfun->machine->no_drap_save_restore)
@@ -12718,7 +12726,7 @@ ix86_nsaved_regs (void)
   int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   nregs ++;
   return nregs;
 }
@@ -12734,7 +12742,7 @@ ix86_nsaved_sseregs (void)
   if (!TARGET_64BIT_MS_ABI)
 return 0;
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   nregs ++;
   return nregs;
 }
@@ -12814,6 +12822,7 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
 
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
+  CLEAR_HARD_REG_SET (stub_managed_regs);
 
   /* 64-bit MS ABI seem to require stack alignment to be always 16,
  except for function prologues, leaf functions and when the defult
@@ -13207,7 +13216,7 @@ ix86_emit_save_regs (void)
   rtx_insn *insn;
 
   for (regno = FIRST_PSEUDO_REGISTER - 1; regno-- > 0; )
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
insn = emit_insn (gen_push (gen_rtx_REG (word_mode, regno)));
RTX_FRAME_RELATED_P (insn) = 1;
@@ -13297,7 +13306,7 @@ ix86_emit_save_regs_using_mov (HOST_WIDE_INT cfa_offset)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
 ix86_emit_save_reg_using_mov (word_mode, regno, cfa_offset);
cfa_offset -= UNITS_PER_WORD;
@@ -13312,7 +13321,7 @@ ix86_emit_save_sse_regs_using_mov (HOST_WIDE_INT cfa_offset)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
ix86_emit_save_reg_using_mov (V4SFmode, regno, cfa_offset);
cfa_offset -= GET_MODE_SIZE (V4SFmode);
@@ -13696,13 +13705,13 @@ get_scratch_register_on_entry (struct scratch_reg *sr)
   && !static_chain_p
   && drap_regno != CX_REG)
regno = CX_REG;
-  else if (ix86_save_reg (BX_REG, true))
+  else if (ix86_save_reg (BX_REG, true, false))
regno = BX_REG;
   /* esi is the static chain register.  */
   else if (!(regparm == 3 && static_chain_p)
-  && ix86_save_reg (SI_REG, true))
+  && ix86_save_reg (SI_REG, true, false))
regno = SI_REG;
-  else if (ix86_save_reg (DI_REG, true))
+  else if (ix86_save_reg (DI_REG, true, false))
regno = DI_REG;
   else
{
@@ -14812,7 +14821,7 @@ ix86_emit_restore_regs_using_mov (HOST_WIDE_INT cfa_offset,
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENE

[PATCH 11/12] [i386] Add remainder of -mcall-ms2sysv-xlogues implementation

2017-04-27 Thread Daniel Santos
Add functions ix86_emit_outlined_ms2sysv_save and
ix86_emit_outlined_ms2sysv_restore, which are called from
ix86_expand_prologue and ix86_expand_epilogue, respectively.  Also add
the code to ix86_expand_call that enables the optimization (setting the
machine_function's call_ms2sysv field).

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 281 +++--
 1 file changed, 272 insertions(+), 9 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index debfe457d97..6a4e6f8e728 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14271,6 +14271,79 @@ ix86_elim_entry_set_got (rtx reg)
 }
 }
 
+static rtx
+gen_frame_set (rtx reg, rtx frame_reg, int offset, bool store)
+{
+  rtx addr, mem;
+
+  if (offset)
+addr = gen_rtx_PLUS (Pmode, frame_reg, GEN_INT (offset));
+  mem = gen_frame_mem (GET_MODE (reg), offset ? addr : frame_reg);
+  return gen_rtx_SET (store ? mem : reg, store ? reg : mem);
+}
+
+static inline rtx
+gen_frame_load (rtx reg, rtx frame_reg, int offset)
+{
+  return gen_frame_set (reg, frame_reg, offset, false);
+}
+
+static inline rtx
+gen_frame_store (rtx reg, rtx frame_reg, int offset)
+{
+  return gen_frame_set (reg, frame_reg, offset, true);
+}
+
+static void
+ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame)
+{
+  struct machine_function *m = cfun->machine;
+  const unsigned ncregs = NUM_X86_64_MS_CLOBBERED_REGS
+ + m->call_ms2sysv_extra_regs;
+  rtvec v = rtvec_alloc (ncregs - 1 + 3);
+  unsigned int align, i, vi = 0;
+  rtx_insn *insn;
+  rtx sym, addr;
+  rtx rax = gen_rtx_REG (word_mode, AX_REG);
+  const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
+  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset;
+  HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset;
+  HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in ();
+
+  /* Verify that the incoming stack 16-byte alignment offset matches the
+ layout we're using.  */
+  gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD));
+
+  /* Get the stub symbol.  */
+  sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
+ : XLOGUE_STUB_SAVE);
+  RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym);
+  RTVEC_ELT (v, vi++) = const0_rtx;
+
+  /* Setup RAX as the stub's base pointer.  */
+  align = GET_MODE_ALIGNMENT (V4SFmode);
+  addr = choose_baseaddr (rax_offset, &align);
+  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
+  insn = emit_insn (gen_rtx_SET (rax, addr));
+
+  gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ());
+  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+GEN_INT (-stack_alloc_size), -1,
+m->fs.cfa_reg == stack_pointer_rtx);
+  for (i = 0; i < ncregs; ++i)
+    {
+      const xlogue_layout::reginfo &r = xlogue.get_reginfo (i);
+      rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode),
+                             r.regno);
+      RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);
+    }
+
+  gcc_assert (vi == (unsigned)GET_NUM_ELEM (v));
+
+  insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
+  RTX_FRAME_RELATED_P (insn) = true;
+}
+
 /* Expand the prologue into a bunch of separate insns.  */
 
 void
@@ -14518,7 +14591,7 @@ ix86_expand_prologue (void)
 performing the actual alignment.  Otherwise we cannot guarantee
 that there's enough storage above the realignment point.  */
   allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
-  if (allocate)
+  if (allocate && !m->call_ms2sysv)
 pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
   GEN_INT (-allocate), -1, false);
 
@@ -14526,7 +14599,6 @@ ix86_expand_prologue (void)
   insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
stack_pointer_rtx,
GEN_INT (-align_bytes)));
-
   /* For the purposes of register save area addressing, the stack
 pointer can no longer be used to access anything in the frame
 below m->fs.sp_realigned_offset and the frame pointer cannot be
@@ -14543,6 +14615,9 @@ ix86_expand_prologue (void)
m->fs.sp_valid = false;
 }
 
+  if (m->call_ms2sysv)
+ix86_emit_outlined_ms2sysv_save (frame);
+
   allocate = frame.stack_pointer_offset - m->fs.sp_offset;
 
   if (flag_stack_usage_info)
@@ -14863,17 +14938,19 @@ ix86_emit_restore_regs_using_pop (void)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, false))
+if (GENERAL_REGNO_P (regno) &
