[PATCH try 2 resend] [i386] Remove warnings for ignoring -mcall-ms2sysv-xlogues.
I appear to have forgotten to cc gcc-patches, sorry about that.

There are currently three cases where we issue a warning when disabling
-mcall-ms2sysv-xlogues for a function, but I never added a proper warning
option, so there's no mechanism for disabling it. This is something that I
meant to address sooner. I'm thinking that it's better to just remove the
warning entirely and document these cases, rather than adding a new warning.
Any thoughts?

These are the conditions:

  * the use of -fsplit-stack,
  * the use of static call chains (not sure if we can ever have that), and
  * if the function calls __builtin_eh_return.

Some of these cases can likely be supported, but they are just on the "not
yet tested" list.

2017-06-11 Daniel Santos
---
 gcc/config/i386/i386.c | 26 +++---
 gcc/doc/invoke.texi    | 25 -
 2 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d5c2d46bf5e..2dc6e53c765 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12772,18 +12772,6 @@ ix86_builtin_setjmp_frame_value (void)
   return stack_realign_fp ? hard_frame_pointer_rtx : virtual_stack_vars_rtx;
 }
 
-/* Emits a warning for unsupported msabi to sysv pro/epilogues. */
-static void warn_once_call_ms2sysv_xlogues (const char *feature)
-{
-  static bool warned_once = false;
-  if (!warned_once)
-    {
-      warning (0, "-mcall-ms2sysv-xlogues is not compatible with %s",
-               feature);
-      warned_once = true;
-    }
-}
-
 /* When using -fsplit-stack, the allocation routines set a field in
    the TCB to the bottom of the stack plus this much space, measured
    in bytes. */
@@ -12814,18 +12802,10 @@ ix86_compute_frame_layout (void)
       gcc_assert (TARGET_SSE);
       gcc_assert (!ix86_using_red_zone ());
 
-      if (crtl->calls_eh_return)
+      if (crtl->calls_eh_return || ix86_static_chain_on_stack)
         {
           gcc_assert (!reload_completed);
           m->call_ms2sysv = false;
-          warn_once_call_ms2sysv_xlogues ("__builtin_eh_return");
-        }
-
-      else if (ix86_static_chain_on_stack)
-        {
-          gcc_assert (!reload_completed);
-          m->call_ms2sysv = false;
-          warn_once_call_ms2sysv_xlogues ("static call chains");
         }
 
       /* Finally, compute which registers the stub will manage. */
@@ -29290,9 +29270,9 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
       else if (ix86_function_ms_hook_prologue (current_function_decl))
         ;
 
-      /* TODO: Cases not yet examined. */
+      /* TODO: Compatibility not yet examined. */
       else if (flag_split_stack)
-        warn_once_call_ms2sysv_xlogues ("-fsplit-stack");
+        ;
 
       else
         {
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c1168823af7..eec02b43a4f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -25389,11 +25389,26 @@ using the function attributes @code{ms_abi} and @code{sysv_abi}.
 @opindex mno-call-ms2sysv-xlogues
 Due to differences in 64-bit ABIs, any Microsoft ABI function that calls a
 System V ABI function must consider RSI, RDI and XMM6-15 as clobbered. By
-default, the code for saving and restoring these registers is emitted inline,
-resulting in fairly lengthy prologues and epilogues. Using
-@option{-mcall-ms2sysv-xlogues} emits prologues and epilogues that
-use stubs in the static portion of libgcc to perform these saves and restores,
-thus reducing function size at the cost of a few extra instructions.
+default, the instructions for saving and restoring these registers are emitted
+inline, resulting in fairly lengthy pro- and epilogues. Using
+@option{-mcall-ms2sysv-xlogues} emits pro- and epilogues that use stubs in the
+static portion of libgcc to perform these saves and restores, thus reducing
+function size at the cost of executing a few extra instructions. This cost is
+theoretically mitigated or eliminated by reduced instruction cache utilization,
+temporal locality of the stubs, and the stubs' use of MOV instructions over
+PUSH and POP.
+
+This option is not supported with SEH, so it is completely unavailable on
+Windows. It is also silently disabled if a function:
+
+@enumerate
+@item is built with @option{-mno-sse2} or @option{-fsplit-stack},
+@item has @code{__attribute__ ((ms_hook_prologue))}, or
+@item either throws an exception or explicitly calls @code{__builtin_eh_return}.
+@end enumerate
+
+Support for @option{-fsplit-stack} and @code{__builtin_eh_return} may be
+added at some time in the future, but has not yet been tested.
 
 @item -mtls-dialect=@var{type}
 @opindex mtls-dialect
-- 
2.11.0
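For readers who want to see the kind of call the documentation text above is
about, here is a minimal sketch. It is not part of the patch: the names
sysv_callee and ms_caller are made up for illustration, and the attribute
macros expand to nothing on targets other than x86-64 so that the file still
builds there. An ms_abi function that calls a sysv_abi function is the case
where RSI, RDI and XMM6-15 have to be treated as clobbered.

```c
/* Hypothetical illustration, not taken from the patch.  */
#if defined(__x86_64__) && defined(__GNUC__)
# define DEMO_MS_ABI   __attribute__ ((ms_abi))
# define DEMO_SYSV_ABI __attribute__ ((sysv_abi))
#else
/* Other targets: fall back to the default calling convention.  */
# define DEMO_MS_ABI
# define DEMO_SYSV_ABI
#endif

#if defined(__GNUC__)
# define DEMO_NOINLINE __attribute__ ((noinline))
#else
# define DEMO_NOINLINE
#endif

/* A System V ABI function.  It is free to clobber RSI, RDI and
   XMM6-15, which the Microsoft ABI treats as callee-saved.  */
DEMO_SYSV_ABI DEMO_NOINLINE long
sysv_callee (long a, long b)
{
  return a * 10 + b;
}

/* A Microsoft ABI function that calls the System V one.  Its prologue
   and epilogue must save and restore the registers listed above,
   either inline or through the libgcc stubs.  */
DEMO_MS_ABI DEMO_NOINLINE long
ms_caller (long a, long b)
{
  return sysv_callee (a, b) + 1;
}
```

Assuming a GCC that has the option, compiling this with -O2 -S, once with and
once without -mcall-ms2sysv-xlogues, and comparing the code for ms_caller
should show the inline saves being replaced by calls to the stubs.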
Re: [PATCH v2 0/2] [testsuite, libgcc] PR80759 Fix FAILs on Solaris and Darwin
This patchset addresses a number of testsuite issues for
gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp, mostly occurring on Solaris and
Darwin. Additionally, it fixes a bug in libgcc that caused link failures on
Darwin when building with -mcall-ms2sysv-xlogues. The issues are detailed in
the notes for each patch.

I would particularly appreciate any feedback for Darwin as I am unfamiliar
with the platform and Rainer and I have fashioned some of these changes by
looking at other Darwin code in gcc.

 .../gcc.target/x86_64/abi/ms-sysv/do-test.S   | 200 ---
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c   |  83 +++-
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp | 153 +-
 libgcc/config.host                            |   6 +-
 libgcc/config/i386/i386-asm.h                 |  89 +
 libgcc/config/i386/resms64.S                  |   2 +-
 libgcc/config/i386/resms64f.S                 |   2 +-
 libgcc/config/i386/resms64fx.S                |   2 +-
 libgcc/config/i386/resms64x.S                 |   2 +-
 libgcc/config/i386/savms64.S                  |   2 +-
 libgcc/config/i386/savms64f.S                 |   2 +-
 11 files changed, 274 insertions(+), 269 deletions(-)

Many thanks to Rainer for all of his help on this!

Thanks,
Daniel

2017-06-28 Daniel Santos
2017-06-10 Daniel Santos

	PR testsuite/80759
	* gcc.target/x86_64/abi/ms-sysv/do-test.S (ELFFN_BEGIN): Rename to
	FN_TYPE.
	(ELFFN_END): Rename to FN_SIZE.
	(ASMNAME): New macro.
	(FUNC): Rename to FUNC_BEGIN, use ASMNAME and use .globl instead of
	.global.
	(FUNC_END): Use ASMNAME.
	(test_data_save): Remove.
	(test_data_input): Likewise.
	(test_data_output): Likewise.
	(test_data_fn): Likewise.
	(test_data_retaddr): Likewise.
	(regs_to_mem): Make globals, use r10 instead of rax.
	(mem_to_regs): Likewise.
	(do_test_unaligned): Remove .cfi directives, remove pushf/popf, move
	body to ms-sysv.c.
	(do_test_aligned): Likewise.
	* gcc.target/x86_64/abi/ms-sysv/ms-sysv.c: Add dg-* directives.
	(PASTE_STR): New macro.
	(ASMNAME): Likewise.
	(LOAD_TEST_DATA_ADDR): Likewise.
	(TEST_DATA_OFFSET): Likewise.
	(do_test_body0): New C function.
	(do_test_body): New inline assembly routine.
	* gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp (runtest_ms_sysv): Modify.

2017-06-28 Daniel Santos

	PR testsuite/80759
	* config.host: Include i386/t-msabi for darwin and solaris.
	* config/i386/i386-asm.h (ELFFN): Rename to FN_TYPE.
	(FN_SIZE): New macro.
	(FN_HIDDEN): Likewise.
	(ASMNAME): Likewise.
	(FUNC_START): Rename to FUNC_BEGIN, use ASMNAME, replace .global with
	.globl.
	(HIDDEN_FUNC): Use ASMNAME and .globl instead of .global.
	(SSE_SAVE): Convert to cpp macro, hard-code offset (always 0x60).
	* config/i386/resms64.S: Use SSE_SAVE as cpp macro instead of gas
	.macro.
	* config/i386/resms64f.S: Likewise.
	* config/i386/resms64fx.S: Likewise.
	* config/i386/resms64x.S: Likewise.
	* config/i386/savms64.S: Likewise.
	* config/i386/savms64f.S: Likewise.
[PATCH 1/2] [testsuite] PR80759 fix tests on Solaris and Darwin
The ms-sysv.exp tests were failing on Solaris and Darwin targets. In
addition, a number of other problems have been identified:

* Assembly failed on Solaris and Darwin when not using gas due to the use of
  .cfi directives and .struct.
* Tests were failing on Solaris because the hard frame pointer is always
  enabled on that platform and --omit-rbp-clobbers was not being passed to
  the code generator.
* Manual compilation (via remote_exec as opposed to dg-runtest, et al.) was
  missing TEST_ALWAYS_FLAGS, resulting in color codes in log files. It was
  also missing -m64 in some cases where it was needed.
* When built with make -j48 on an unsupported triplet, the "test
  unsupported" message appeared 48 times in the log (it appears that several
  other tests do this as well).
* Using hard-coded offsets in do-test.S is ugly. This is fixed by moving
  some code into inline assembly in ms-sysv.c.
* Custom parallelization code broke when running make without -j.
* Accessing the test_data global from assembly requires(?) use of the global
  offset table on Darwin.

This patch corrects all of these problems. The custom parallelization code
has been removed and replaced with calls to procs in gcc's standard testing
framework: gcc_parallel_test_enable, runtest_file_p and dg-runtest. This
results in much poorer parallelization, which I hope to address in a future
patch, but has little effect when built without checking enabled.

Previously, each test job compiled and executed around 20k individual tests.
This high number resulted in test jobs far exceeding the default 5 minute
timeout for remote_/local_exec when gcc was built with
--enable-checking=rtl. This has been resolved by splitting the tests out to a
maximum of around 3500 tests per job.
Signed-off-by: Daniel Santos --- .../gcc.target/x86_64/abi/ms-sysv/do-test.S| 200 + .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c| 83 - .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp | 153 +--- 3 files changed, 210 insertions(+), 226 deletions(-) diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S index 1395235fd1e..ffe011bcc68 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S @@ -23,141 +23,101 @@ a copy of the GCC Runtime Library Exception along with this program; see the files COPYING3 and COPYING.RUNTIME respectively. If not, see <http://www.gnu.org/licenses/>. */ -#ifdef __x86_64__ - -# ifdef __ELF__ -# define ELFFN_BEGIN(fn) .type fn,@function -# define ELFFN_END(fn) .size fn,.-fn -# else -# define ELFFN_BEGIN(fn) -# define ELFFN_END(fn) -# endif - -# define FUNC(fn) \ - .global fn; \ - ELFFN_BEGIN(fn);\ -fn: - -#define FUNC_END(fn) ELFFN_END(fn) - -# ifdef __AVX__ -# define MOVAPS vmovaps -# else -# define MOVAPS movaps -# endif - -/* TODO: Is there a cleaner way to provide these offsets? */ - .struct 0 -test_data_save: - - .struct test_data_save + 224 -test_data_input: - - .struct test_data_save + 448 -test_data_output: - - .struct test_data_save + 672 -test_data_fn: - - .struct test_data_save + 680 -test_data_retaddr: +#if defined(__x86_64__) && defined(__SSE2__) + +/* These macros currently support GNU/Linux, Solaris and Darwin. 
*/ + +#ifdef __ELF__ +# define FN_TYPE(fn) .type fn,@function +# define FN_SIZE(fn) .size fn,.-fn +#else +# define FN_TYPE(fn) +# define FN_SIZE(fn) +#endif + +#ifdef __USER_LABEL_PREFIX__ +# define ASMNAME2(prefix, name)prefix ## name +# define ASMNAME1(prefix, name)ASMNAME2(prefix, name) +# define ASMNAME(name) ASMNAME1(__USER_LABEL_PREFIX__, name) +#else +# define ASMNAME(name) name +#endif + +#define FUNC_BEGIN(fn) \ + .globl ASMNAME(fn); \ + FN_TYPE (ASMNAME(fn)); \ +ASMNAME(fn): + +#define FUNC_END(fn) FN_SIZE(ASMNAME(fn)) + +#ifdef __AVX__ +# define MOVAPS vmovaps +#else +# define MOVAPS movaps +#endif .text -regs_to_mem: - MOVAPS %xmm6, (%rax) - MOVAPS %xmm7, 0x10(%rax) - MOVAPS %xmm8, 0x20(%rax) - MOVAPS %xmm9, 0x30(%rax) - MOVAPS %xmm10, 0x40(%rax) - MOVAPS %xmm11, 0x50(%rax) - MOVAPS %xmm12, 0x60(%rax) - MOVAPS %xmm13, 0x70(%rax) - MOVAPS %xmm14, 0x80(%rax) - MOVAPS %xmm15, 0x90(%rax) - mov %rsi, 0xa0(%rax) - mov %rdi, 0xa8(%rax) - mov %rbx, 0xb0(%rax) - mov %rbp, 0xb8(%rax) - mov %r12, 0xc0(%rax) - mov %r13, 0xc8(%rax) - mov %r14, 0xd0(%rax) - mov %r15, 0xd8(%rax) +FUNC_BEGIN(regs_to_mem) + MOVAPS %xmm6, (%r10) + MOVAPS %xmm7, 0x10(%r10) + MOVAPS %xmm8, 0x20(%r10) + MOVAPS %xmm9, 0x30(%r10) + MOVAPS %xmm10, 0x40(%r10) + MOVAPS %xmm
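As a side note on the "hard-coded offsets" item in the description of this
patch, here is a small sketch of the general idea of letting the C compiler
compute the offsets instead of writing them by hand in assembly. The struct
layout is an assumption reconstructed from the removed .struct directives (0,
224, 448, 672, 680) and from the registers stored by regs_to_mem; the actual
definitions in ms-sysv.c are not shown in this excerpt and may differ. All
demo_* and DEMO_* names are hypothetical.

```c
#include <stddef.h>

/* Assumed layout of one register dump: ten 16-byte SSE registers
   followed by eight 8-byte general purpose registers (224 bytes).  */
struct demo_regs
{
  unsigned char xmm[10][16];    /* xmm6 .. xmm15 */
  unsigned long long gpr[8];    /* rsi, rdi, rbx, rbp, r12 .. r15 */
};

/* Assumed layout of the test data block.  The 64-bit integers stand in
   for a function pointer and a return address on x86-64.  */
struct demo_test_data
{
  struct demo_regs save;
  struct demo_regs input;
  struct demo_regs output;
  unsigned long long fn;
  unsigned long long retaddr;
};

/* Offsets computed by the compiler rather than typed in by hand.  */
#define DEMO_OFFSET(field) offsetof (struct demo_test_data, field)

size_t demo_offset_save (void)    { return DEMO_OFFSET (save); }
size_t demo_offset_input (void)   { return DEMO_OFFSET (input); }
size_t demo_offset_output (void)  { return DEMO_OFFSET (output); }
size_t demo_offset_fn (void)      { return DEMO_OFFSET (fn); }
size_t demo_offset_retaddr (void) { return DEMO_OFFSET (retaddr); }
```

In inline assembly such a value can be passed as an immediate operand, so the
assembly never needs to know the numbers, and a change to the struct cannot
silently go out of sync with the stubs.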
[PATCH 2/2] [libgcc]: PR80759 fixes for Solaris & Darwin
The -mcall-ms2sysv-xlogues option is supposed to work on Solaris and Darwin,
but my changes to config.host weren't adding the sav/res stubs to libgcc and
the assembly code wasn't compatible with their assemblers either.

* Change config.host to build -mcall-ms2sysv-xlogues sav/res stubs on
  Solaris and Darwin.
* Replace .macro/.endm with cpp macros.
* Replace .global with .globl.
* Prepend __USER_LABEL_PREFIX__ when defined (via ASMNAME macro).
* Only use .size when __ELF__ is defined.
* Only use .hidden when both __ELF__ and HAVE_GAS_HIDDEN are defined.

Signed-off-by: Daniel Santos
---
 libgcc/config.host             |  6 +--
 libgcc/config/i386/i386-asm.h  | 89 ++
 libgcc/config/i386/resms64.S   |  2 +-
 libgcc/config/i386/resms64f.S  |  2 +-
 libgcc/config/i386/resms64fx.S |  2 +-
 libgcc/config/i386/resms64x.S  |  2 +-
 libgcc/config/i386/savms64.S   |  2 +-
 libgcc/config/i386/savms64f.S  |  2 +-
 8 files changed, 64 insertions(+), 43 deletions(-)

diff --git a/libgcc/config.host b/libgcc/config.host
index cf62e0e54f7..bee3e931106 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -588,12 +588,12 @@ hppa*-*-openbsd*)
 	tmake_file="$tmake_file pa/t-openbsd"
 	;;
 i[34567]86-*-darwin*)
-	tmake_file="$tmake_file i386/t-crtpc t-crtfm"
+	tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi"
 	tm_file="$tm_file i386/darwin-lib.h"
 	extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
 	;;
 x86_64-*-darwin*)
-	tmake_file="$tmake_file i386/t-crtpc t-crtfm"
+	tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi"
 	tm_file="$tm_file i386/darwin-lib.h"
 	extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
 	;;
@@ -670,7 +670,7 @@ i[34567]86-*-rtems*)
 	extra_parts="$extra_parts crti.o crtn.o"
 	;;
 i[34567]86-*-solaris2* | x86_64-*-solaris2.1[0-9]*)
-	tmake_file="$tmake_file i386/t-crtpc t-crtfm"
+	tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi"
 	extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
 	tm_file="${tm_file} i386/elf-lib.h"
md_unwind_header=i386/sol2-unwind.h diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h index c613e9fd83d..1387fd24b4f 100644 --- a/libgcc/config/i386/i386-asm.h +++ b/libgcc/config/i386/i386-asm.h @@ -26,22 +26,45 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #ifndef I386_ASM_H #define I386_ASM_H +#include "auto-host.h" + +/* These macros currently support GNU/Linux, Solaris and Darwin. */ + #ifdef __ELF__ -# define ELFFN(fn) .type fn,@function +# define FN_TYPE(fn) .type fn,@function +# define FN_SIZE(fn) .size fn,.-fn +# ifdef HAVE_GAS_HIDDEN +# define FN_HIDDEN(fn) .hidden fn +# endif +#else +# define FN_TYPE(fn) +# define FN_SIZE(fn) +#endif + +#ifndef FN_HIDDEN +# define FN_HIDDEN(fn) +#endif + +#ifdef __USER_LABEL_PREFIX__ +# define ASMNAME2(prefix, name)prefix ## name +# define ASMNAME1(prefix, name)ASMNAME2(prefix, name) +# define ASMNAME(name) ASMNAME1(__USER_LABEL_PREFIX__, name) #else -# define ELFFN(fn) +# define ASMNAME(name) name #endif -#define FUNC_START(fn) \ - .global fn; \ - ELFFN (fn); \ -fn: +#define FUNC_BEGIN(fn) \ + .globl ASMNAME(fn); \ + FN_TYPE (ASMNAME(fn)); \ +ASMNAME(fn): -#define HIDDEN_FUNC(fn)\ - FUNC_START (fn) \ - .hidden fn; \ +#define HIDDEN_FUNC(fn)\ + .globl ASMNAME(fn); \ + FN_TYPE(ASMNAME(fn)); \ + FN_HIDDEN(ASMNAME(fn)); \ +ASMNAME(fn): -#define FUNC_END(fn) .size fn,.-fn +#define FUNC_END(fn) FN_SIZE(ASMNAME(fn)) #ifdef __SSE2__ # ifdef __AVX__ @@ -51,32 +74,30 @@ fn: # endif /* Save SSE registers 6-15. off is the offset of rax to get to xmm6. 
*/ -.macro SSE_SAVE off=0 - MOVAPS %xmm15,(\off - 0x90)(%rax) - MOVAPS %xmm14,(\off - 0x80)(%rax) - MOVAPS %xmm13,(\off - 0x70)(%rax) - MOVAPS %xmm12,(\off - 0x60)(%rax) - MOVAPS %xmm11,(\off - 0x50)(%rax) - MOVAPS %xmm10,(\off - 0x40)(%rax) - MOVAPS %xmm9, (\off - 0x30)(%rax) - MOVAPS %xmm8, (\off - 0x20)(%rax) - MOVAPS %xmm7, (\off - 0x10)(%rax) - MOVAPS %xmm6, \off(%rax) -.endm +#define SSE_SAVE \ + MOVAPS %xmm15,-0x30(%rax); \ + MOVAPS %xmm14,-0x20(%rax); \ + MOVAPS %xmm13,-0x10(%rax); \ + MOVAPS %xmm12, (%rax); \ + MOVAPS %xmm11, 0x10(%rax); \ + MOVAPS %xmm10, 0x20(%rax); \ + MOVAPS %xmm9, 0x30(%rax); \ + MOVAPS %xmm8, 0x40(%rax); \ + MOVAPS %xmm7, 0x50(%rax); \ + MOVAPS %xmm6, 0x60(%rax) /* Restore SSE registers 6-
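Since the ASMNAME macro in this patch relies on a two-level expansion that is
easy to get wrong, here is a small C sketch of how it behaves. The ASMNAME
definitions are copied from the patch; DEMO_STR and demo_asm_name are
hypothetical helpers added only so the result can be inspected from C. On ELF
targets such as GNU/Linux __USER_LABEL_PREFIX__ is normally empty, on Darwin
it is normally an underscore.

```c
/* Copied from the patch: paste the user label prefix onto a name.
   The extra ASMNAME1 level makes sure __USER_LABEL_PREFIX__ is
   macro-expanded before the ## operator sees it.  */
#ifdef __USER_LABEL_PREFIX__
# define ASMNAME2(prefix, name) prefix ## name
# define ASMNAME1(prefix, name) ASMNAME2(prefix, name)
# define ASMNAME(name) ASMNAME1(__USER_LABEL_PREFIX__, name)
#else
# define ASMNAME(name) name
#endif

/* Hypothetical helpers: turn the expanded symbol into a string.  */
#define DEMO_STR1(x) #x
#define DEMO_STR(x) DEMO_STR1(x)

/* Returns the assembler-level name that ASMNAME produces for the stub
   symbol savms64_18 on the target this file is compiled for.  */
const char *
demo_asm_name (void)
{
  return DEMO_STR (ASMNAME (savms64_18));
}
```

Without the intermediate ASMNAME1 step, the paste would produce the literal
token __USER_LABEL_PREFIX__savms64_18 instead of the prefixed symbol.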
Re: [PATCH] Fix ms-sysv.exp testsuite FAILs (PR c/83117)
On 11/27/2017 04:34 PM, Jakub Jelinek wrote: > Hi! > > As mentioned in the PR, my C FE rvalue folding patch allows folding > const variable initializers into the uses of those variables in rvalue > contexts more than before, and so we get warnings about UB in the test, > because an unprototyped function is cast to a function type with ellipsis in > it. > > It isn't entirely clear what exactly the test wants to test, as mentioned > in the PR, this is one of the options how to solve it, by dropping the > const it can't be optimized in the FEs (the optimizers can still figure out > the static vars are never written to). Another option would be just > add -w to dg-options, another one is const volatile. > > Regtested on x86_64-linux and i686-linux, ok for trunk? > > 2017-11-27 Jakub Jelinek > > PR c/83117 > * gcc.target/x86_64/abi/ms-sysv/gen.cc (make_do_tests_decl): Drop > const from do_test_{u,v}*. > > --- gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc.jj 2017-05-22 > 10:49:45.0 +0200 > +++ gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc2017-11-27 > 11:57:14.889570915 +0100 > @@ -392,7 +392,7 @@ static void make_do_tests_decl (const ve > continue; > > comma.reset (); > - out << "static __attribute__ ((ms_abi)) long (*const do_test_" > + out << "static __attribute__ ((ms_abi)) long (*do_test_" > << (unaligned ? "u" : "") > << (varargs ? "v" : "") << i << ") ("; > > > Jakub > I don't have a problem with removing const, it's only there for const-correctness and caution. I just posted to the PR a bit ago and I'm curious if there is a better approach when using assembly stubs that are meant to be called in varying ways. CV would work also, although there's no real need to refetch the address before each use. If you don't have a better way to do this then please use this patch. Thanks! Daniel
Re: [PATCH] Fix ms-sysv.exp testsuite FAILs (PR c/83117)
On 11/28/2017 05:22 AM, Jakub Jelinek wrote:
> On Mon, Nov 27, 2017 at 05:02:32PM -0600, Daniel Santos wrote:
>>> --- gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc.jj 2017-05-22 10:49:45.0 +0200
>>> +++ gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc 2017-11-27 11:57:14.889570915 +0100
>>> @@ -392,7 +392,7 @@ static void make_do_tests_decl (const ve
>>>  continue;
>>>
>>>  comma.reset ();
>>> - out << "static __attribute__ ((ms_abi)) long (*const do_test_"
>>> + out << "static __attribute__ ((ms_abi)) long (*do_test_"
>>>  << (unaligned ? "u" : "")
>>>  << (varargs ? "v" : "") << i << ") (";
>> I don't have a problem with removing const, it's only there for
>> const-correctness and caution. I just posted to the PR a bit ago and
>> I'm curious if there is a better approach when using assembly stubs that
>> are meant to be called in varying ways. CV would work also, although
>> there's no real need to refetch the address before each use.
>>
>> If you don't have a better way to do this then please use this patch.
> I've verified the resulting *.optimized dump as well as assembly is
> practically identical without/with the patch, only differences are in
> SSA_NAME versions, in assembly the .LC and .LCFI constants are
> different but otherwise it is the same - the functions are emitted in
> different orders by cgraph and committed the patch.
>
> Using assembly stubs that are meant to be called in varying ways should
> just be avoided in portable programs, you could e.g. in the generator
> instead of all those:
> extern __attribute__ ((ms_abi)) long do_test_aligned ();
> extern __attribute__ ((ms_abi)) long do_test_unaligned ();
> static __attribute__ ((ms_abi)) long (*do_test_1) (long a) = (void*)do_test_aligned;
> static __attribute__ ((ms_abi)) long (*do_test_v1) (long a, ...) = (void*)do_test_aligned;
> static __attribute__ ((ms_abi)) long (*do_test_u1) (long a) = (void*)do_test_unaligned;
> static __attribute__ ((ms_abi)) long (*do_test_uv1) (long a, ...) = (void*)do_test_unaligned;
> emit:
> extern __attribute__ ((ms_abi)) long do_test_1 (long a);
> asm (".text; do_test_1: jmp do_test_aligned; .previous");
> extern __attribute__ ((ms_abi)) long do_test_v1 (long a, ...);
> asm (".text; do_test_v1: jmp do_test_aligned; .previous");
> extern __attribute__ ((ms_abi)) long do_test_u1 (long a);
> asm (".text; do_test_u1: jmp do_test_unaligned; .previous");
> extern __attribute__ ((ms_abi)) long do_test_uv1 (long a, ...);
> asm (".text; do_test_uv1: jmp do_test_unaligned; .previous");
> or something similar.
>
> Jakub

Ah hah! That would indeed work. Thanks for the tip. I have some
improvements to make to this set of tests, mostly tests triggered by
GCC_TEST_RUN_EXPENSIVE, but perhaps I can make this modification as well.
Come to think of it, attribute naked might work too.

Thanks,
Daniel
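For anyone following along, the suggestion above can be tried in isolation
with something like the following sketch. It is an assumption-laden
illustration and not the generator's output: the demo_* names are made up,
the default calling convention is used instead of ms_abi to keep it short,
.pushsection/.popsection are used in place of .text/.previous, and the
assembly path is only taken on x86-64 ELF targets with a GNU-compatible
compiler. Everywhere else plain C wrappers are used so the file still builds.

```c
/* Common implementation that every entry point forwards to.  */
long
demo_target (long a)
{
  return a + 1;
}

#if defined(__x86_64__) && defined(__ELF__) && defined(__GNUC__)

/* Each entry point gets its own honest prototype, so no function
   pointer casts are needed on the C side.  */
extern long demo_entry (long a);
extern long demo_entry_v (long a, ...);

/* The entry points themselves are one-instruction trampolines that
   tail-jump to the common implementation.  */
__asm__ (".pushsection .text\n"
	 "\t.globl demo_entry\n"
	 "\t.type demo_entry,@function\n"
	 "demo_entry:\n"
	 "\tjmp demo_target\n"
	 "\t.size demo_entry,.-demo_entry\n"
	 "\t.globl demo_entry_v\n"
	 "\t.type demo_entry_v,@function\n"
	 "demo_entry_v:\n"
	 "\tjmp demo_target\n"
	 "\t.size demo_entry_v,.-demo_entry_v\n"
	 "\t.popsection");

#else

/* Fallback for other targets: ordinary C wrappers.  */
long demo_entry (long a) { return demo_target (a); }
long demo_entry_v (long a, ...) { return demo_target (a); }

#endif
```

The point of the design is that the compiler only ever sees calls through
correctly prototyped functions, so the front end has nothing to warn about,
while all entry points still land in the same code.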
[PATCH, x86, libgcc] PR target/83917 Correct debug for -mcall-ms2sysv-xlogues stubs
When stepping through tail-call restore stubs the debugger has to assume that rsp - 8 is the CFA, although it is not. This is because I did not explicitly add any .cfi directives. This patch adds them to the tail-call restore stubs, but this is new territory for me, so I would appreciate feedback. I've reg-tested on x86_64, but I still need to test on Solaris and Darwin. OK to commit after those tests? Thanks, Daniel Signed-off-by: Daniel Santos --- libgcc/config/i386/resms64fx.h | 19 +++ libgcc/config/i386/resms64x.h | 22 ++ 2 files changed, 41 insertions(+) diff --git a/libgcc/config/i386/resms64fx.h b/libgcc/config/i386/resms64fx.h index c5f63d879fe..7dc8c7d89ed 100644 --- a/libgcc/config/i386/resms64fx.h +++ b/libgcc/config/i386/resms64fx.h @@ -34,21 +34,40 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see .text MS2SYSV_STUB_BEGIN(resms64fx_17) +.cfi_startproc +.cfi_def_cfa %rbp, 16 mov -0x68(%rsi),%r15 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64fx_16) +.cfi_startproc +.cfi_def_cfa %rbp, 16 mov -0x60(%rsi),%r14 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64fx_15) +.cfi_startproc +.cfi_def_cfa %rbp, 16 mov -0x58(%rsi),%r13 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64fx_14) +.cfi_startproc +.cfi_def_cfa %rbp, 16 mov -0x50(%rsi),%r12 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64fx_13) +.cfi_startproc +.cfi_def_cfa %rbp, 16 mov -0x48(%rsi),%rbx +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64fx_12) +.cfi_startproc +.cfi_def_cfa %rbp, 16 mov -0x40(%rsi),%rdi SSE_RESTORE mov -0x38(%rsi),%rsi leaveq +.cfi_def_cfa %rsp, 8 ret +.cfi_endproc MS2SYSV_STUB_END(resms64fx_12) MS2SYSV_STUB_END(resms64fx_13) MS2SYSV_STUB_END(resms64fx_14) diff --git a/libgcc/config/i386/resms64x.h b/libgcc/config/i386/resms64x.h index 1b44938ae7c..753be1f4c52 100644 --- a/libgcc/config/i386/resms64x.h +++ b/libgcc/config/i386/resms64x.h @@ -33,23 +33,45 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
If not, see .text MS2SYSV_STUB_BEGIN(resms64x_18) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x70(%rsi),%r15 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64x_17) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x68(%rsi),%r14 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64x_16) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x60(%rsi),%r13 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64x_15) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x58(%rsi),%r12 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64x_14) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x50(%rsi),%rbp +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64x_13) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x48(%rsi),%rbx +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64x_12) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x40(%rsi),%rdi SSE_RESTORE mov -0x38(%rsi),%rsi mov %r10,%rsp +.cfi_def_cfa_register %rsp ret +.cfi_endproc MS2SYSV_STUB_END(resms64x_12) MS2SYSV_STUB_END(resms64x_13) MS2SYSV_STUB_END(resms64x_14) -- 2.15.0
Re: [PATCH, x86, libgcc] PR target/83917 Correct debug for -mcall-ms2sysv-xlogues stubs
On 01/19/2018 05:35 PM, Jakub Jelinek wrote: > On Fri, Jan 19, 2018 at 05:33:10PM -0600, Daniel Santos wrote: >> When stepping through tail-call restore stubs the debugger has to assume >> that rsp - 8 is the CFA, although it is not. This is because I did not >> explicitly add any .cfi directives. This patch adds them to the >> tail-call restore stubs, but this is new territory for me, so I would >> appreciate feedback. >> >> I've reg-tested on x86_64, but I still need to test on Solaris and >> Darwin. OK to commit after those tests? > I think you can't assume that the assembler supports .cfi_* directives. > While e.g. libgcc/config/i386/morestack.S uses them unconditionally, > it is guarded with: > if test "$libgcc_cv_cfi" = "yes"; then > tmake_file="${tmake_file} t-stack i386/t-stack-i386" > fi Ah hah! That explains a lot. Yeah, I wasn't thinking all assemblers would support it but I saw them in the Solaris assembler manual and figured that they were maybe more widely supported than I had thought. > in config.host. E.g. 
cygwin.S has: > #ifdef HAVE_GAS_CFI_SECTIONS_DIRECTIVE > .cfi_sections .debug_frame > # define cfi_startproc().cfi_startproc > # define cfi_endproc() .cfi_endproc > # define cfi_adjust_cfa_offset(X) .cfi_adjust_cfa_offset X > # define cfi_def_cfa_register(X).cfi_def_cfa_register X > # define cfi_register(D,S) .cfi_register D, S > # ifdef __x86_64__ > # define cfi_push(X) .cfi_adjust_cfa_offset 8; .cfi_rel_offset X, 0 > # define cfi_pop(X).cfi_adjust_cfa_offset -8; .cfi_restore X > # else > # define cfi_push(X) .cfi_adjust_cfa_offset 4; .cfi_rel_offset X, 0 > # define cfi_pop(X).cfi_adjust_cfa_offset -4; .cfi_restore X > # endif > #else > # define cfi_startproc() > # define cfi_endproc() > # define cfi_adjust_cfa_offset(X) > # define cfi_def_cfa_register(X) > # define cfi_register(D,S) > # define cfi_push(X) > # define cfi_pop(X) > #endif /* HAVE_GAS_CFI_SECTIONS_DIRECTIVE */ > perhaps you need something similar or commonize that (though, without > .cfi_sections, you want the default). > > Jakub Thanks. I like the idea of commonizing the macros for consistency. As far as adding tests, I guess I would need to dig into lib/gcc-gdb-test.exp to figure out how to do that. Daniel
Re: [PATCH] Correct debug for -mcall-ms2sysv-xlogues stubs (PR target/83917, take 2)
Sorry for the dropping the ball on this and thank you Jakub for stepping in! I've had a patch set sort-of rotting in my local repo, but I like yours better. I think I had gotten hung up on trying to figure out how to write a test for this, and like you I just tested mine manually in gdb. I do have one correction though. On 02/22/2018 08:56 AM, Jakub Jelinek wrote: > Hi! > > On Sat, Jan 20, 2018 at 06:01:16PM -0600, Daniel Santos wrote: >> Thanks. I like the idea of commonizing the macros for consistency. > Didn't see a progress on this P3 for a while, so I've written this > version of the patch; no tests though, what I've been using in testing was: > /* { dg-do compile { target lp64 } } */ > /* { dg-options "-mno-avx -msse2 -mcall-ms2sysv-xlogues -O2" } */ > > void __attribute__((sysv_abi, noipa)) > foo (void) > { > } > > static void __attribute__((sysv_abi)) (*volatile foop) () = foo; > > void __attribute__((ms_abi, noipa)) > bar (void) > { > foop (); > } > > int > main () > { > bar (); > return 0; > } > > with/without -fno-omit-frame-pointer, disas bar; b on the tail > call in there, stepi; bt (which before the patch failed, now works), > also up; p $rbp to see if %rbp has been properly declared to be saved. > There is no need to cfi_startproc/cfi_endproc for every single entrypoint in > there, it is enough if the whole range is covered. On the other side > we need the cfi_offset for the frame pointer case, otherwise up; p/x $rbp > doesn't work properly. > > Ok for trunk if it passes bootstrap/regtest on x86_64-linux and i686-linux? > > 2018-02-22 Jakub Jelinek > > PR debug/83917 > * config/i386/i386-asm.h (PACKAGE_VERSION, PACKAGE_NAME, > PACKAGE_STRING, PACKAGE_TARNAME, PACKAGE_URL): Undefine between > inclusion of auto-target.h and auto-host.h. > (USE_GAS_CFI_DIRECTIVES): Define if not defined already based on > __GCC_HAVE_DWARF2_CFI_ASM. 
> (cfi_startproc, cfi_endproc, cfi_adjust_cfa_offset, > cfi_def_cfa_register, cfi_def_cfa, cfi_register, cfi_offset, cfi_push, > cfi_pop): Define. > * config/i386/cygwin.S: Don't include auto-host.h here, just > define USE_GAS_CFI_DIRECTIVES to 1 or 0 and include i386-asm.h. > (cfi_startproc, cfi_endproc, cfi_adjust_cfa_offset, > cfi_def_cfa_register, cfi_register, cfi_push, cfi_pop): Remove. > * config/i386/resms64fx.h: Add cfi_* directives. > * config/i386/resms64x.h: Likewise. > > --- libgcc/config/i386/i386-asm.h.jj 2018-01-03 10:42:56.317763517 +0100 > +++ libgcc/config/i386/i386-asm.h 2018-02-22 15:33:43.812922298 +0100 > @@ -27,8 +27,47 @@ see the files COPYING3 and COPYING.RUNTI > #define I386_ASM_H > > #include "auto-target.h" > +#undef PACKAGE_VERSION > +#undef PACKAGE_NAME > +#undef PACKAGE_STRING > +#undef PACKAGE_TARNAME > +#undef PACKAGE_URL This is a beautiful, temporary(?) fix to an ugly problem! > #include "auto-host.h" > > +#ifndef USE_GAS_CFI_DIRECTIVES > +# ifdef __GCC_HAVE_DWARF2_CFI_ASM > +# define USE_GAS_CFI_DIRECTIVES 1 > +# else > +# define USE_GAS_CFI_DIRECTIVES 0 > +# endif > +#endif > +#if USE_GAS_CFI_DIRECTIVES > +# define cfi_startproc() .cfi_startproc > +# define cfi_endproc() .cfi_endproc > +# define cfi_adjust_cfa_offset(X).cfi_adjust_cfa_offset X > +# define cfi_def_cfa_register(X) .cfi_def_cfa_register X > +# define cfi_def_cfa(R,O).cfi_def_cfa R, O > +# define cfi_register(D,S) .cfi_register D, S > +# define cfi_offset(R,O) .cfi_offset R, O > +# ifdef __x86_64__ > +# define cfi_push(X).cfi_adjust_cfa_offset 8; > .cfi_rel_offset X, 0 > +# define cfi_pop(X) .cfi_adjust_cfa_offset -8; .cfi_restore X > +# else > +# define cfi_push(X).cfi_adjust_cfa_offset 4; > .cfi_rel_offset X, 0 > +# define cfi_pop(X) .cfi_adjust_cfa_offset -4; .cfi_restore X > +# endif > +#else > +# define cfi_startproc() > +# define cfi_endproc() > +# define cfi_adjust_cfa_offset(X) > +# define cfi_def_cfa_register(X) > +# define cfi_def_cfa(R,O) > +# define 
cfi_register(D,S) > +# define cfi_offset(R,O) > +# define cfi_push(X) > +# define cfi_pop(X) > +#endif > + > #define PASTE2(a, b) PASTE2a(a, b) > #define PASTE2a(a, b) a ## b > > --- libgcc/config/i386/cygwin.S.jj2018-01-03 10:42:56.309763515 +0100 > +++ libgcc/config/i386/cygwin.S 2018-02-22 15:30:34.597925496 +0100 > @@ -23,31 +23,13 @@ > * <http://www.gnu.org/licenses/>. > */ > > -#include "auto-host.h" The following
Re: [PATCH] Correct debug for -mcall-ms2sysv-xlogues stubs (PR target/83917, take 2)
On 02/26/2018 02:20 AM, Jakub Jelinek wrote: > On Sun, Feb 25, 2018 at 05:56:28PM -0600, Daniel Santos wrote: >>> --- libgcc/config/i386/i386-asm.h.jj2018-01-03 10:42:56.317763517 >>> +0100 >>> +++ libgcc/config/i386/i386-asm.h 2018-02-22 15:33:43.812922298 +0100 >>> @@ -27,8 +27,47 @@ see the files COPYING3 and COPYING.RUNTI >>> #define I386_ASM_H >>> >>> #include "auto-target.h" >>> +#undef PACKAGE_VERSION >>> +#undef PACKAGE_NAME >>> +#undef PACKAGE_STRING >>> +#undef PACKAGE_TARNAME >>> +#undef PACKAGE_URL >> This is a beautiful, temporary(?) fix to an ugly problem! >> >>> #include "auto-host.h" >>> --- libgcc/config/i386/cygwin.S.jj 2018-01-03 10:42:56.309763515 +0100 >>> +++ libgcc/config/i386/cygwin.S 2018-02-22 15:30:34.597925496 +0100 >>> @@ -23,31 +23,13 @@ >>> * <http://www.gnu.org/licenses/>. >>> */ >>> >>> -#include "auto-host.h" >> The following include should be here. >> >> +#include "i386-asm.h" > I don't understand this. i386-asm.h needs (both before my patch and after > it) both auto-host.h and auto-target.h, as it tests > HAVE_GAS_SECTIONS_DIRECTIVE (this one newly, comes from cygwin.S) The problem is that HAVE_GAS_SECTIONS_DIRECTIVE gets defined (or not) in ../../gcc/auto-host.h, but you are testing it before including auto-host.h, either directly or via i386-asm.h. So if i386-asm.h depends upon HAVE_GAS_SECTIONS_DIRECTIVE first being defined then it is a circular dependency. In its current form, cygwin.S would never define USE_GAS_CFI_DIRECTIVES prior to including i386-asm.h and also never emit .cfi_sections .debug_frame and rather or not USE_GAS_CFI_DIRECTIVES ends up being defined to 1 or 0 depends upon the test of __GCC_HAVE_DWARF2_CFI_ASM in i386-asm.h. So this area is new for me, but I don't understand why we're testing HAVE_GAS_SECTIONS_DIRECTIVE in cygwin.S and __GCC_HAVE_DWARF2_CFI_ASM when included from one of the stubs. Is this an error, or a lack of my understanding or both? 
:) > HAVE_GAS_HIDDEN > macros defined in auto-host.h > and > HAVE_AS_AVX > macro defined in auto-target.h. > Including auto-host.h when i386-asm.h will include it again just doesn't > work, these headers don't have multiple inclusion guards. And only including > auto-target.h will work only if the > .hidden > and > .cfi_sections .debug_frame > tests are duplicated from gcc/configure.ac to libgcc/configure.ac, then we > could include just auto-target.h in i386-asm.h. > I've just followed what i386-asm.h has been doing. And it's possible that I failed to test something correctly before presuming it to be available, although I *think* the test for .hidden is good. > > Jakub > Thanks for your work on this. If we need to test for CFI directives differently when being included from cygwin.S, maybe we can just define a simple cpp macro to indicate this and let i386-asm.h encapsulate the implementation of it (e.g., testing HAVE_GAS_SECTIONS_DIRECTIVE or __GCC_HAVE_DWARF2_CFI_ASM as appropriate). Ultimately, the proper cleanup will be moving these tests out of {gcc,libgcc}/configure.ac and into .m4 files in the root config directory so that we don't uglify them with massive copy & pastes. These tests are also fairly complex as there are a lot of dependencies. m4 isn't my strong suit, but I can look at this after we're out of code freeze. Daniel
Re: [PATCH] Fix the GNU Stack markings on libgcc.a
Hello On 05/01/2018 06:32 AM, Magnus Granberg wrote: > New patch > libgcc/ChangeLog: > > 2018-05-01 Magnus Granberg > > * config/i386/resms64.h: Add .note.GNU-stack section > * config/i386/resms64f.h: Likewise. > * config/i386/resms64fx.h: Likewise. > * config/i386/resms64x.h: Likewise. > * config/i386/savms64.h: Likewise. > * config/i386/savms64f.h: Likewise. > > --- Well this isn't correct either because you are outside of the inclusion guard. Can you please move this up a line? Thanks, Daniel
Re: [PATCH] Fix the GNU Stack markings on libgcc.a
On 05/02/2018 06:17 PM, Magnus Granberg wrote: > torsdag 3 maj 2018 kl. 01:07:51 CEST skrev Daniel Santos: >> Hello >> >> On 05/01/2018 06:32 AM, Magnus Granberg wrote: >>> New patch >>> libgcc/ChangeLog: >>> >>> 2018-05-01 Magnus Granberg >>> >>> * config/i386/resms64.h: Add .note.GNU-stack section >>> * config/i386/resms64f.h: Likewise. >>> * config/i386/resms64fx.h: Likewise. >>> * config/i386/resms64x.h: Likewise. >>> * config/i386/savms64.h: Likewise. >>> * config/i386/savms64f.h: Likewise. >>> >>> --- >> Well this isn't correct either because you are outside of the inclusion >> guard. Can you please move this up a line? >> >> Thanks, >> Daniel > /libgcc/ChangeLog: > 2018-05-01 Magnus Granberg > > * config/i386/resms64.h: Add .note.GNU-stack section > * config/i386/resms64f.h: Likewise. > * config/i386/resms64fx.h: Likewise. > * config/i386/resms64x.h: Likewise. > * config/i386/savms64.h: Likewise. > * config/i386/savms64f.h: Likewise. > > --- No, I meant to move the changes up a line so that, if for some reason the header was included twice, it wouldn't output the section twice. Example: MS2SYSV_STUB_END(savms64_18) +#if defined(__linux__) && defined(__ELF__) +.section .note.GNU-stack,"",%progbits +#endif #endif /* __x86_64__ */ But upon further reflection, I think it can be cleanly added to i386-asm.h. Does that look sane, Jakub? (I haven't tried it) Also, for the sake of my education, I don't exactly understand what the problem is as I haven't been keeping up with pax and hardening. I just want to clarify that the stack shouldn't be executable. These are not actual "functions" per se (i.e., they do not adhere to any ABI), they operate on the stack of the calling function.
Thanks, Daniel diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h index 267133a9b75..7eb3c12fc85 100644 --- a/libgcc/config/i386/i386-asm.h +++ b/libgcc/config/i386/i386-asm.h @@ -80,6 +80,10 @@ ASMNAME(fn): #ifdef MS2SYSV_STUB_PREFIX +# if defined(__linux__) && defined(__ELF__) +.section .note.GNU-stack,"",%progbits +# endif + # define MS2SYSV_STUB_BEGIN(base_name) \ HIDDEN_FUNC(PASTE2(MS2SYSV_STUB_PREFIX, base_name))
[PATCH] [testsuite/i386] PR 82268 Correct FAIL when configured --with-cpu
When I originally wrote this test I wasn't aware of the --with-cpu configure option, so this change explicitly disables avx to make sure we choose the sse implementation, even when --with-cpu specifies an arch that has avx support. OK for head? gcc/testsuite/ChangeLog: gcc.target/i386/pr82196-1.c (dg-options): Add -mno-avx. Thanks, Daniel --- gcc/testsuite/gcc.target/i386/pr82196-1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c b/gcc/testsuite/gcc.target/i386/pr82196-1.c index 541d975480d..ff108132bb5 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c @@ -1,5 +1,5 @@ /* { dg-do compile { target lp64 } } */ -/* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */ +/* { dg-options "-mno-avx -msse -mcall-ms2sysv-xlogues -O2" } */ /* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */ /* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */ -- 2.14.3
[PATCH 0/2] [i386] PR82002 Correct ICE with large stack frame
I originally intended to submit the first part of this patch set a few weeks ago as it was simpler, but here is the full fix. The first part is a really simple follow-up fix to an off-by-one error H.J. originally fixed with r252099, but in the process of testing I discovered a more complex problem when we add an ms_abi to sysv_abi call that resulted in a bad INSN because I didn't check for a non-immediate offset. I originally wrote a different solution where I added a mechanism to struct ix86_frame to track and reuse a scratch register in the pro/epilogue, but then I realized that I didn't need that if I just emitted the SSE saves or stub call after the SP realignment and prior to allocating the remainder of the frame. However, I still need to use a scratch register sometimes in the epilogue, so I've added a simplified mechanism to choose_baseaddr to manage that, but not to track and reuse it for subsequent calls. Unfortunately, this sat for so long that there are now two duplicates in Bugzilla (pr82485 and pr82712). Regression tested with {,-m32} and I've started one for x32 even though it *shouldn't* affect it (in theory). Thanks, Daniel
[PATCH 1/2] [i386] PR82002 Part 1: Correct ICE caused by wrong calculation.
This is a residual problem caused by the off-by-one error in sp_valid_at and fp_valid_at originally corrected in r252099. However, adding tests that include an ms_abi to sysv_abi call reveals an additional, more complex problem with an invalid INSN due to overflowing the s32 offset. Therefore I'm including all new tests, but marking ones that are broken by this additional problem as xfail and addressing that in the next patch. gcc: config/i386/i386.c (ix86_expand_epilogue): Correct stack calculation. gcc/testsuite: gcc.target/i386/pr82002-1.c: New test. gcc.target/i386/pr82002-2a.c: New xfail test. gcc.target/i386/pr82002-2b.c: New xfail test. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 2 +- gcc/testsuite/gcc.target/i386/pr82002-1.c | 12 gcc/testsuite/gcc.target/i386/pr82002-2a.c | 14 ++ gcc/testsuite/gcc.target/i386/pr82002-2b.c | 14 ++ 4 files changed, 41 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr82002-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr82002-2a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr82002-2b.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2de0dd0c283..83a07afb3e1 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13812,7 +13812,7 @@ ix86_expand_epilogue (int style) the stack pointer, if we will restore SSE regs via sp. 
*/ if (TARGET_64BIT && m->fs.sp_offset > 0x7fff - && sp_valid_at (frame.stack_realign_offset) + && sp_valid_at (frame.stack_realign_offset + 1) && (frame.nsseregs + frame.nregs) != 0) { pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, diff --git a/gcc/testsuite/gcc.target/i386/pr82002-1.c b/gcc/testsuite/gcc.target/i386/pr82002-1.c new file mode 100644 index 000..86678a01992 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr82002-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target lp64 } } */ +/* { dg-options "-Ofast -mstackrealign -mabi=ms" } */ + +void a (char *); +void +b () +{ + char c[100]; + c[1099511627776] = 'b'; + a (c); + a (c); +} diff --git a/gcc/testsuite/gcc.target/i386/pr82002-2a.c b/gcc/testsuite/gcc.target/i386/pr82002-2a.c new file mode 100644 index 000..bc85080ba8e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr82002-2a.c @@ -0,0 +1,14 @@ +/* { dg-do compile { target lp64 } } */ +/* { dg-options "-Ofast -mstackrealign -mabi=ms" } */ +/* { dg-xfail-if "" { *-*-* } } */ +/* { dg-xfail-run-if "" { *-*-* } } */ + +void __attribute__((sysv_abi)) a (char *); +void +b () +{ + char c[100]; + c[1099511627776] = 'b'; + a (c); + a (c); +} diff --git a/gcc/testsuite/gcc.target/i386/pr82002-2b.c b/gcc/testsuite/gcc.target/i386/pr82002-2b.c new file mode 100644 index 000..10e44cd7b1d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr82002-2b.c @@ -0,0 +1,14 @@ +/* { dg-do compile { target lp64 } } */ +/* { dg-options "-Ofast -mstackrealign -mabi=ms -mcall-ms2sysv-xlogues" } */ +/* { dg-xfail-if "" { *-*-* } } */ +/* { dg-xfail-run-if "" { *-*-* } } */ + +void __attribute__((sysv_abi)) a (char *); +void +b () +{ + char c[100]; + c[1099511627776] = 'b'; + a (c); + a (c); +} -- 2.14.3
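The reason for the "+ 1" can be illustrated with a small model of the validity checks. This is only a sketch: struct frame_state is a hypothetical, cut-down stand-in for GCC's struct machine_frame_state, and the two functions mirror the one-line form of sp_valid_at and fp_valid_at that appears as removed lines in the PR80969 patch elsewhere in this thread, assuming the checks still had that shape when this fix was written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for GCC's struct machine_frame_state;
   it carries only the fields this sketch needs.  */
struct frame_state {
    bool sp_valid;
    bool fp_valid;
    bool sp_realigned;
    int64_t sp_realigned_offset; /* CFA offset where the realigned frame starts.  */
};

/* After realignment, SP may only address CFA offsets strictly greater
   than sp_realigned_offset.  */
static bool sp_valid_at(const struct frame_state *fs, int64_t cfa_offset)
{
    return fs->sp_valid
           && !(fs->sp_realigned && cfa_offset <= fs->sp_realigned_offset);
}

/* After realignment, FP may only address CFA offsets up to and including
   sp_realigned_offset.  */
static bool fp_valid_at(const struct frame_state *fs, int64_t cfa_offset)
{
    return fs->fp_valid
           && !(fs->sp_valid && fs->sp_realigned
                && cfa_offset > fs->sp_realigned_offset);
}
```

In this model, asking sp_valid_at about the realignment offset itself always answers false once the stack has been realigned, so a caller that wants to know whether SP can reach the save area just past that boundary has to query the offset plus one.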
[PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN
When we are realigning the stack pointer, making an ms_abi to sysv_abi call and allocating 2GiB or more on the stack we end up with an invalid INSN due to a non-immediate offset. This occurs both with and without -mcall-ms2sysv-xlogues. Additionally, I've discovered that the stack allocation with -mcall-ms2sysv-xlogues is incorrect as it ignores stack checking, stack clash checking and probing.

This patch fixes these problems by

1. No longer allocate stack space in ix86_emit_outlined_ms2sysv_save.
2. Rearrange where we emit SSE saves or stub call:
   a. Before frame allocation when offset from frame to save area is >= 2GiB.
   b. After frame allocation when frame is < 2GiB. (Stack allocations prior to the stub call can't be combined with those afterwards, so this is better when possible.)
3. Modify choose_baseaddr to take an optional scratch_regno argument and never return rtx that cannot be used as an immediate.

gcc: config/i386/i386.c (choose_baseaddr): Use optional scratch register and add assertion. (ix86_emit_outlined_ms2sysv_save): use scratch register when needed, and don't allocate stack. (ix86_expand_prologue): Rearrange where SSE saves/stub call is emitted, correct wrong allocation with -mcall-ms2sysv-xlogues. (ix86_emit_outlined_ms2sysv_restore): Fix non-immediate offsets. gcc/testsuite: gcc.target/i386/pr82002-2a.c: Change from xfail to fail. gcc.target/i386/pr82002-2b.c: Likewise. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 76 -- gcc/testsuite/gcc.target/i386/pr82002-2a.c | 2 - gcc/testsuite/gcc.target/i386/pr82002-2b.c | 2 - 3 files changed, 62 insertions(+), 18 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 83a07afb3e1..abd8e937e0d 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -11520,7 +11520,8 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg, The valid base registers are taken from CFUN->MACHINE->FS. 
*/ static rtx -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align, +int scratch_regno = -1) { rtx base_reg = NULL; HOST_WIDE_INT base_offset = 0; @@ -11534,6 +11535,28 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) choose_basereg (cfa_offset, base_reg, base_offset, 0, align); gcc_assert (base_reg != NULL); + + if (TARGET_64BIT) +{ + rtx base_offset_rtx = GEN_INT (base_offset); + + if (scratch_regno >= 0) + { + if (!x86_64_immediate_operand (base_offset_rtx, DImode)) + { + rtx tmp; + rtx scratch_reg = gen_rtx_REG (DImode, scratch_regno); + + emit_insn (gen_rtx_SET (scratch_reg, base_offset_rtx)); + tmp = gen_rtx_PLUS (DImode, scratch_reg, base_reg); + emit_insn (gen_rtx_SET (scratch_reg, tmp)); + return scratch_reg; + } + } + else + gcc_assert (x86_64_immediate_operand (base_offset_rtx, DImode)); +} + return plus_constant (Pmode, base_reg, base_offset); } @@ -12793,23 +12816,22 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame) rtx sym, addr; rtx rax = gen_rtx_REG (word_mode, AX_REG); const struct xlogue_layout &xlogue = xlogue_layout::get_instance (); - HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset; /* AL should only be live with sysv_abi. */ gcc_assert (!ix86_eax_live_at_start_p ()); + gcc_assert (m->fs.sp_offset >= frame.sse_reg_save_offset); /* Setup RAX as the stub's base pointer. We use stack_realign_offset rather we've actually realigned the stack or not. */ align = GET_MODE_ALIGNMENT (V4SFmode); addr = choose_baseaddr (frame.stack_realign_offset - + xlogue.get_stub_ptr_offset (), &align); + + xlogue.get_stub_ptr_offset (), &align, AX_REG); gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); - emit_insn (gen_rtx_SET (rax, addr)); - /* Allocate stack if not already done. 
*/ - if (allocate > 0) - pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, - GEN_INT (-allocate), -1, false); + /* If choose_baseaddr returned our scratch register, then we don't need to + do another SET. */ + if (!REG_P (addr) || REGNO (addr) != AX_REG) +emit_insn (gen_rtx_SET (rax, addr)); /* Get the stub symbol. */ sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP @@ -12841,6 +12863,7 @@ ix86_expand_prologue (void) HOST_WIDE_INT allocate; bool int_registers_saved; bool sse_registers_saved; + bool save_stub_call_needed; rtx static_chain = NULL_RTX; if (ix86_function_n
Re: [PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN
On 10/30/2017 09:09 PM, Daniel Santos wrote: > 3. Modify choose_baseaddr to take an optional scratch_regno argument >and never return rtx that cannot be used as an immediate. I should amend this, it actually does a gcc_assert, so that won't happen if --enable-checking=no, but it would still fail later in expand. > static rtx > -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) > +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align, > + int scratch_regno = -1) > { >rtx base_reg = NULL; >HOST_WIDE_INT base_offset = 0; > @@ -11534,6 +11535,28 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned > int *align) > choose_basereg (cfa_offset, base_reg, base_offset, 0, align); > >gcc_assert (base_reg != NULL); > + > + if (TARGET_64BIT) > +{ > + rtx base_offset_rtx = GEN_INT (base_offset); > + > + if (scratch_regno >= 0) > + { > + if (!x86_64_immediate_operand (base_offset_rtx, DImode)) > + { > + rtx tmp; > + rtx scratch_reg = gen_rtx_REG (DImode, scratch_regno); > + > + emit_insn (gen_rtx_SET (scratch_reg, base_offset_rtx)); > + tmp = gen_rtx_PLUS (DImode, scratch_reg, base_reg); > + emit_insn (gen_rtx_SET (scratch_reg, tmp)); > + return scratch_reg; > + } > + } > + else > + gcc_assert (x86_64_immediate_operand (base_offset_rtx, DImode)); > +} > + >return plus_constant (Pmode, base_reg, base_offset); > } Daniel
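To make the decision in the quoted hunk easier to follow, here is a rough standalone model of it. This is a sketch only, not GCC code: fits_signed_32 approximates what x86_64_immediate_operand accepts for a plain integer constant (a value that survives sign extension from 32 bits), and pick_addr_form, enum addr_form and the have_scratch flag are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Approximation of the immediate test for a constant integer: the value
   must be representable as a sign-extended 32-bit quantity.  */
static bool fits_signed_32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}

/* Hypothetical result type describing how the address gets formed.  */
enum addr_form {
    ADDR_BASE_PLUS_DISP, /* base register + immediate displacement.  */
    ADDR_VIA_SCRATCH,    /* offset loaded into a scratch register first.  */
    ADDR_INVALID         /* stands in for the failing gcc_assert.  */
};

/* have_scratch models "scratch_regno >= 0" in the quoted patch.  */
static enum addr_form pick_addr_form(int64_t base_offset, bool have_scratch)
{
    if (fits_signed_32(base_offset))
        return ADDR_BASE_PLUS_DISP;
    return have_scratch ? ADDR_VIA_SCRATCH : ADDR_INVALID;
}
```

With a 2GiB frame the offset 0x80000000 no longer fits, so a caller that did not pass a scratch register ends up in the ADDR_INVALID case, which corresponds to the assertion (or, with checking disabled, the later failure in expand) mentioned above.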
Re: [PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN
On 10/31/2017 04:31 AM, Uros Bizjak wrote: > On Tue, Oct 31, 2017 at 3:09 AM, Daniel Santos > wrote: >> When we are realigning the stack pointer, making an ms_abi to sysv_abi >> call and alllocating 2GiB or more on the stack we end up with an invalid >> INSN due to a non-immediate offset. This occurs both with and without >> -mcall-ms2sysv-xlogues. Additionally, I've discovered that the stack >> allocation with -mcall-ms2sysv-xlogues is incorrect as it ignores stack >> checking, stack clash checking and probing. >> >> This patch fixes these problems by >> >> 1. No longer allocate stack space in ix86_emit_outlined_ms2sysv_save. >> 2. Rearrange where we emit SSE saves or stub call: >>a. Before frame allocation when offset from frame to save area is >= 2GiB. >>b. After frame allocation when frame is < 2GiB. (Stack allocations >> prior to the stub call can't be combined with those afterwards, so >> this is better when possible.) >> 3. Modify choose_baseaddr to take an optional scratch_regno argument >>and never return rtx that cannot be used as an immediate. >> >> gcc: >> config/i386/i386.c (choose_basereg): Use optional scratch >> register and add assertion. >> (x86_emit_outlined_ms2sysv_save): use scratch register when >> needed, and don't allocate stack. >> (ix86_expand_prologue): Rearrange where SSE saves/stub call is >> emitted, correct wrong allocation with -mcall-ms2sysv-xlogues. >> (ix86_emit_outlined_ms2sysv_restore): Fix non-immediate offsets. >> >> gcc/testsuite: >> gcc.target/i386/pr82002-2a.c: Change from xfail to fail. >> gcc.target/i386/pr82002-2b.c: Likewise. 
>> >> Signed-off-by: Daniel Santos >> --- >> gcc/config/i386/i386.c | 76 >> -- >> gcc/testsuite/gcc.target/i386/pr82002-2a.c | 2 - >> gcc/testsuite/gcc.target/i386/pr82002-2b.c | 2 - >> 3 files changed, 62 insertions(+), 18 deletions(-) >> >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index 83a07afb3e1..abd8e937e0d 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -11520,7 +11520,8 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx >> &base_reg, >> The valid base registers are taken from CFUN->MACHINE->FS. */ >> >> static rtx >> -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) >> +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align, >> +int scratch_regno = -1) >> { >>rtx base_reg = NULL; >>HOST_WIDE_INT base_offset = 0; >> @@ -11534,6 +11535,28 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned >> int *align) >> choose_basereg (cfa_offset, base_reg, base_offset, 0, align); >> >>gcc_assert (base_reg != NULL); >> + >> + if (TARGET_64BIT) >> +{ >> + rtx base_offset_rtx = GEN_INT (base_offset); >> + >> + if (scratch_regno >= 0) >> + { >> + if (!x86_64_immediate_operand (base_offset_rtx, DImode)) >> + { >> + rtx tmp; >> + rtx scratch_reg = gen_rtx_REG (DImode, scratch_regno); >> + >> + emit_insn (gen_rtx_SET (scratch_reg, base_offset_rtx)); >> + tmp = gen_rtx_PLUS (DImode, scratch_reg, base_reg); >> + emit_insn (gen_rtx_SET (scratch_reg, tmp)); >> + return scratch_reg; >> + } >> + } >> + else >> + gcc_assert (x86_64_immediate_operand (base_offset_rtx, DImode)); >> +} >> + >>return plus_constant (Pmode, base_reg, base_offset); >> } > This function doesn't need to return a register, it can return plus > RTX. I'd suggest the following implementation: > > --cut here-- > Index: i386.c > === > --- i386.c (revision 254243) > +++ i386.c (working copy) > @@ -11520,7 +11520,8 @@ > The valid base registers are taken from CFUN->MACHINE->FS. 
*/ > > static rtx > -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) > +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align, > +unsigned int scratch_regno = INVALID_REGNUM) > { >rtx base_reg = NULL; >HOST_WIDE_INT base_offset = 0; > @@ -11534,6 +11535,19 @@ > choose_basereg (cfa_offset, base_reg, base_offset, 0, align); > >gcc_assert (base_reg != NULL); >
Re: [PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN
On 11/03/2017 02:09 AM, Uros Bizjak wrote: > On Thu, Nov 2, 2017 at 11:43 PM, Daniel Santos > wrote: > >>>>int_registers_saved = (frame.nregs == 0); >>>>sse_registers_saved = (frame.nsseregs == 0); >>>> + save_stub_call_needed = (m->call_ms2sysv); >>>> + gcc_assert (!(!sse_registers_saved && save_stub_call_needed)); >>> Oooh, double negation :( >> I'm just saying that we shouldn't be saving SSE registers inline and via >> the stub. If I followed the naming convention of e.g., >> "see_registers_saved" then my variable would end up being called >> "save_stub_called" which would be incorrect and misleading, similar to >> how "see_registers_saved" is misleading when there are in fact no SSE >> register that need to be saved. Maybe I should rename >> (int|sse)_registers_saved to (int|sse)_register_saves_needed with >> inverted logic instead. > But, we can just say > > gcc_assert (sse_registers_saved || !save_stub_call_needed); > > No? > > Uros. > Oh yes, I see. Because "sse_registers_saved" really means that we've either already saved them or don't have to, and not literally that they have been saved. I ranted about its name but didn't think it all the way through. :) How does this patch look? (Also, I've updated comments for choose_baseaddr.) Currently re-running tests. Thanks, Daniel diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2967876..fb81d4dba84 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -11515,12 +11515,15 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg, an alignment value (in bits) that is preferred or zero and will recieve the alignment of the base register that was selected, irrespective of rather or not CFA_OFFSET is a multiple of that - alignment value. + alignment value. If it is possible for the base register offset to be + non-immediate then SCRATCH_REGNO should specify a scratch register to + use. The valid base registers are taken from CFUN->MACHINE->FS. 
*/ static rtx -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align, + unsigned int scratch_regno = INVALID_REGNUM) { rtx base_reg = NULL; HOST_WIDE_INT base_offset = 0; @@ -11534,6 +11537,19 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) choose_basereg (cfa_offset, base_reg, base_offset, 0, align); gcc_assert (base_reg != NULL); + + rtx base_offset_rtx = GEN_INT (base_offset); + + if (!x86_64_immediate_operand (base_offset_rtx, Pmode)) +{ + gcc_assert (scratch_regno != INVALID_REGNUM); + + rtx scratch_reg = gen_rtx_REG (Pmode, scratch_regno); + emit_move_insn (scratch_reg, base_offset_rtx); + + return gen_rtx_PLUS (Pmode, base_reg, scratch_reg); +} + return plus_constant (Pmode, base_reg, base_offset); } @@ -12793,23 +12809,19 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame) rtx sym, addr; rtx rax = gen_rtx_REG (word_mode, AX_REG); const struct xlogue_layout &xlogue = xlogue_layout::get_instance (); - HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset; /* AL should only be live with sysv_abi. */ gcc_assert (!ix86_eax_live_at_start_p ()); + gcc_assert (m->fs.sp_offset >= frame.sse_reg_save_offset); /* Setup RAX as the stub's base pointer. We use stack_realign_offset rather we've actually realigned the stack or not. */ align = GET_MODE_ALIGNMENT (V4SFmode); addr = choose_baseaddr (frame.stack_realign_offset - + xlogue.get_stub_ptr_offset (), &align); + + xlogue.get_stub_ptr_offset (), &align, AX_REG); gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); - emit_insn (gen_rtx_SET (rax, addr)); - /* Allocate stack if not already done. */ - if (allocate > 0) - pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, -GEN_INT (-allocate), -1, false); + emit_insn (gen_rtx_SET (rax, addr)); /* Get the stub symbol. */ sym = xlogue.get_stub_rtx (frame_pointer_needed ? 
XLOGUE_STUB_SAVE_HFP @@ -12841,6 +12853,7 @@ ix86_expand_prologue (void) HOST_WIDE_INT allocate; bool int_registers_saved; bool sse_registers_saved; + bool save_stub_call_needed; rtx static_chain = NULL_RTX; if (ix86_function_naked (current_function_decl)) @@ -13016,6 +13029,8 @@ ix86_expand_prologue (void) int_registers_saved = (frame.nregs == 0); sse_registers_saved = (frame.nsseregs == 0); + save_stub_call_needed = (m->call_ms2sysv); + gcc_assert (sse_registers_saved || !save_stub_call_needed); if (frame_pointer_needed && !m->fs.fp_valid) { @@ -13110,10 +13125,26 @@ ix86_expand_prolog
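As a quick sanity check of the assertion rewrite discussed in this message, the two forms can be compared over all four input combinations. This is a standalone sketch; the function names are made up for illustration and only the two boolean expressions come from the thread.

```c
#include <stdbool.h>

/* The originally proposed condition, with the double negation.  */
static bool original_form(bool sse_registers_saved, bool save_stub_call_needed)
{
    return !(!sse_registers_saved && save_stub_call_needed);
}

/* The form suggested in review (De Morgan applied to the above).  */
static bool simplified_form(bool sse_registers_saved, bool save_stub_call_needed)
{
    return sse_registers_saved || !save_stub_call_needed;
}

/* Returns true when both forms agree for every combination of inputs.  */
static bool forms_agree(void)
{
    for (int saved = 0; saved <= 1; saved++)
        for (int needed = 0; needed <= 1; needed++)
            if (original_form(saved, needed) != simplified_form(saved, needed))
                return false;
    return true;
}
```

The only combination that fails either form is "SSE registers still need saving inline" together with "stub call needed", which is exactly the case the assertion is meant to reject.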
Re: [PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN
On 11/03/2017 04:22 PM, Daniel Santos wrote: > ... > How does this patch look? (Also, I've updated comments for > choose_baseaddr.) Currently re-running tests. > > Thanks, > Daniel > > @@ -13110,10 +13125,26 @@ ix86_expand_prologue (void) >target. */ >if (TARGET_SEH) > m->fs.sp_valid = false; > -} > > - if (m->call_ms2sysv) > -ix86_emit_outlined_ms2sysv_save (frame); > + /* If SP offset is non-immediate after allocation of the stack frame, > + then emit SSE saves or stub call prior to allocating the rest of the > + stack frame. This is less efficient for the out-of-line stub because > + we can't combine allocations across the call barrier, but it's better > + than using a scratch register. */ > + else if (!x86_64_immediate_operand (GEN_INT > (frame.stack_pointer_offset - m->fs.sp_realigned_offset), Pmode)) Oops, and also after fixing this formatting... Daniel
PING: [PATCH v2 0/2] [testsuite, libgcc] PR80759 Fix FAILs on Solaris and Darwin
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00025.html Uros, Can you review changes for i386 please? Mike or Iain, Can one of you review changes for Darwin please? I'm not familiar with the platform, although Rainer tested on Darwin for me. Ian, Can you review changes to libgcc please? Thank you all! Daniel On 07/02/2017 12:11 AM, Daniel Santos wrote: This patchset addresses a number of testsuite issues for gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp, mostly occurring on Solaris and Darwin. Additionally, it solves a bug in libgcc that caused link failures on Darwin when building with -mcall-ms2sysv-xlogues. The issues are detailed in the notes for each patch. I would particularly appreciate any feedback for Darwin as I am unfamiliar with the platform and Rainer and I have fashioned some of these changes by looking at other Darwin code in gcc. .../gcc.target/x86_64/abi/ms-sysv/do-test.S | 200 --- .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c | 83 +++- .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp| 153 +- libgcc/config.host | 6 +- libgcc/config/i386/i386-asm.h| 89 + libgcc/config/i386/resms64.S | 2 +- libgcc/config/i386/resms64f.S| 2 +- libgcc/config/i386/resms64fx.S | 2 +- libgcc/config/i386/resms64x.S| 2 +- libgcc/config/i386/savms64.S | 2 +- libgcc/config/i386/savms64f.S| 2 +- 11 files changed, 274 insertions(+), 269 deletions(-) Many thanks to Rainer for all of his help on this! Thanks, Daniel
Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 07/26/2017 02:03 PM, H.J. Lu wrote: This patch caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81563 Yes, I discovered this flaw while working on PR 80969 but I hadn't found an actual testcase where it caused a problem yet. I'm about to submit my patchset for review, so sorry I didn't get it committed sooner. My patch set further improves sp_valid_at and fp_valid_at since it's possible that the last offset the frame pointer can be used to access is not equal to the realignment offset. I'll try to get this out tonight or tomorrow. Thanks! Daniel
Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 07/26/2017 02:03 PM, H.J. Lu wrote: This patch caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81563 Hello. I've rebased my patch set and I'm now retesting. I'm afraid that your changes are wrong because my sp_valid_at and fp_valid_at functions are wrong -- these are supposed to be for the base offset and not the CFA offset, sorry about that. This means that the check in choose_basereg (and thus choose_baseaddr) has been wrong as well. I'm retesting now.
Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 07/28/2017 09:41 AM, H.J. Lu wrote: On Fri, Jul 28, 2017 at 6:57 AM, Daniel Santos wrote: On 07/26/2017 02:03 PM, H.J. Lu wrote: This patch caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81563 Hello. I've rebased my patch set and I'm now retesting. I'm afraid that your changes are wrong because my sp_valid_at and fp_valid_at functions are wrong -- these are supposed to be for the base offset and not the CFA offset, sorry about that. This means that the check in choose_basereg (and thus choose_baseaddr) have been wrong as well. I'm retesting now. Please check your change with gcc.target/i386/pr81563.c. Thanks. I'm still getting used to x86 stack math and briefly I thought that my understanding of the CFA was wrong and that I had messed up sp_valid_at and fp_valid_at, but I was mistaken, so sorry for the false alarm. My rebased patches pass all tests, so it's OK.
[PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
When working on the Wine64 project to use aligned SSE MOVs after SP realignment and adding -mcall-ms2sysv-xlogues, I overlooked the fact that the function body may require a stack alignment greater than 16 bytes. This can result in an ICE with -mabi=ms -mavx512f and some other cases. This patch set reworks the strategy for calculating the frame layout following normal (inline) integral register saves (at frame.reg_save_offset) to the start of the frame for the local function (frame.frame_pointer_offset). I've completed a bootstrap and full regression test with no additional failures, but I don't have access to a machine with avx512 extensions. I have manually run the tests that need it using the Intel SDE, but I haven't been able to validate that my check_effective_target_avx512f_runtime code in gcc/testsuite/lib/target-supports.exp is correctly enabling the tests for pr80969-4*.c. As an aside, I still have some rework of the ms-sysv.exp tests that I have yet to submit and in which I'm adding more tests for cases with uncommon stacks, as in PR 81563. Thanks, Daniel 2017-07-23 Daniel Santos * config/i386/i386.h (ix86_frame::outlined_save_offset): Remove field. (ix86_frame::stack_realign_allocate_offset): Likewise. (ix86_frame::stack_realign_allocate): New field. (struct machine_frame_state): Modify comments. (machine_frame_state::sp_realigned_fp_end): New field. (machine_function::call_ms2sysv_pad_out): Remove field. * config/i386/i386.c (xlogue_layout::get_stack_space_used): Modify. (ix86_compute_frame_layout): Likewise. (sp_valid_at): Likewise. (fp_valid_at): Likewise. (choose_baseaddr): Modify comments. (ix86_emit_outlined_ms2sysv_save): Modify. (ix86_expand_prologue): Likewise. (ix86_expand_epilogue): Modify comments. 2017-07-23 Daniel Santos * gcc.target/i386/pr80969-1.c: New testcase. * gcc.target/i386/pr80969-2a.c: Likewise. * gcc.target/i386/pr80969-2.c: Likewise. * gcc.target/i386/pr80969-3.c: Likewise. * gcc.target/i386/pr80969-4a.c: Likewise. 
* gcc.target/i386/pr80969-4b.c: Likewise. * gcc.target/i386/pr80969-4.c: Likewise.
[PATCH 1/6] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at
When we realign the stack frame (without DRAP), there may be a range of CFA offsets that should never be touched because they are alignment padding and any reference to them is almost certainly an error. Previously, only the offset of where the realigned stack frame starts was recorded and checked in sp_valid_at and fp_valid_at. This change adds sp_realigned_fp_last to struct machine_frame_state to record the last valid offset from which the frame pointer can be used when the stack pointer is realigned and modifies sp_valid_at and fp_valid_at to fail an assertion when passed an offset in the "no-man's land" between these two values. Comments for struct machine_frame_state incorrectly stated that a realigned stack pointer could be used to access offsets equal to or greater than sp_realigned_offset, but it is only valid for offsets that are greater. This was the (incorrect) behaviour of sp_valid_at and fp_valid_at prior to r250587 and this change now corrects the documentation and adds clarification of the CFA-relative calculation. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 45 ++--- gcc/config/i386/i386.h | 18 +- 2 files changed, 43 insertions(+), 20 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index f1486ff3750..690631dfe43 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13102,26 +13102,36 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset) return len; } -/* Determine if the stack pointer is valid for accessing the cfa_offset. - The register is saved at CFA - CFA_OFFSET. */ +/* Determine if the stack pointer is valid for accessing the CFA_OFFSET in + the frame save area. The register is saved at CFA - CFA_OFFSET. 
*/ -static inline bool +static bool sp_valid_at (HOST_WIDE_INT cfa_offset) { const struct machine_frame_state &fs = cfun->machine->fs; - return fs.sp_valid && !(fs.sp_realigned - && cfa_offset <= fs.sp_realigned_offset); + if (fs.sp_realigned && cfa_offset <= fs.sp_realigned_offset) +{ + /* Validate that the cfa_offset isn't in a "no-man's land". */ + gcc_assert (cfa_offset <= fs.sp_realigned_fp_last); + return false; +} + return fs.sp_valid; } -/* Determine if the frame pointer is valid for accessing the cfa_offset. - The register is saved at CFA - CFA_OFFSET. */ +/* Determine if the frame pointer is valid for accessing the CFA_OFFSET in + the frame save area. The register is saved at CFA - CFA_OFFSET. */ static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset) { const struct machine_frame_state &fs = cfun->machine->fs; - return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned - && cfa_offset > fs.sp_realigned_offset); + if (fs.sp_realigned && cfa_offset > fs.sp_realigned_fp_last) +{ + /* Validate that the cfa_offset isn't in a "no-man's land". */ + gcc_assert (cfa_offset >= fs.sp_realigned_offset); + return false; +} + return fs.fp_valid; } /* Choose a base register based upon alignment requested, speed and/or @@ -14560,6 +14570,9 @@ ix86_expand_prologue (void) int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT; gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT); + /* Record last valid frame pointer offset. */ + m->fs.sp_realigned_fp_last = m->fs.sp_offset; + /* The computation of the size of the re-aligned stack frame means that we must allocate the size of the register save area before performing the actual alignment. 
Otherwise we cannot guarantee @@ -14573,13 +14586,15 @@ ix86_expand_prologue (void) insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx, stack_pointer_rtx, GEN_INT (-align_bytes))); - /* For the purposes of register save area addressing, the stack -pointer can no longer be used to access anything in the frame -below m->fs.sp_realigned_offset and the frame pointer cannot be -used for anything at or above. */ m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes); m->fs.sp_realigned = true; m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16; + /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset. +Beyond this point, stack access should be done via choose_baseaddr or +by using sp_valid_at and fp_valid_at to determine the correct base +register. Henceforth, any CFA offset should be thought of as logical +and not physical. */ + gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last); gcc_assert (m->fs.
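The three-way split that this patch describes (frame pointer range, alignment padding, stack pointer range) can be sketched outside of GCC as a small classifier. This is only an illustrative model of the checks in sp_valid_at and fp_valid_at as shown in the diff above; the struct, enum and function names prefixed with toy_ are hypothetical, and preferring the stack pointer when the stack is not realigned is an arbitrary simplification, since that choice really belongs to choose_baseaddr.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct machine_frame_state (hypothetical, not the GCC
   type).  Offsets are CFA-relative: the slot lives at CFA - cfa_offset.  */
struct toy_frame_state
{
  bool sp_valid;
  bool fp_valid;
  bool sp_realigned;
  long sp_realigned_fp_last;	/* Last offset reachable via the frame pointer.  */
  long sp_realigned_offset;	/* Where the realigned frame starts.  */
};

enum toy_base
{
  TOY_BASE_NONE,
  TOY_BASE_FP,
  TOY_BASE_SP,
  TOY_BASE_PADDING		/* The "no-man's land"; GCC would assert here.  */
};

/* Classify CFA_OFFSET the way the patched predicates partition the frame.  */
static enum toy_base
toy_classify (const struct toy_frame_state *fs, long cfa_offset)
{
  if (fs->sp_realigned)
    {
      /* At or above the last frame-pointer offset: only FP can be used.  */
      if (cfa_offset <= fs->sp_realigned_fp_last)
	return fs->fp_valid ? TOY_BASE_FP : TOY_BASE_NONE;

      /* Between the two recorded offsets: alignment padding.  */
      if (cfa_offset <= fs->sp_realigned_offset)
	return TOY_BASE_PADDING;

      /* Strictly beyond sp_realigned_offset: only SP can be used.  */
      return fs->sp_valid ? TOY_BASE_SP : TOY_BASE_NONE;
    }

  /* No realignment: either register may do; prefer SP in this toy model.  */
  if (fs->sp_valid)
    return TOY_BASE_SP;
  return fs->fp_valid ? TOY_BASE_FP : TOY_BASE_NONE;
}
```

With sp_realigned_fp_last set to 48 and sp_realigned_offset set to 64, offset 48 maps to the frame pointer, 56 and 64 fall into the padding, and 72 maps to the stack pointer, which matches the "greater than, not equal to" wording of the corrected comments.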
[PATCH 2/6] [i386] Remove ix86_frame::outlined_save_offset
This value was used in an earlier incarnation of the -mcall-ms2sysv-xlogues patch set but is now set and never read. The value of ix86_frame::sse_reg_save_offset serves the same purpose. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 1 - gcc/config/i386/i386.h | 4 +--- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 690631dfe43..47c5608c3cd 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12966,7 +12966,6 @@ ix86_compute_frame_layout (void) offset += xlogue.get_stack_space_used (); gcc_assert (!(offset & 0xf)); - frame->outlined_save_offset = offset; } /* Align and set SSE register save area. */ diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index ce5bb7f6677..1648bdf1556 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -2477,8 +2477,7 @@ enum avx_u128_state <- end of stub-saved/restored regs [padding1] ] - <- outlined_save_offset - <- sse_regs_save_offset + <- sse_reg_save_offset [padding2] |<- FRAME_POINTER [va_arg registers] | @@ -2504,7 +2503,6 @@ struct GTY(()) ix86_frame HOST_WIDE_INT reg_save_offset; HOST_WIDE_INT stack_realign_allocate_offset; HOST_WIDE_INT stack_realign_offset; - HOST_WIDE_INT outlined_save_offset; HOST_WIDE_INT sse_reg_save_offset; /* When save_regs_using_mov is set, emit prologue using -- 2.13.3
[PATCH 3/6] [i386] Remove machine_function::call_ms2sysv_pad_out
The -mcall-ms2sysv-xlogues project added the boolean fields call_ms2sysv_pad_in and call_ms2sysv_pad_out to struct machine_function to track whether or not an additional 8 bytes of padding were needed for stack alignment prior to and after the stub save area. This design was based upon the faulty assumption that the function body would not require a stack alignment greater than 16 bytes. This continues to work well for managing padding prior to the stub save area, but will not work for the outgoing alignment. Rather than changing machine_function::call_ms2sysv_pad_out to a larger type, this patch removes it, thus transferring responsibility for stack alignment following the stub save area from class xlogue_layout to the body of ix86_compute_frame_layout. Since the 64-bit va_arg register save area is always a multiple of 16 bytes in size (176 for System V ABI and 96 for Microsoft ABI), the ROUND_UP calculation for the stack offset at the start of the function body (frame.frame_pointer_offset) will ensure there is enough room for any padding needed to keep the save area for SSE va_args 16-byte aligned, so no modification is needed for that calculation. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 18 -- gcc/config/i386/i386.h | 8 ++-- 2 files changed, 6 insertions(+), 20 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 47c5608c3cd..e2e9546a27c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2491,9 +2491,7 @@ public: unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1; gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS); -return m_regs[last_reg].offset - + (m->call_ms2sysv_pad_out ? 8 : 0) - + STUB_INDEX_OFFSET; +return m_regs[last_reg].offset + STUB_INDEX_OFFSET; } /* Returns the offset for the base pointer used by the stub.
*/ @@ -12849,13 +12847,12 @@ ix86_compute_frame_layout (void) { unsigned count = xlogue_layout::count_stub_managed_regs (); m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS; + m->call_ms2sysv_pad_in = 0; } } frame->nregs = ix86_nsaved_regs (); frame->nsseregs = ix86_nsaved_sseregs (); - m->call_ms2sysv_pad_in = 0; - m->call_ms2sysv_pad_out = 0; /* 64-bit MS ABI seem to require stack alignment to be always 16, except for function prologues, leaf functions and when the defult @@ -12957,15 +12954,7 @@ ix86_compute_frame_layout (void) gcc_assert (!frame->nsseregs); m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD); - - /* Select an appropriate layout for incoming stack offset. */ - const struct xlogue_layout &xlogue = xlogue_layout::get_instance (); - - if ((offset + xlogue.get_stack_space_used ()) & UNITS_PER_WORD) - m->call_ms2sysv_pad_out = 1; - - offset += xlogue.get_stack_space_used (); - gcc_assert (!(offset & 0xf)); + offset += xlogue_layout::get_instance ().get_stack_space_used (); } /* Align and set SSE register save area. */ @@ -12993,6 +12982,7 @@ ix86_compute_frame_layout (void) /* Align start of frame for local function. */ if (stack_realign_fp + || m->call_ms2sysv || offset != frame->sse_reg_save_offset || size != 0 || !crtl->is_leaf diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 1648bdf1556..b08e45f68d4 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -2646,17 +2646,13 @@ struct GTY(()) machine_function { BOOL_BITFIELD arg_reg_available : 1; /* If true, we're out-of-lining reg save/restore for regs clobbered - by ms_abi functions calling a sysv function. */ + by 64-bit ms_abi functions calling a sysv_abi function. */ BOOL_BITFIELD call_ms2sysv : 1; /* If true, the incoming 16-byte aligned stack has an offset (of 8) and - needs padding. */ + needs padding prior to out-of-line stub save/restore area. 
*/ BOOL_BITFIELD call_ms2sysv_pad_in : 1; - /* If true, the size of the stub save area plus inline int reg saves will - result in an 8 byte offset, so needs padding. */ - BOOL_BITFIELD call_ms2sysv_pad_out : 1; - /* This is the number of extra registers saved by stub (valid range is 0-6). Each additional register is only saved/restored by the stubs if all successive ones are. (Will always be zero when using a hard -- 2.13.3
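The padding arithmetic in this patch's description can be checked with a few lines of standalone C. This is a rough sketch rather than the GCC code: the pad_ names are hypothetical, the word size of 8 bytes is assumed for 64-bit targets, and the 176/96 sizes are the ones quoted in the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for UNITS_PER_WORD on a 64-bit target.  */
#define PAD_UNITS_PER_WORD 8

/* Round X up to a multiple of ALIGN (ALIGN must be a power of two).  */
static long
pad_round_up (long x, long align)
{
  return (x + align - 1) & ~(align - 1);
}

/* Mirrors the "!!(offset & UNITS_PER_WORD)" test kept for pad_in: true when
   the incoming offset is 8 bytes off a 16-byte boundary.  */
static bool
pad_in_needed (long offset)
{
  return (offset & PAD_UNITS_PER_WORD) != 0;
}

/* Offset of the start of the function body: the va_arg save area is added
   first and a single ROUND_UP absorbs whatever outgoing padding is needed,
   which is why a separate pad_out flag is no longer required.  */
static long
pad_frame_start (long offset_after_saves, long va_arg_size, long align)
{
  return pad_round_up (offset_after_saves + va_arg_size, align);
}
```

For instance, an offset of 120 after the save area plus a 176-byte va_arg area gives 296, which rounds up to 304 for a 16-byte boundary and to 320 for a 64-byte boundary; a single-bit flag could only ever express the first case.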
[PATCH 4/6] [i386] Modify ix86_compute_frame_layout
These changes affect how the stack frame is calculated from the region starting at frame.reg_save_offset until frame.frame_pointer_offset, which includes either the stub save area or the (inline) SSE register save area and the va_args register save area. The calculation used when not realigning the stack pointer is the same, but when realigning, we calculate the 16-byte aligned space needed in reverse. As a result, the stack realignment boundary at frame.stack_realign_offset is not necessarily a multiple of stack_alignment_needed, but the value of frame.frame_pointer_offset is. This results in a properly aligned stack for the function body and avoids wasting stack space. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 116 + gcc/config/i386/i386.h | 2 +- 2 files changed, 80 insertions(+), 38 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index e2e9546a27c..e92f322de0c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12874,6 +12874,14 @@ ix86_compute_frame_layout (void) gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT); gcc_assert (preferred_alignment <= stack_alignment_needed); + /* The only ABI saving SSE regs should be 64-bit ms_abi. */ + gcc_assert (TARGET_64BIT || !frame->nsseregs); + if (TARGET_64BIT && m->call_ms2sysv) +{ + gcc_assert (stack_alignment_needed >= 16); + gcc_assert (!frame->nsseregs); +} + /* For SEH we have to limit the amount of code movement into the prologue. At present we do this via a BLOCKAGE, at which point there's very little scheduling that can be done, which means that there's very little point @@ -12936,54 +12944,88 @@ ix86_compute_frame_layout (void) if (TARGET_SEH) frame->hard_frame_pointer_offset = offset; - /* When re-aligning the stack frame, but not saving SSE registers, this - is the offset we want adjust the stack pointer to.
*/ - frame->stack_realign_allocate_offset = offset; + /* Calculate the size of the va-arg area (not including padding, if any). */ + frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size; - /* The re-aligned stack starts here. Values before this point are not - directly comparable with values below this point. Use sp_valid_at - to determine if the stack pointer is valid for a given offset and - fp_valid_at for the frame pointer. */ if (stack_realign_fp) -offset = ROUND_UP (offset, stack_alignment_needed); - frame->stack_realign_offset = offset; - - if (TARGET_64BIT && m->call_ms2sysv) { - gcc_assert (stack_alignment_needed >= 16); - gcc_assert (!frame->nsseregs); + /* We may need a 16-byte aligned stack for the remainder of the +register save area, but the stack frame for the local function +may require a greater alignment if using AVX/2/512. In order +to avoid wasting space, we first calculate the space needed for +the rest of the register saves, add that to the stack pointer, +and then realign the stack to the boundary of the start of the +frame for the local function. */ + HOST_WIDE_INT space_needed = 0; + HOST_WIDE_INT sse_reg_space_needed = 0; - m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD); - offset += xlogue_layout::get_instance ().get_stack_space_used (); -} + if (TARGET_64BIT) + { + if (m->call_ms2sysv) + { + m->call_ms2sysv_pad_in = 0; + space_needed = xlogue_layout::get_instance ().get_stack_space_used (); + } - /* Align and set SSE register save area. */ - else if (frame->nsseregs) -{ - /* The only ABI that has saved SSE registers (Win64) also has a -16-byte aligned default stack. However, many programs violate -the ABI, and Wine64 forces stack realignment to compensate. + else if (frame->nsseregs) + /* The only ABI that has saved SSE registers (Win64) also has a + 16-byte aligned default stack. However, many programs violate + the ABI, and Wine64 forces stack realignment to compensate. 
*/ + space_needed = frame->nsseregs * 16; + + sse_reg_space_needed = space_needed = ROUND_UP (space_needed, 16); + + /* 64-bit frame->va_arg_size should always be a multiple of 16, but +rounding to be pedantic. */ + space_needed = ROUND_UP (space_needed + frame->va_arg_size, 16); + } + else + space_needed = frame->va_arg_size; + + /* Record the allocation size required prior to the realignment AND. */ + frame->stack_realign_allocate = space_needed; + + /* The re-aligned stack starts at frame->stack_realign_offset. Values +before this point are not directly comparable with values below
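The "in reverse" calculation for the 64-bit realigning case can be sketched as follows. The diff above is cut off before the final assignments, so the last two steps (rounding up offset plus space_needed, then subtracting space_needed to get the realignment boundary) are an assumption inferred from the patch description, and every rev_ name is hypothetical.

```c
#include <assert.h>

/* Hypothetical result record; the field names echo the ix86_frame members
   mentioned in the patch but this is not the GCC structure.  */
struct rev_layout
{
  long stack_realign_allocate;	/* Bytes allocated before the AND.  */
  long stack_realign_offset;	/* Realignment boundary (may be unaligned).  */
  long frame_pointer_offset;	/* Start of the function body's frame.  */
};

/* Round X up to a multiple of ALIGN (ALIGN must be a power of two).  */
static long
rev_round_up (long x, long align)
{
  return (x + align - 1) & ~(align - 1);
}

/* OFFSET is the CFA offset after the inline integer register saves.
   STUB_SPACE is non-zero when out-of-line stubs are used, otherwise
   NSSEREGS inline SSE saves of 16 bytes each are assumed.  */
static struct rev_layout
rev_compute (long offset, long nsseregs, long stub_space,
	     long va_arg_size, long stack_alignment_needed)
{
  struct rev_layout l;
  long space_needed = stub_space ? stub_space : nsseregs * 16;

  space_needed = rev_round_up (space_needed, 16);
  space_needed = rev_round_up (space_needed + va_arg_size, 16);
  l.stack_realign_allocate = space_needed;

  /* Assumed final step: align the end of the save area, then work
     backwards to find where the realigned region begins.  */
  l.frame_pointer_offset
    = rev_round_up (offset + space_needed, stack_alignment_needed);
  l.stack_realign_offset = l.frame_pointer_offset - space_needed;
  return l;
}
```

With an incoming offset of 72, three SSE registers (48 bytes) and a 64-byte alignment requirement, this yields a body frame starting at 128 and a realignment boundary at 80, which is not a multiple of 64. Rounding 72 up to 128 first and then adding 48 would instead need a second round-up to 192, so the reverse order saves 64 bytes in this example.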
[PATCH 5/6] [i386] Modify SP realignment in ix86_expand_prologue, et. al.
The SP allocation calculation is now done in ix86_compute_frame_layout and the result is stored in ix86_frame::stack_realign_allocate. This change also updates comments for choose_baseaddr to clarify that the alignment returned doesn't necessarily reflect the alignment of the cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an alignment of 64 bytes). Since the alignment required may be more than 16 bytes, we cannot defer SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so that function needs to be updated as well. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 54 +++--- 1 file changed, 29 insertions(+), 25 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index e92f322de0c..7e1fc4dfbf5 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13273,10 +13273,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg, } /* Return an RTX that points to CFA_OFFSET within the stack frame and - the alignment of address. If align is non-null, it should point to + the alignment of address. If ALIGN is non-null, it should point to an alignment value (in bits) that is preferred or zero and will - recieve the alignment of the base register that was selected. The - valid base registers are taken from CFUN->MACHINE->FS.
*/ static rtx choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) @@ -14322,35 +14325,35 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame) rtx sym, addr; rtx rax = gen_rtx_REG (word_mode, AX_REG); const struct xlogue_layout &xlogue = xlogue_layout::get_instance (); - HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset; - HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset; - HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in (); + HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset; + + /* AL should only be live with sysv_abi. */ + gcc_assert (!ix86_eax_live_at_start_p ()); + + /* Setup RAX as the stub's base pointer. We use stack_realign_offset rather + we've actually realigned the stack or not. */ + align = GET_MODE_ALIGNMENT (V4SFmode); + addr = choose_baseaddr (frame.stack_realign_offset + + xlogue.get_stub_ptr_offset (), &align); + gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); + emit_insn (gen_rtx_SET (rax, addr)); - /* Verify that the incoming stack 16-byte alignment offset matches the - layout we're using. */ - gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD)); + /* Allocate stack if not already done. */ + if (allocate > 0) + pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, + GEN_INT (-allocate), -1, false); /* Get the stub symbol. */ sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP : XLOGUE_STUB_SAVE); RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym); - /* Setup RAX as the stub's base pointer. 
*/ - align = GET_MODE_ALIGNMENT (V4SFmode); - addr = choose_baseaddr (rax_offset, &align); - gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); - insn = emit_insn (gen_rtx_SET (rax, addr)); - - gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ()); - pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, -GEN_INT (-stack_alloc_size), -1, -m->fs.cfa_reg == stack_pointer_rtx); for (i = 0; i < ncregs; ++i) { const xlogue_layout::reginfo &r = xlogue.get_reginfo (i); rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode), r.regno); - RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);; + RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset); } gcc_assert (vi == (unsigned)GET_NUM_ELEM (v)); @@ -14608,8 +14611,8 @@ ix86_expand_prologue (void) that we must allocate the size of the register save area before performing the actual alignment. Otherwise we cannot guarantee that there's enough storage above the realignment point. */ - allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset; - if (allocate && !m->call_ms2sysv) + allocate = frame.stack_realign_allocate; + if (allocate) pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, GEN_INT (-allocate), -1,
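The remark that choose_baseaddr reports the alignment of the base register, not of the resulting address, can be illustrated with a small helper. This is not part of the patch: baddr_effective_alignment is a hypothetical function, it works in bytes rather than bits, and it assumes the caller already knows the displacement from the chosen base register.

```c
#include <assert.h>

/* Alignment (in bytes) that can be guaranteed for BASE + DISP when BASE is
   known to be aligned to BASE_ALIGN bytes.  BASE_ALIGN must be a power of
   two.  The result is the smaller of BASE_ALIGN and the lowest set bit of
   the displacement.  */
static unsigned long
baddr_effective_alignment (unsigned long base_align, long disp)
{
  /* Negate in unsigned arithmetic so that even LONG_MIN is well defined.  */
  unsigned long d = disp < 0 ? -(unsigned long) disp : (unsigned long) disp;

  if (d == 0)
    return base_align;

  unsigned long lowest_bit = d & (~d + 1);
  return lowest_bit < base_align ? lowest_bit : base_align;
}
```

Taking the example from the description, a base register aligned to 64 bytes combined with a displacement of 48 only guarantees 16-byte alignment for the address, which is still enough for a V4SFmode save but would not be enough for a 32- or 64-byte vector access.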
[PATCH 6/6] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
The testcase in the PR is used as a base and relevant variants are added to test other factors affected by the patch set. pr80969-1.c Base test case. pr80969-2.c With ms to sysv call. pr80969-2a.c With ms to sysv call using stubs. pr80969-3.c With alloca (for DRAP test). pr80969-4.c With va_args passed via va_list pr80969-4a.c With va_args passed via va_list and ms to sysv call. pr80969-4b.c With va_args passed via va_list and ms to sysv call using stubs. Signed-off-by: Daniel Santos --- gcc/testsuite/gcc.target/i386/pr80969-1.c | 16 gcc/testsuite/gcc.target/i386/pr80969-2.c | 26 ++ gcc/testsuite/gcc.target/i386/pr80969-2a.c | 26 ++ gcc/testsuite/gcc.target/i386/pr80969-3.c | 31 gcc/testsuite/gcc.target/i386/pr80969-4.c | 123 gcc/testsuite/gcc.target/i386/pr80969-4a.c | 124 + gcc/testsuite/gcc.target/i386/pr80969-4b.c | 124 + gcc/testsuite/lib/target-supports.exp | 66 +++ 8 files changed, 536 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c new file mode 100644 index 000..eb8d767a778 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +int a[56]; +int b; +int main (int argc, char *argv[]) { + int c; + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c new file mode 100644 index 000..e868d6c7e5c --- /dev/null +++ 
b/gcc/testsuite/gcc.target/i386/pr80969-2.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test when calling a sysv func. */ + +int a[56]; +int b; + +static void __attribute__((sysv_abi)) sysv () +{ +} + +void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv; + +int main (int argc, char *argv[]) { + int c; + sysv_noinfo (); + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c new file mode 100644 index 000..071a90534a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test when calling a sysv func using save/restore stubs. */ + +int a[56]; +int b; + +static void __attribute__((sysv_abi)) sysv () +{ +} + +void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv; + +int main (int argc, char *argv[]) { + int c; + sysv_noinfo (); + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c new file mode 100644 index 000..5982981b55c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test with alloca (and DRAP). 
*/ + +#include + +int a[56]; +volatile int b = -12345; +volatile const int d = 42; + +void foo (int *x, int y, int z) +{ +} + +void (*volatile const foo_noinfo)(int *, int, int) = foo; + +int main (int argc, char *argv[]) { + int c; + int *e = alloca (d); + foo_noinfo (e, d, 0); + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +foo_noinfo (e, d, c); +a[-(b % 56)] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4.c b/gcc/testsuite/gcc.target/i386/pr80969-4.c new file mode 100644 index 000..1ec54d081cd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-4.c @@ -0,0 +1,123 @@ +/* { dg-do run { target avx512f_runtime } } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test with avx512 and va_args. */ + +#include +#include + +#include "avx-check.h" + +int a[56]; +int b; + +__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 }; +__m512d n2 = { -93.83, 893.318, 3994.3, -39484.0, 830.32, -328.32, 3
Re: [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
Well, I just learned how to test 32-bit earlier, and I've uncovered a problem when running the 32-bit tests. Do you want me to commit the two patches (squashed together) in the meantime? Thanks, Daniel
[PATCH 5/6 v2] [i386] Modify SP realignment in ix86_expand_prologue, et. al.
My first version of this patch initialized m->fs.sp_realigned_fp_last with the value of m->fs.sp_offset prior to performing the stack realignment. I had forgotten, however, that when we're saving GP regs using MOV, we delay SP modification as long as possible, so the value of m->fs.sp_offset at this point is correct when we've used push, but incorrect when we've used mov. This time I've bootstrapped with --enable-checking=yes,rtl --enable-languages=all and regression tested using the command below to test both 64- and 32-bit code. make -kj8 RUNTESTFLAGS="--target_board=unix/\{,-m32\}" check Original patch description: The SP allocation calculation is now done in ix86_compute_frame_layout and the result is stored in ix86_frame::stack_realign_allocate. This change also updates comments for choose_baseaddr to clarify that the alignment returned doesn't necessarily reflect the alignment of the cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an alignment of 64 bytes). Since the alignment required may be more than 16 bytes, we cannot defer SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so that function needs to be updated as well. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 58 -- 1 file changed, 32 insertions(+), 26 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 0dc366cf16e..a1f39cd714c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13289,10 +13289,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg, } /* Return an RTX that points to CFA_OFFSET within the stack frame and - the alignment of address. If align is non-null, it should point to + the alignment of address. If ALIGN is non-null, it should point to an alignment value (in bits) that is preferred or zero and will - recieve the alignment of the base register that was selected. The - valid base registers are taken from CFUN->MACHINE->FS.
*/ + recieve the alignment of the base register that was selected, + irrespective of rather or not CFA_OFFSET is a multiple of that + alignment value. + + The valid base registers are taken from CFUN->MACHINE->FS. */ static rtx choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) @@ -14338,35 +14341,35 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame) rtx sym, addr; rtx rax = gen_rtx_REG (word_mode, AX_REG); const struct xlogue_layout &xlogue = xlogue_layout::get_instance (); - HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset; - HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset; - HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in (); + HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset; + + /* AL should only be live with sysv_abi. */ + gcc_assert (!ix86_eax_live_at_start_p ()); + + /* Setup RAX as the stub's base pointer. We use stack_realign_offset rather + we've actually realigned the stack or not. */ + align = GET_MODE_ALIGNMENT (V4SFmode); + addr = choose_baseaddr (frame.stack_realign_offset + + xlogue.get_stub_ptr_offset (), &align); + gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); + emit_insn (gen_rtx_SET (rax, addr)); - /* Verify that the incoming stack 16-byte alignment offset matches the - layout we're using. */ - gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD)); + /* Allocate stack if not already done. */ + if (allocate > 0) + pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, + GEN_INT (-allocate), -1, false); /* Get the stub symbol. */ sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP : XLOGUE_STUB_SAVE); RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym); - /* Setup RAX as the stub's base pointer. 
*/ - align = GET_MODE_ALIGNMENT (V4SFmode); - addr = choose_baseaddr (rax_offset, &align); - gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); - insn = emit_insn (gen_rtx_SET (rax, addr)); - - gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ()); - pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, -GEN_INT (-stack_alloc_size), -1, -m->fs.cfa_reg == stack_pointer_rtx); for (i = 0; i < ncregs; ++i) { const xlogue_layout::reginfo &r = xlogue.get_reginfo (i); rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode), r.regno); - RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);; + RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset); } gcc_assert (v
[PATCH 6/6 v2] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
This update adds documentation for the new effective targets in addition to a few existing effective targets that were undocumented. Changes to lib/target-supports.exp and documentation: * Add effective-targets avx512f and avx512f_runtime (needed for new tests). * Correct a bug in check_avx2_hw_available. * Add documentation for effective-targets avx2, avx2_runtime (both missing), avx512f and avx512f_runtime. The following tests are added. The testcase in the PR is used as a base and relevant variants are added to test other factors affected by the patch set. pr80969-1.c Base test case. pr80969-2.c With ms to sysv call. pr80969-2a.c With ms to sysv call using stubs. pr80969-3.c With alloca (for DRAP test). pr80969-4.c With va_args passed via va_list. pr80969-4a.c With va_args passed via va_list and ms to sysv call. pr80969-4b.c With va_args passed via va_list and ms to sysv call using stubs. Signed-off-by: Daniel Santos --- gcc/doc/sourcebuild.texi | 12 +++ gcc/testsuite/gcc.target/i386/pr80969-1.c | 16 gcc/testsuite/gcc.target/i386/pr80969-2.c | 26 ++ gcc/testsuite/gcc.target/i386/pr80969-2a.c | 26 ++ gcc/testsuite/gcc.target/i386/pr80969-3.c | 31 gcc/testsuite/gcc.target/i386/pr80969-4.c | 123 gcc/testsuite/gcc.target/i386/pr80969-4a.c | 124 + gcc/testsuite/gcc.target/i386/pr80969-4b.c | 124 + gcc/testsuite/lib/target-supports.exp | 66 +++ 9 files changed, 548 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 85af8778167..66f040f212d 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@
-1852,6 +1852,18 @@ Target supports compiling @code{avx} instructions. @item avx_runtime Target supports the execution of @code{avx} instructions. +@item avx2 +Target supports compiling @code{avx2} instructions. + +@item avx2_runtime +Target supports the execution of @code{avx2} instructions. + +@item avx512f +Target supports compiling @code{avx512f} instructions. + +@item avx512f_runtime +Target supports the execution of @code{avx512f} instructions. + @item cell_hw Test system can execute AltiVec and Cell PPU instructions. diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c new file mode 100644 index 000..eb8d767a778 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +int a[56]; +int b; +int main (int argc, char *argv[]) { + int c; + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c new file mode 100644 index 000..e868d6c7e5c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test when calling a sysv func. 
*/ + +int a[56]; +int b; + +static void __attribute__((sysv_abi)) sysv () +{ +} + +void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv; + +int main (int argc, char *argv[]) { + int c; + sysv_noinfo (); + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c new file mode 100644 index 000..071a90534a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test when calling a sysv func using save/restore stubs. */ + +int a[56]; +int b; + +static void __attribute__((sysv_abi)) sysv () +{ +} + +void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv; + +int main (int argc, char *argv[]) { + int c; + sysv_noinfo (); + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c new file mode 100644 index 000..5982981b55c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c @@ -0,0 +1,31 @@ +/*
PING Re: [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
Original message: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02005.html

Patches 2 and 3 have been committed and I have corrected the error in patch 5. I configured with --enable-checking=yes,rtl --enable-languages=all and retested with RUNTESTFLAGS="--target_board=unix/\{,-m32\}". The updated patches fix an error when using mov instead of push and add documentation for changes to target-supports.exp. I have included modified ChangeLogs.

In addition to fixing the ICE, this patch set makes more efficient use of stack space in some cases where the outgoing stack boundary is > 16 bytes and realignment is necessary. This adds new tests, some of which require avx512f (gcc/testsuite/gcc.target/i386/pr80969-4*.c); I have only tested these using Intel SDE.

Below is an updated list of the patches.

1. https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02006.html
2. Committed.
3. Committed.
4. https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02009.html
5. v2 -- https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00249.html
6. v2 -- https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00618.html

Thanks,
Daniel

2017-08-08 Daniel Santos

* config/i386/i386.h (ix86_frame::stack_realign_allocate_offset): Remove.
(ix86_frame::stack_realign_allocate): New field.
(struct machine_frame_state): Modify comments.
(machine_frame_state::sp_realigned_fp_end): New field.
* config/i386/i386.c (ix86_compute_frame_layout): Modify.
(sp_valid_at): Likewise.
(fp_valid_at): Likewise.
(choose_baseaddr): Modify comments.
(ix86_emit_outlined_ms2sysv_save): Modify.
(ix86_expand_prologue): Likewise.
* doc/sourcebuild.texi (avx2, avx2_runtime): Add missing items to effective-targets.
(avx512f, avx512f_runtime): Add new items to effective-targets.

2017-08-08 Daniel Santos

* lib/target-supports.exp (check_avx512_os_support_available): New Procedure.
(check_avx2_hw_available): Modify.
(check_avx512f_hw_available): New Procedure.
(check_effective_target_avx512f_runtime): Likewise.
* gcc.target/i386/pr80969-1.c: New testcase.
* gcc.target/i386/pr80969-2a.c: Likewise.
* gcc.target/i386/pr80969-2.c: Likewise.
* gcc.target/i386/pr80969-3.c: Likewise.
* gcc.target/i386/pr80969-4a.c: Likewise.
* gcc.target/i386/pr80969-4b.c: Likewise.
* gcc.target/i386/pr80969-4.c: Likewise.
[PATCH] [i386,testsuite] [PR 71958] Error on -mx32 with -mabi=ms
We currently error when -mx32 and -mabi=sysv are used and we encounter a function with attribute ms_abi, but we are not erroring on -mx32 and -mabi=ms (either explicitly or when it is the default on Windows). In fact, it generates code that runs, but is of an undefined ABI. I'm running -m64 and -m32 tests now and will run x32 tests when those are done. Presuming that I've corrected all existing tests that do not filter out the x32 target and there are no additional failures, is this OK for head? Thanks, Daniel gcc/ChangeLog: 2017-08-11 Daniel Santos * config/i386/i386.c (ix86_option_override_internal): Modify. (ix86_function_type_abi): Likewise. gcc/testsuite/ChangeLog: 2017-08-11 Daniel Santos * gcc.target/i386/pr71958.c: New test. * gcc.target/i386/pr64409.c: Modify to skip on Windows. * gcc.target/i386/pr46470.c: Modify to skip x32 target. * gcc.target/i386/pr66275.c: Likewise. * gcc.target/i386/pr68018.c: Likewise. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 11 +-- gcc/testsuite/gcc.target/i386/pr46470.c | 2 +- gcc/testsuite/gcc.target/i386/pr64409.c | 3 ++- gcc/testsuite/gcc.target/i386/pr66275.c | 2 +- gcc/testsuite/gcc.target/i386/pr68018.c | 2 +- gcc/testsuite/gcc.target/i386/pr71958.c | 8 6 files changed, 22 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr71958.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b04321a8d40..311a52c2a1f 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5585,6 +5585,9 @@ ix86_option_override_internal (bool main_args_p, if (TARGET_X32_P (opts->x_ix86_isa_flags)) { + if (opts_set->x_ix86_abi == MS_ABI) + error ("-mx32 not supported with -mabi=ms"); + /* Always turn on OPTION_MASK_ISA_64BIT and turn off OPTION_MASK_ABI_64 for TARGET_X32.
*/ opts->x_ix86_isa_flags |= OPTION_MASK_ISA_64BIT; @@ -8777,8 +8780,12 @@ ix86_function_type_abi (const_tree fntype) if (abi == SYSV_ABI && lookup_attribute ("ms_abi", TYPE_ATTRIBUTES (fntype))) { - if (TARGET_X32) - error ("X32 does not support ms_abi attribute"); + static int warned; + if (TARGET_X32 && !warned) + { + error ("X32 does not support ms_abi attribute"); + warned = 1; + } abi = MS_ABI; } diff --git a/gcc/testsuite/gcc.target/i386/pr46470.c b/gcc/testsuite/gcc.target/i386/pr46470.c index 9e8e731188e..c66a378a1ad 100644 --- a/gcc/testsuite/gcc.target/i386/pr46470.c +++ b/gcc/testsuite/gcc.target/i386/pr46470.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target { ! x32 } } } */ /* The pic register save adds unavoidable stack pointer references. */ /* { dg-skip-if "" { ia32 && { ! nonpic } } } */ /* These options are selected to ensure 1 word needs to be allocated diff --git a/gcc/testsuite/gcc.target/i386/pr64409.c b/gcc/testsuite/gcc.target/i386/pr64409.c index 917472653f4..3dbd9a09f01 100644 --- a/gcc/testsuite/gcc.target/i386/pr64409.c +++ b/gcc/testsuite/gcc.target/i386/pr64409.c @@ -1,6 +1,7 @@ /* { dg-do compile { target { ! ia32 } } } */ /* { dg-require-effective-target maybe_x32 } */ /* { dg-options "-O0 -mx32" } */ +/* { xfail { "*-*-cygwin* *-*-mingw*" } } */ int a; -int* __attribute__ ((ms_abi)) fn1 () { return &a; } /* { dg-error "X32 does not support ms_abi attribute" } */ +int* __attribute__ ((ms_abi)) fn1 () { return &a; } /* { dg-error "X32 does not support ms_abi attribute" { target { ! "*-*-mingw* *-*-cygwin*" } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr66275.c b/gcc/testsuite/gcc.target/i386/pr66275.c index b8759aeb5ec..a1271857f6a 100644 --- a/gcc/testsuite/gcc.target/i386/pr66275.c +++ b/gcc/testsuite/gcc.target/i386/pr66275.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */ +/* { dg-do compile { target { *-*-linux* && { ! 
{ ia32 || x32 } } } } } */ /* { dg-options "-mabi=ms -fdump-rtl-dfinit" } */ void diff --git a/gcc/testsuite/gcc.target/i386/pr68018.c b/gcc/testsuite/gcc.target/i386/pr68018.c index a0fa21e0b00..871fdddf643 100644 --- a/gcc/testsuite/gcc.target/i386/pr68018.c +++ b/gcc/testsuite/gcc.target/i386/pr68018.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */ +/* { dg-do compile { target { *-*-linux* && { ! { ia32 || x32 } } } } } */ /* { dg-options "-O -mabi=ms -mstackrealign" } */ typedef float V __attribute__ ((vector_size (16))); diff --git a/gcc/testsuite/gcc.target/i386/pr71958.c b/gcc/testsuite/gcc.target/i386/pr71958.c new file mode 100644 index 000..090d1970ca9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr71958.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-mx32 -mabi=ms" } */ +/* { dg-require-effective-target maybe_x32 } */ +/* { dg-excess-errors "not supported" } */ + +void main () +{ +} -- 2.13.3
[PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS
It took me a while to figure out how to do this so I figured that it should be in the docs. OK for trunk? * doc/install.texi: Add more details on selecting multiple tests. Thanks, Daniel Signed-off-by: Daniel Santos --- gcc/doc/install.texi | 10 ++ 1 file changed, 10 insertions(+) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 7c9e2f25d44..6aefd213901 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -2737,6 +2737,16 @@ the testsuite with filenames matching @samp{9805*}, you would use make check-g++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}" @end smallexample +The file-matching expression following @var{filename}@command{.exp=} is treated +as a series of whitespace-delimited glob expressions so that multiple patterns +may be passed, although any whitespace must either be escaped or surrounded by +tick marks if multiple expressions are desired. For example, + +@smallexample +make check-g++ RUNTESTFLAGS="old-deja.exp=9805*\ virtual2.c @var{other-options}" +make check-g++ RUNTESTFLAGS="'old-deja.exp=9805* virtual2.c' @var{other-options}" +@end smallexample + The @file{*.exp} files are located in the testsuite directories of the GCC source, the most important ones being @file{compile.exp}, @file{execute.exp}, @file{dg.exp} and @file{old-deja.exp}. -- 2.13.3
[PATCH] [i386, testsuite] [PR 71958] Error on -mx32 with -mabi=ms
We currently error when -mx32 and -mabi=sysv are used and we encounter a function with attribute ms_abi, but we are not erroring on -mx32 and -mabi=ms (either explicitly or when it is the default on Windows). In fact, it generates code that runs, but is of an undefined ABI. I'm also changing pr64409.c because if you explicitly supply -m64, then the test becomes ineffective. This is because the -mx32 parameter passed in dg-options is later overridden by the explicit -m64 parameter. I've bootstrapped and tested on * an x86_64-pc-linux-gnux32 system building gcc with --with-abi=mx32, * a "normal" x86_64-pc-linux-gnu testing with --target_board=unix/\{,-m32\}, and * on Windows. OK for trunk? gcc/ChangeLog: 2017-08-11 Daniel Santos * config/i386/i386.c (ix86_option_override_internal): Error when -mx32 is combined with -mabi=ms. (ix86_function_type_abi): Limit errors for mixing -mx32 with attribute ms_abi. gcc/testsuite/ChangeLog: 2017-08-11 Daniel Santos * gcc.target/i386/pr71958.c: New test to verify error on -mx32 and -mabi=ms. * gcc.target/i386/pr64409.c: Modify to only run on x32. * gcc.target/i386/pr46470.c: Modify to skip x32 target. * gcc.target/i386/pr66275.c: Likewise. * gcc.target/i386/pr68018.c: Likewise. 
Thanks, Daniel Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 12 ++-- gcc/testsuite/gcc.target/i386/pr46470.c | 2 +- gcc/testsuite/gcc.target/i386/pr64409.c | 2 +- gcc/testsuite/gcc.target/i386/pr66275.c | 2 +- gcc/testsuite/gcc.target/i386/pr68018.c | 2 +- gcc/testsuite/gcc.target/i386/pr71958.c | 7 +++ 6 files changed, 21 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr71958.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 1d88e4f247a..3b537f2608f 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5684,6 +5684,10 @@ ix86_option_override_internal (bool main_args_p, if (!opts_set->x_ix86_abi) opts->x_ix86_abi = DEFAULT_ABI; + if (opts->x_ix86_abi == MS_ABI && TARGET_X32_P (opts->x_ix86_isa_flags)) +error ("-mabi=ms not supported with X32 ABI"); + gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI); + /* For targets using ms ABI enable ms-extensions, if not explicit turned off. For non-ms ABI we turn off this option. */ @@ -8777,8 +8781,12 @@ ix86_function_type_abi (const_tree fntype) if (abi == SYSV_ABI && lookup_attribute ("ms_abi", TYPE_ATTRIBUTES (fntype))) { - if (TARGET_X32) - error ("X32 does not support ms_abi attribute"); + static int warned; + if (TARGET_X32 && !warned) + { + error ("X32 does not support ms_abi attribute"); + warned = 1; + } abi = MS_ABI; } diff --git a/gcc/testsuite/gcc.target/i386/pr46470.c b/gcc/testsuite/gcc.target/i386/pr46470.c index 9e8e731188e..c66a378a1ad 100644 --- a/gcc/testsuite/gcc.target/i386/pr46470.c +++ b/gcc/testsuite/gcc.target/i386/pr46470.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target { ! x32 } } } */ /* The pic register save adds unavoidable stack pointer references. */ /* { dg-skip-if "" { ia32 && { ! 
nonpic } } } */ /* These options are selected to ensure 1 word needs to be allocated diff --git a/gcc/testsuite/gcc.target/i386/pr64409.c b/gcc/testsuite/gcc.target/i386/pr64409.c index 917472653f4..7bf9d1e398d 100644 --- a/gcc/testsuite/gcc.target/i386/pr64409.c +++ b/gcc/testsuite/gcc.target/i386/pr64409.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-do compile { target x32 } } */ /* { dg-require-effective-target maybe_x32 } */ /* { dg-options "-O0 -mx32" } */ diff --git a/gcc/testsuite/gcc.target/i386/pr66275.c b/gcc/testsuite/gcc.target/i386/pr66275.c index b8759aeb5ec..51ae1f6859c 100644 --- a/gcc/testsuite/gcc.target/i386/pr66275.c +++ b/gcc/testsuite/gcc.target/i386/pr66275.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */ +/* { dg-do compile { target { *-*-linux* && lp64 } } } */ /* { dg-options "-mabi=ms -fdump-rtl-dfinit" } */ void diff --git a/gcc/testsuite/gcc.target/i386/pr68018.c b/gcc/testsuite/gcc.target/i386/pr68018.c index a0fa21e0b00..04929c6c13c 100644 --- a/gcc/testsuite/gcc.target/i386/pr68018.c +++ b/gcc/testsuite/gcc.target/i386/pr68018.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */ +/* { dg-do compile { target { *-*-linux* && lp64 } } } */ /* { dg-options "-O -mabi=ms -mstackrealign" } */ typedef float V __attribute__ ((vector_size (16))); diff --git a/gcc/testsuite/gcc.target/i386/pr71958.c b/gcc/testsuite/gcc.target/i386/pr71958.c new file mode 100644 index 00
[PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW
This is a problem that occurred because of this code in ix86_option_override_internal: if (!opts_set->x_ix86_abi) opts->x_ix86_abi = DEFAULT_ABI; I tested this along with my other patches. OK for trunk? * config/i386/i386-opts.h (enum calling_abi): Modify so that no legal values are equivalent to zero. Thanks, Daniel Signed-off-by: Daniel Santos --- gcc/config/i386/i386-opts.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h index 542cd0f3d67..8c2b5380e49 100644 --- a/gcc/config/i386/i386-opts.h +++ b/gcc/config/i386/i386-opts.h @@ -44,8 +44,8 @@ last_alg /* Available call abi. */ enum calling_abi { - SYSV_ABI = 0, - MS_ABI = 1 + SYSV_ABI = 1, + MS_ABI = 2 }; enum fpmath_unit -- 2.13.3
Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW
On 08/22/2017 01:26 AM, Andreas Schwab wrote: > On Aug 21 2017, Daniel Santos wrote: > >> This is a problem that occured because of this code in >> ix86_option_override_internal: >> >> if (!opts_set->x_ix86_abi) >> opts->x_ix86_abi = DEFAULT_ABI; > Why is that a problem? Note opts_set vs opts. Because the test !opts_set->x_ix86_abi will be true whether we supplied no -mabi parameter or we supplied -mabi=sysv, since SYSV_ABI is zero. Daniel > Andreas.
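The pitfall described in this reply can be sketched in isolation. The following is a minimal model only, assuming (as the reply states) that the value being tested is zero both when no option was given and when the zero-valued enumerator was chosen. The names abi_zero_based, abi_one_based and resolve_abi are hypothetical stand-ins, and the sketch makes no claim about how GCC actually populates opts_set.

```c
#include <assert.h>

/* Hypothetical stand-ins for the enum before and after the proposed
   change; these are NOT the real GCC definitions.  */
enum abi_zero_based { SYSV_ZERO = 0, MS_ZERO = 1 };
enum abi_one_based { SYSV_ONE = 1, MS_ONE = 2 };

/* Models the pattern
     if (!recorded) result = default_abi;
   RECORDED is the value stored for the user's choice, or 0 if the
   user made no choice at all.  */
static int
resolve_abi (int recorded, int default_abi)
{
  if (!recorded)
    return default_abi;	/* Treated as "nothing was requested".  */
  return recorded;
}
```

With the zero-based enum, resolve_abi (SYSV_ZERO, MS_ZERO) yields MS_ZERO, so an explicit request for the zero-valued ABI is silently replaced by the default. With the one-based enum, zero is left free to mean "not specified" and the explicit request survives.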
Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS
On 08/22/2017 10:58 AM, Martin Sebor wrote: > On 08/21/2017 07:41 PM, Daniel Santos wrote: >> It took me a while to figure out how to do this so I figured that it >> should be >> in the docs. OK for trunk? >> >> * doc/install.texi: Add more details on selecting multiple tests. > > Thank you! It had taken me some time to figure this out. > >> +The file-matching expression following @var{filename}@command{.exp=} >> is treated >> +as a series of whitespace-delimited glob expressions so that >> multiple patterns >> +may be passed, although any whitespace must either be escaped or >> surrounded by >> +tick marks if multiple expressions are desired. For example, > > Do you mean single quotes? Yes. I guess I've heard the terms "tick marks" and "single quotes" used interchangeably before. Perhaps using 'single quotes' would be a good way to express it (with the quotes). > I would suggest "escaped or quoted." > The whole argument to RUNTESTFLAGS can be quoted in either single > or double quotes and, AFAICT, so can the space-separated test > names within it. Well, mysteriously, double quotes do not work. So if I pass RUNTESTFLAGS='"i386.exp=pr80969-[12]*.c pr80969-4.c"' then the second pattern isn't used. I have NO idea what happens to it because if I pass RUNTESTFLAGS='i386.exp=pr80969-[12]*.c pr80969-4.c' then runtest properly demands that I tell it what in the hell pr80969-4.c is supposed to mean. As an experiment, I created a symlink named \"pr80969-4.c and used RUNTESTFLAGS='"i386.exp=pr80969-[12]*.c "pr80969-4.c' but it didn't pick it up. This is probably JAB (just another bug) in DejaGNU. Among the variations I've tried are enclosing the expressions in {braces}, \{escaped braces\} and comma-delimited \{escaped,braces\}, but none of these worked. Daniel > Martin >
[PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS
OK, how's this one? * doc/install.texi: Modify to add more details on running selected tests. Thanks, Daniel Signed-off-by: Daniel Santos --- gcc/doc/install.texi | 10 ++ 1 file changed, 10 insertions(+) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 7c9e2f25d44..da360da1c50 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -2737,6 +2737,16 @@ the testsuite with filenames matching @samp{9805*}, you would use make check-g++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}" @end smallexample +The file-matching expression following @var{filename}@command{.exp=} is treated +as a series of whitespace-delimited glob expressions so that multiple patterns +may be passed, although any whitespace must either be escaped or surrounded by +single quotes if multiple expressions are desired. For example, + +@smallexample +make check-g++ RUNTESTFLAGS="old-deja.exp=9805*\ virtual2.c @var{other-options}" +make check-g++ RUNTESTFLAGS="'old-deja.exp=9805* virtual2.c' @var{other-options}" +@end smallexample + The @file{*.exp} files are located in the testsuite directories of the GCC source, the most important ones being @file{compile.exp}, @file{execute.exp}, @file{dg.exp} and @file{old-deja.exp}. -- 2.13.3
Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS
On 08/22/2017 12:32 PM, Mike Stump wrote: > On Aug 22, 2017, at 10:32 AM, Daniel Santos wrote: >>> I would suggest "escaped or quoted." >>> The whole argument to RUNTESTFLAGS can be quoted in either single >>> or double quotes and, AFAICT, so can the space-separated test >>> names within it. >> Well, mysteriously, double quotes do not work. > Did you try the obvious: > > "\"pdf pdf\" pdf" > > ? I think it should work fine. Yes. As I explained in the rest of my email I tried a great many variations. I can debug runtest some more and try to better understand how this is getting parsed. Daniel
Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS
On 08/22/2017 12:32 PM, Mike Stump wrote: > On Aug 22, 2017, at 10:32 AM, Daniel Santos wrote: >>> I would suggest "escaped or quoted." >>> The whole argument to RUNTESTFLAGS can be quoted in either single >>> or double quotes and, AFAICT, so can the space-separated test >>> names within it. >> Well, mysteriously, double quotes do not work. > Did you try the obvious: > > "\"pdf pdf\" pdf" > > ? I think it should work fine. I have found one additional working mechanism: RUNTESTFLAGS='i386.exp=\"pr80969-[12]*.c pr80969-4.c\"' But using double quotes for both does NOT work: RUNTESTFLAGS="i386.exp=\"pr80969-[12]*.c pr80969-4.c\"" So the three working options appear to be: 1. Escaping whitespace 2. Using double quotes for the whole value and single quotes for the file.exp=patterns expression 3. Using single quotes for the whole value and escaped double quotes for the file.exp=patterns expression Daniel
Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS
OK, the problem is at line 4014 of gcc/Makefile.in: $(MAKE) TESTSUITEDIR="$(TESTSUITEDIR)" RUNTESTFLAGS="$(RUNTESTFLAGS)" \ check-parallel-$* \ Even worse, one can inject arbitrary shell commands here, not that I can think of a scenario where it would be an actual security problem: RUNTESTFLAGS="i386.exp=a b\"; beep\"" check-c I presume that the solution would be to re-escape the contents of RUNTESTFLAGS. Daniel
[PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW
> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to > UNKNOWN_ABI. It would seem to me that UNSPECIFIED_ABI would be a better value name. Also, I don't really understand what opts_set and opts are, except that I had guessed opts_set is what the user asked for (or didn't ask for) and opts is what we're going to actually use. Am I close? I'm re-running tests, so if they pass is this OK? Thanks, Daniel --- gcc/config/i386/i386-opts.h | 5 +++-- gcc/config/i386/i386.c | 3 +-- gcc/config/i386/i386.opt| 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h index 542cd0f3d67..a1d1552a3c6 100644 --- a/gcc/config/i386/i386-opts.h +++ b/gcc/config/i386/i386-opts.h @@ -44,8 +44,9 @@ last_alg /* Available call abi. */ enum calling_abi { - SYSV_ABI = 0, - MS_ABI = 1 + UNSPECIFIED_ABI = 0, + SYSV_ABI = 1, + MS_ABI = 2 }; enum fpmath_unit diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 650bcbc65ae..c08ad55fcd9 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5681,12 +5681,11 @@ ix86_option_override_internal (bool main_args_p, opts->x_ix86_pmode = TARGET_LP64_P (opts->x_ix86_isa_flags) ? PMODE_DI : PMODE_SI; - if (!opts_set->x_ix86_abi) + if (opts_set->x_ix86_abi == UNSPECIFIED_ABI) opts->x_ix86_abi = DEFAULT_ABI; if (opts->x_ix86_abi == MS_ABI && TARGET_X32_P (opts->x_ix86_isa_flags)) error ("-mabi=ms not supported with X32 ABI"); - gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI); /* For targets using ms ABI enable ms-extensions, if not explicit turned off. For non-ms ABI we turn off this diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index cd564315f04..f7b9f9707f7 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -525,7 +525,7 @@ Target Report Mask(IAMCU) Generate code that conforms to Intel MCU psABI. 
mabi= -Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(SYSV_ABI) +Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(UNSPECIFIED_ABI) Generate code that conforms to the given ABI. Enum -- 2.13.3
[PATCH v4 0/4] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
I had to fix a few things for x32 compatibility and I think this is ready now. H.J. tested on a machine with avx512 (including x32) and I've tested both native x32 and normal x86_64 with m64, m32 and mx32 and all is well. I've made more changes to the tests so I'm just submitting a version 2 of the whole patch set. OK for trunk? 2017-08-22 Daniel Santos * config/i386/i386.h (ix86_frame::stack_realign_allocate_offset): Remove field. (ix86_frame::stack_realign_allocate): New field. (struct machine_frame_state): Modify comments. (machine_frame_state::sp_realigned_fp_end): New field. * config/i386/i386.c (ix86_compute_frame_layout): Rework stack frame layout calculation. (sp_valid_at): Add assertion to ensure no attempt to access invalid offset of a realigned stack. (fp_valid_at): Likewise. (choose_baseaddr): Modify comments. (ix86_emit_outlined_ms2sysv_save): Adjust to changes in ix86_expand_prologue. (ix86_expand_prologue): Modify stack realignment and allocation. (ix86_expand_epilogue): Modify comments. 2017-08-22 Daniel Santos * gcc.target/i386/pr80969-1.c: New testcase. * gcc.target/i386/pr80969-2a.c: Likewise. * gcc.target/i386/pr80969-2.c: Likewise. * gcc.target/i386/pr80969-3.c: Likewise. * gcc.target/i386/pr80969-4a.c: Likewise. * gcc.target/i386/pr80969-4b.c: Likewise. * gcc.target/i386/pr80969-4.c: Likewise. * gcc.target/i386/pr80969-4.h: New header common to pr80969-4*.c. Thanks, Daniel
[PATCH 1/4] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at
When we realign the stack frame (without DRAP), there may be a range of CFA offsets that should never be touched because they are alignment padding and any reference to them is almost certainly an error. Previously, only the offset of where the realigned stack frame starts was recorded and checked in sp_valid_at and fp_valid_at. This change adds sp_realigned_fp_last to struct machine_frame_state to record the last valid offset from which the frame pointer can be used when the stack pointer is realigned and modifies sp_valid_at and fp_valid_at to fail an assertion when passed an offset in the "no-man's land" between these two values. Comments for struct machine_frame_state incorrectly stated that a realigned stack pointer could be used to access offsets equal to or greater than sp_realigned_offset, but it is only valid for offsets that are greater. This was the (incorrect) behaviour of sp_valid_at and fp_valid_at prior to r250587 and this change now corrects the documentation and adds clarification of the CFA-relative calculation. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 45 ++--- gcc/config/i386/i386.h | 18 +- 2 files changed, 43 insertions(+), 20 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index c08ad55fcd9..601e3ef47f6 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13177,26 +13177,36 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset) return len; } -/* Determine if the stack pointer is valid for accessing the cfa_offset. - The register is saved at CFA - CFA_OFFSET. */ +/* Determine if the stack pointer is valid for accessing the CFA_OFFSET in + the frame save area. The register is saved at CFA - CFA_OFFSET. 
*/ -static inline bool +static bool sp_valid_at (HOST_WIDE_INT cfa_offset) { const struct machine_frame_state &fs = cfun->machine->fs; - return fs.sp_valid && !(fs.sp_realigned - && cfa_offset <= fs.sp_realigned_offset); + if (fs.sp_realigned && cfa_offset <= fs.sp_realigned_offset) +{ + /* Validate that the cfa_offset isn't in a "no-man's land". */ + gcc_assert (cfa_offset <= fs.sp_realigned_fp_last); + return false; +} + return fs.sp_valid; } -/* Determine if the frame pointer is valid for accessing the cfa_offset. - The register is saved at CFA - CFA_OFFSET. */ +/* Determine if the frame pointer is valid for accessing the CFA_OFFSET in + the frame save area. The register is saved at CFA - CFA_OFFSET. */ static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset) { const struct machine_frame_state &fs = cfun->machine->fs; - return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned - && cfa_offset > fs.sp_realigned_offset); + if (fs.sp_realigned && cfa_offset > fs.sp_realigned_fp_last) +{ + /* Validate that the cfa_offset isn't in a "no-man's land". */ + gcc_assert (cfa_offset >= fs.sp_realigned_offset); + return false; +} + return fs.fp_valid; } /* Choose a base register based upon alignment requested, speed and/or @@ -14675,6 +14685,9 @@ ix86_expand_prologue (void) int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT; gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT); + /* Record last valid frame pointer offset. */ + m->fs.sp_realigned_fp_last = m->fs.sp_offset; + /* The computation of the size of the re-aligned stack frame means that we must allocate the size of the register save area before performing the actual alignment. 
Otherwise we cannot guarantee @@ -14688,13 +14701,15 @@ ix86_expand_prologue (void) insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx, stack_pointer_rtx, GEN_INT (-align_bytes))); - /* For the purposes of register save area addressing, the stack -pointer can no longer be used to access anything in the frame -below m->fs.sp_realigned_offset and the frame pointer cannot be -used for anything at or above. */ m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes); m->fs.sp_realigned = true; m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16; + /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset. +Beyond this point, stack access should be done via choose_baseaddr or +by using sp_valid_at and fp_valid_at to determine the correct base +register. Henceforth, any CFA offset should be thought of as logical +and not physical. */ + gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last); gcc_assert (m->fs.
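The two validity checks described in patch 1/4 can be modelled outside of GCC. What follows is a simplified sketch, not the committed code: struct frame_state_model and in_no_mans_land are hypothetical names, plain long replaces HOST_WIDE_INT, and assert replaces gcc_assert. The comparison logic mirrors the sp_valid_at and fp_valid_at hunks quoted in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced model of the fields of machine_frame_state
   that the two checks consult.  Offsets are CFA-relative: a slot lives
   at CFA - cfa_offset, so larger offsets are deeper in the frame.  */
struct frame_state_model
{
  bool sp_valid;
  bool fp_valid;
  bool sp_realigned;
  long sp_realigned_fp_last;	/* Last offset reachable via the FP.  */
  long sp_realigned_offset;	/* SP is usable only beyond this offset.  */
};

/* True if CFA_OFFSET falls in the alignment padding between the two
   recorded offsets, which neither register should be used to reach.  */
static bool
in_no_mans_land (const struct frame_state_model *fs, long cfa_offset)
{
  return fs->sp_realigned
	 && cfa_offset > fs->sp_realigned_fp_last
	 && cfa_offset < fs->sp_realigned_offset;
}

/* Model of sp_valid_at as changed by the patch.  */
static bool
sp_valid_at (const struct frame_state_model *fs, long cfa_offset)
{
  if (fs->sp_realigned && cfa_offset <= fs->sp_realigned_offset)
    {
      /* Validate that the offset isn't in the padding.  */
      assert (cfa_offset <= fs->sp_realigned_fp_last);
      return false;
    }
  return fs->sp_valid;
}

/* Model of fp_valid_at as changed by the patch.  */
static bool
fp_valid_at (const struct frame_state_model *fs, long cfa_offset)
{
  if (fs->sp_realigned && cfa_offset > fs->sp_realigned_fp_last)
    {
      /* Validate that the offset isn't in the padding.  */
      assert (cfa_offset >= fs->sp_realigned_offset);
      return false;
    }
  return fs->fp_valid;
}
```

For example, with sp_realigned_fp_last = 24 and sp_realigned_offset = 64 (values chosen purely for illustration), offset 16 is reachable only through the frame pointer, offset 80 only through the stack pointer, and asking either function about offset 40 trips the assertion.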
[PATCH 2/4] [i386] Modify ix86_compute_frame_layout
These changes affect how the stack frame is calculated from the region starting at frame.reg_save_offset until frame.frame_pointer_offset, which includes either the stub save area or the (inline) SSE register save area and the va_args register save area. The calculation used when not realigning the stack pointer is the same, but when realigning we calculate the 16-byte aligned space needed in reverse so that the stack realignment boundary at frame.stack_realign_offset may not necessarily be a multiple of stack_alignment_needed, but the value of frame.frame_pointer_offset will be. This results in a properly aligned stack for the function body and avoids wasting stack space. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 116 + gcc/config/i386/i386.h | 2 +- 2 files changed, 80 insertions(+), 38 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 601e3ef47f6..30e84dd5303 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12960,6 +12960,14 @@ ix86_compute_frame_layout (void) gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT); gcc_assert (preferred_alignment <= stack_alignment_needed); + /* The only ABI saving SSE regs should be 64-bit ms_abi. */ + gcc_assert (TARGET_64BIT || !frame->nsseregs); + if (TARGET_64BIT && m->call_ms2sysv) +{ + gcc_assert (stack_alignment_needed >= 16); + gcc_assert (!frame->nsseregs); +} + /* For SEH we have to limit the amount of code movement into the prologue. At present we do this via a BLOCKAGE, at which point there's very little scheduling that can be done, which means that there's very little point @@ -13022,54 +13030,88 @@ ix86_compute_frame_layout (void) if (TARGET_SEH) frame->hard_frame_pointer_offset = offset; - /* When re-aligning the stack frame, but not saving SSE registers, this - is the offset we want adjust the stack pointer to. 
*/ - frame->stack_realign_allocate_offset = offset; + /* Calculate the size of the va-arg area (not including padding, if any). */ + frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size; - /* The re-aligned stack starts here. Values before this point are not - directly comparable with values below this point. Use sp_valid_at - to determine if the stack pointer is valid for a given offset and - fp_valid_at for the frame pointer. */ if (stack_realign_fp) -offset = ROUND_UP (offset, stack_alignment_needed); - frame->stack_realign_offset = offset; - - if (TARGET_64BIT && m->call_ms2sysv) { - gcc_assert (stack_alignment_needed >= 16); - gcc_assert (!frame->nsseregs); + /* We may need a 16-byte aligned stack for the remainder of the +register save area, but the stack frame for the local function +may require a greater alignment if using AVX/2/512. In order +to avoid wasting space, we first calculate the space needed for +the rest of the register saves, add that to the stack pointer, +and then realign the stack to the boundary of the start of the +frame for the local function. */ + HOST_WIDE_INT space_needed = 0; + HOST_WIDE_INT sse_reg_space_needed = 0; - m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD); - offset += xlogue_layout::get_instance ().get_stack_space_used (); -} + if (TARGET_64BIT) + { + if (m->call_ms2sysv) + { + m->call_ms2sysv_pad_in = 0; + space_needed = xlogue_layout::get_instance ().get_stack_space_used (); + } - /* Align and set SSE register save area. */ - else if (frame->nsseregs) -{ - /* The only ABI that has saved SSE registers (Win64) also has a -16-byte aligned default stack. However, many programs violate -the ABI, and Wine64 forces stack realignment to compensate. + else if (frame->nsseregs) + /* The only ABI that has saved SSE registers (Win64) also has a + 16-byte aligned default stack. However, many programs violate + the ABI, and Wine64 forces stack realignment to compensate. 
*/ + space_needed = frame->nsseregs * 16; + + sse_reg_space_needed = space_needed = ROUND_UP (space_needed, 16); + + /* 64-bit frame->va_arg_size should always be a multiple of 16, but +rounding to be pedantic. */ + space_needed = ROUND_UP (space_needed + frame->va_arg_size, 16); + } + else + space_needed = frame->va_arg_size; + + /* Record the allocation size required prior to the realignment AND. */ + frame->stack_realign_allocate = space_needed; + + /* The re-aligned stack starts at frame->stack_realign_offset. Values +before this point are not directly comparable with values below
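The space calculation in the stack_realign_fp branch of this hunk reduces to two round-ups. Here is a rough standalone sketch of just that arithmetic, under stated assumptions: struct frame_inputs and realign_allocate are hypothetical names, stub_space stands in for the value returned by xlogue_layout::get_instance ().get_stack_space_used (), and the input values used below are illustrative rather than taken from a real frame.

```c
#include <assert.h>
#include <stdbool.h>

/* Round X up to a multiple of ALIGN, which must be a power of two.  */
#define ROUND_UP(x, align) (((x) + (align) - 1) & ~((long) (align) - 1))

/* Hypothetical bundle of the values the hunk reads.  */
struct frame_inputs
{
  bool target_64bit;
  bool call_ms2sysv;
  long stub_space;	/* Stand-in for the stub layout's stack usage.  */
  int nsseregs;		/* Number of SSE registers saved inline.  */
  long va_arg_size;	/* Size of the va-arg save area, unpadded.  */
};

/* Returns the number of bytes to allocate before the realignment AND,
   following the structure of the quoted hunk.  */
static long
realign_allocate (const struct frame_inputs *in)
{
  long space_needed = 0;

  assert (in->nsseregs >= 0);

  if (!in->target_64bit)
    return in->va_arg_size;

  if (in->call_ms2sysv)
    space_needed = in->stub_space;
  else if (in->nsseregs)
    space_needed = in->nsseregs * 16L;

  /* First round the register save space, then the total.  */
  space_needed = ROUND_UP (space_needed, 16);
  return ROUND_UP (space_needed + in->va_arg_size, 16);
}
```

As a worked case, three inline SSE saves need 48 bytes; adding an 8-byte va-arg area gives 56, which rounds up to 64 bytes allocated before the stack pointer is realigned.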
[PATCH 3/4] [i386] Modify SP realignment in ix86_expand_prologue, et. al.
My first version of this patch inited m->fs.sp_realigned_fp_last with the value of m->fs.sp_offset prior to performing the stack realignment. I had forgotten, however, that when we're saving GP regs using MOV that we delay SP modification as long as possible so that the value of m->fs.sp_offset at this point is correct when we've used push, but incorrect when we've used mov. This has been tested on both x86_64-pc-linux-gnu{,x32} with --target_board=unix/\{-m64,-mx32,-m32\}. Original patch description: The SP allocation calculation is now done in ix86_compute_frame_layout and the result stored in ix86_frame::stack_realign_allocate. This change also updates comments for choose_baseaddr to clarify that the alignment returned doesn't necessarily reflect the alignment of the cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an alignment of 64 bytes). Since the alignment required may be more than 16-bytes, we cannot defer SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so that function needs to be updated as well. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 58 -- 1 file changed, 32 insertions(+), 26 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 30e84dd5303..dbc771da8aa 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13359,10 +13359,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg, } /* Return an RTX that points to CFA_OFFSET within the stack frame and - the alignment of address. If align is non-null, it should point to + the alignment of address. If ALIGN is non-null, it should point to an alignment value (in bits) that is preferred or zero and will - recieve the alignment of the base register that was selected. The - valid base registers are taken from CFUN->MACHINE->FS. */ + recieve the alignment of the base register that was selected, + irrespective of rather or not CFA_OFFSET is a multiple of that + alignment value. 
+ + The valid base registers are taken from CFUN->MACHINE->FS. */ static rtx choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) @@ -14445,35 +14448,35 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame) rtx sym, addr; rtx rax = gen_rtx_REG (word_mode, AX_REG); const struct xlogue_layout &xlogue = xlogue_layout::get_instance (); - HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset; - HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset; - HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in (); + HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset; + + /* AL should only be live with sysv_abi. */ + gcc_assert (!ix86_eax_live_at_start_p ()); + + /* Setup RAX as the stub's base pointer. We use stack_realign_offset rather + we've actually realigned the stack or not. */ + align = GET_MODE_ALIGNMENT (V4SFmode); + addr = choose_baseaddr (frame.stack_realign_offset + + xlogue.get_stub_ptr_offset (), &align); + gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); + emit_insn (gen_rtx_SET (rax, addr)); - /* Verify that the incoming stack 16-byte alignment offset matches the - layout we're using. */ - gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD)); + /* Allocate stack if not already done. */ + if (allocate > 0) + pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, + GEN_INT (-allocate), -1, false); /* Get the stub symbol. */ sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP : XLOGUE_STUB_SAVE); RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym); - /* Setup RAX as the stub's base pointer. 
*/ - align = GET_MODE_ALIGNMENT (V4SFmode); - addr = choose_baseaddr (rax_offset, &align); - gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); - insn = emit_insn (gen_rtx_SET (rax, addr)); - - gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ()); - pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, -GEN_INT (-stack_alloc_size), -1, -m->fs.cfa_reg == stack_pointer_rtx); for (i = 0; i < ncregs; ++i) { const xlogue_layout::reginfo &r = xlogue.get_reginfo (i); rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode), r.regno); - RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);; + RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset); } gcc_assert (vi == (unsigned)GET_NUM_ELEM (v)); @@ -14728,14 +14731,15 @@ ix86_expand_prologue (void) gcc_assert (align_bytes > MIN
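The note above about choose_baseaddr (the returned alignment describes the base register, not the final address) can be illustrated with a small, simplified model. This is only a sketch and not GCC code: `lowest_set_bit` and `effective_alignment` are hypothetical helpers, alignments are in bytes rather than bits, and the offset is taken relative to the base register rather than the CFA.

```c
#include <assert.h>

/* Hypothetical helper: isolate the lowest set bit of V (V must be non-zero). */
static unsigned long lowest_set_bit(unsigned long v)
{
    return v & (~v + 1UL);
}

/* Hypothetical helper: given a base register known to be a multiple of
   BASE_ALIGN bytes, return the alignment (in bytes) that can be guaranteed
   for the address base + OFFSET.  */
static unsigned long effective_alignment(unsigned long base_align, long offset)
{
    /* The base alignment must be a power of two.  */
    assert(base_align != 0 && (base_align & (base_align - 1)) == 0);

    unsigned long mag = offset < 0 ? 0UL - (unsigned long)offset
                                   : (unsigned long)offset;
    if (mag == 0)
        return base_align;          /* the address is the base itself */

    unsigned long off_align = lowest_set_bit(mag);
    return off_align < base_align ? off_align : base_align;
}
```

In this model a base register aligned to 64 bytes combined with an offset of 48 only guarantees 16-byte alignment for the resulting address, which is why a caller cannot treat the reported base alignment as the alignment of the address it asked for.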
[PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
Changes to lib/target-supports.exp and documentation:

* Add effective-targets avx512f and avx512f_runtime (needed for new tests).
* Corrects bug in check_avx2_hw_available.
* Adds documentation for effective-targets avx2, avx2_runtime (both missing), avx512f and avx512f_runtime.

The following tests are added. The testcase in the PR is used as a base and relevant variants are added to test other factors affected by the patch set.

pr80969-1.c   Base test case.
pr80969-2.c   With ms to sysv call.
pr80969-2a.c  With ms to sysv call using stubs.
pr80969-3.c   With alloca (for DRAP test).
pr80969-4.c   With va_args passed via va_list
pr80969-4a.c  With va_args passed via va_list and ms to sysv call.
pr80969-4b.c  With va_args passed via va_list and ms to sysv call using stubs.
pr80969-4.h   Common header for pr80969-4*.c.

Signed-off-by: Daniel Santos
---
 gcc/doc/sourcebuild.texi | 12 +++
 gcc/testsuite/gcc.target/i386/pr80969-1.c | 16
 gcc/testsuite/gcc.target/i386/pr80969-2.c | 27 +++
 gcc/testsuite/gcc.target/i386/pr80969-2a.c | 8 ++
 gcc/testsuite/gcc.target/i386/pr80969-3.c | 32
 gcc/testsuite/gcc.target/i386/pr80969-4.c | 9 +++
 gcc/testsuite/gcc.target/i386/pr80969-4.h | 119 +
 gcc/testsuite/gcc.target/i386/pr80969-4a.c | 9 +++
 gcc/testsuite/gcc.target/i386/pr80969-4b.c | 9 +++
 gcc/testsuite/lib/target-supports.exp | 66
 10 files changed, 307 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.h
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index e6313dc031e..0bf4d6afeb6 100644
--- a/gcc/doc/sourcebuild.texi
+++
b/gcc/doc/sourcebuild.texi @@ -1855,6 +1855,18 @@ Target supports compiling @code{avx} instructions. @item avx_runtime Target supports the execution of @code{avx} instructions. +@item avx2 +Target supports compiling @code{avx2} instructions. + +@item avx2_runtime +Target supports the execution of @code{avx2} instructions. + +@item avx512f +Target supports compiling @code{avx512f} instructions. + +@item avx512f_runtime +Target supports the execution of @code{avx512f} instructions. + @item cell_hw Test system can execute AltiVec and Cell PPU instructions. diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c new file mode 100644 index 000..e0520b45c40 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c @@ -0,0 +1,16 @@ +/* { dg-do run { target { ! x32 } } } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +int a[56]; +int b; +int main (int argc, char *argv[]) { + int c; + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c new file mode 100644 index 000..f885dee6512 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c @@ -0,0 +1,27 @@ +/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */ +/* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test when calling a sysv func. 
*/ + +int a[56]; +int b; + +static void __attribute__((sysv_abi)) sysv () +{ +} + +void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv; + +int main (int argc, char *argv[]) { + int c; + sysv_noinfo (); + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c new file mode 100644 index 000..baea0796d24 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c @@ -0,0 +1,8 @@ +/* { dg-do run { target { lp64 && avx512f_runtime } } } */ +/* { dg-do compile { target { lp64 && { ! avx512f_runtime } } } } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test when calling a sysv func using save/restore stubs. */ + +#include "pr80969-2.c" diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c new file mode 100644 index 000..d902a771cc8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c @@ -0,0 +1,32 @@ +/* { dg-do run { targe
Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW
On 08/22/2017 03:00 PM, Uros Bizjak wrote: > On Tue, Aug 22, 2017 at 9:47 PM, Daniel Santos > wrote: >>> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to >>> UNKNOWN_ABI. >> It would seem to me that UNSPECIFIED_ABI would be a better value name. >> >> Also, I don't really understand what opts_set and opts are, except that I had >> guessed opts_set is what the user asked for (or didn't ask for) and opts is >> what we're going to actually use. Am I close? > Yes. opts_set is a flag that user specified an option at the command line. > > However, I fail to see what is the problem. If nothing was specified, > then opts->x_ix86_abi is set to DEFAULT_ABI. That is not what is happening. If -mabi=sysv is specified, then the test (!opts_set->x_ix86_abi) is true since the value of SYSV_ABI is zero. When that is evaluated as true, then the abi is set to DEFAULT_ABI, which on Windows is MS_ABI, thus ignoring the command line option. > Probably we don't need > Init(SYSV_ABI) in mabi= declaration at all. I'm guessing that if we don't specify an Init() option then it will default to zero? We just need a valid way to differentiate when -mabi=sysv has been passed from when nothing has been passed. Daniel > > Uros. > >> I'm re-running tests, so if they pass is this OK? >> >> Thanks, >> Daniel >> --- >> gcc/config/i386/i386-opts.h | 5 +++-- >> gcc/config/i386/i386.c | 3 +-- >> gcc/config/i386/i386.opt| 2 +- >> 3 files changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h >> index 542cd0f3d67..a1d1552a3c6 100644 >> --- a/gcc/config/i386/i386-opts.h >> +++ b/gcc/config/i386/i386-opts.h >> @@ -44,8 +44,9 @@ last_alg >> /* Available call abi. 
*/ >> enum calling_abi >> { >> - SYSV_ABI = 0, >> - MS_ABI = 1 >> + UNSPECIFIED_ABI = 0, >> + SYSV_ABI = 1, >> + MS_ABI = 2 >> }; >> >> enum fpmath_unit >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index 650bcbc65ae..c08ad55fcd9 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -5681,12 +5681,11 @@ ix86_option_override_internal (bool main_args_p, >> opts->x_ix86_pmode = TARGET_LP64_P (opts->x_ix86_isa_flags) >> ? PMODE_DI : PMODE_SI; >> >> - if (!opts_set->x_ix86_abi) >> + if (opts_set->x_ix86_abi == UNSPECIFIED_ABI) >> opts->x_ix86_abi = DEFAULT_ABI; >> >>if (opts->x_ix86_abi == MS_ABI && TARGET_X32_P (opts->x_ix86_isa_flags)) >> error ("-mabi=ms not supported with X32 ABI"); >> - gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI); >> >>/* For targets using ms ABI enable ms-extensions, if not >> explicit turned off. For non-ms ABI we turn off this >> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt >> index cd564315f04..f7b9f9707f7 100644 >> --- a/gcc/config/i386/i386.opt >> +++ b/gcc/config/i386/i386.opt >> @@ -525,7 +525,7 @@ Target Report Mask(IAMCU) >> Generate code that conforms to Intel MCU psABI. >> >> mabi= >> -Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(SYSV_ABI) >> +Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) >> Init(UNSPECIFIED_ABI) >> Generate code that conforms to the given ABI. >> >> Enum >> -- >> 2.13.3 >>
Re: [PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
On 08/23/2017 08:26 AM, Uros Bizjak wrote: >> @@ -1822,6 +1845,7 @@ proc check_avx2_hw_available { } { >> expr 0 >> } else { >> check_runtime_nocache avx2_hw_available { >> + #include > Why is the above include needed? It is only needed to #define NULL. Without the include, I've had this function fail due to NULL being undefined. Daniel
Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW
On 08/23/2017 01:12 AM, Uros Bizjak wrote: > On Wed, Aug 23, 2017 at 7:23 AM, Daniel Santos > wrote: >> On 08/22/2017 03:00 PM, Uros Bizjak wrote: >>> On Tue, Aug 22, 2017 at 9:47 PM, Daniel Santos >>> wrote: >>>>> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to >>>>> UNKNOWN_ABI. >>>> It would seem to me that UNSPECIFIED_ABI would be a better value name. >>>> >>>> Also, I don't really understand what opts_set and opts are, except that I >>>> had >>>> guessed opts_set is what the user asked for (or didn't ask for) and opts is >>>> what we're going to actually use. Am I close? >>> Yes. opts_set is a flag that user specified an option at the command line. >>> >>> However, I fail to see what is the problem. If nothing was specified, >>> then opts->x_ix86_abi is set to DEFAULT_ABI. >> That is not what is happening. If -mabi=sysv is specified, then the >> test (!opts_set->x_ix86_abi) is true since the value of SYSV_ABI is >> zero. When that is evaluated as true, then the abi is set to >> DEFAULT_ABI, which on Windows is MS_ABI, thus ignoring the command line >> option. > Let's use the following patch: > > --cut here-- > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 3c82ae64f4f2..f8590f663285 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -5682,7 +5682,7 @@ ix86_option_override_internal (bool main_args_p, > ? PMODE_DI : PMODE_SI; > >if (!opts_set->x_ix86_abi) > -opts->x_ix86_abi = DEFAULT_ABI; > +printf ("Using default ABI\n"), opts->x_ix86_abi = DEFAULT_ABI; > >/* For targets using ms ABI enable ms-extensions, if not > explicit turned off. For non-ms ABI we turn off this > --cut here-- > > $ ./cc1 -O2 -quiet hello.c > Using default ABI > $ ./cc1 -O2 -mabi=sysv -quiet hello.c > $ > $ ./cc1 -O2 -mabi=sysv -quiet hello.c > $ > > Again, opts_set is set to true when the option is specified on the > command line, it has nothing to do with the value of the option. 
Interesting, I get the same result and in fact I can't reproduce the bug anymore. Either I made a mistake somewhere (likely) or something else fixed the problem (less likely). I'll try again from where the trunk was when I filed the bug and close it either invalid or fixed depending upon which it is. Thanks! Daniel >> I'm guessing that if we don't specify an Init() option then it will >> default to zero? We just need a valid way to differentiate when >> -mabi=sysv has been passed from when nothing has been passed. > Yes, it defaults to zero, but since we live in c++ world nowadays, we > can't initialize enum with integer zero... > > Uros.
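The distinction Uros describes (opts holds the value, opts_set only records that the option was seen) can be sketched with a toy model. This is not GCC's option machinery: `toy_options`, `parse_mabi` and `resolve_abi` are hypothetical names, and the two-instance layout is only an assumption that mirrors the description in this thread.

```c
#include <assert.h>
#include <string.h>

enum calling_abi { SYSV_ABI = 0, MS_ABI = 1 };

/* Hypothetical stand-in for the option structs: the same type is used once
   for the values ("opts") and once for the was-it-given flags ("opts_set"). */
struct toy_options { int abi; };

/* Record -mabi=ARG if it was given; ARG == NULL means "not on the command line". */
static void parse_mabi(const char *arg, struct toy_options *opts,
                       struct toy_options *opts_set)
{
    if (arg == NULL)
        return;
    opts->abi = strcmp(arg, "ms") == 0 ? MS_ABI : SYSV_ABI;
    opts_set->abi = 1;   /* a flag, independent of the value stored above */
}

/* Apply the platform default only when the user gave no -mabi= at all. */
static int resolve_abi(const char *arg, int default_abi)
{
    assert(default_abi == SYSV_ABI || default_abi == MS_ABI);

    struct toy_options opts = { SYSV_ABI };
    struct toy_options opts_set = { 0 };

    parse_mabi(arg, &opts, &opts_set);
    if (!opts_set.abi)
        opts.abi = default_abi;
    return opts.abi;
}
```

In this model, `resolve_abi("sysv", MS_ABI)` keeps SYSV_ABI even though SYSV_ABI is zero, because the test looks at the flag and not at the value. That matches the cc1 output quoted above, where the default is only applied when no -mabi= option is passed.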
Re: [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 05/05/2017 03:56 AM, Daniel Santos wrote:
> On 05/02/2017 05:40 AM, Kai Tietz wrote:
>> Right, and Wine people will tell, if something doesn't work for them.
>> So ok for me too.
>>
>> Kai
>
> Well, I haven't re-run these tests in a few months, but I got 272 failed wine tests with gcc 7.1 and 234 with my patch set rebased onto 7.1. So it looks like I'll be trying to diagnose these failures this weekend.

Those are bad numbers. I had forgotten to filter out the testlist.o files. Below are my most recent numbers running Wine 2.7:

gcc-5.4.0 CFLAGS="-march=native -O2 -g": 74
gcc-7.1.0 CFLAGS="-march=native -O2 -g": 74
gcc-7.1.0 CFLAGS="-march=nocona -mtune=generic -O2 -g": 79
gcc-7.1.0 CFLAGS="-march=native -O2 -g -mcall-ms2sysv-xlogues" (patched): 31

I'm building out a clean test environment on another machine to try to rule out clutter issues (and video driver issues) on my workstation.

Daniel
Re: [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 05/06/2017 03:22 PM, Daniel Santos wrote:
> gcc-5.4.0 CFLAGS="-march=native -O2 -g": 74
> gcc-7.1.0 CFLAGS="-march=native -O2 -g": 74
> gcc-7.1.0 CFLAGS="-march=nocona -mtune=generic -O2 -g": 79
> gcc-7.1.0 CFLAGS="-march=native -O2 -g -mcall-ms2sysv-xlogues" (patched): 31
>
> I'm building out a clean test environment on another machine to try to rule out clutter issues (and video driver issues) on my workstation.
>
> Daniel

I've re-run Wine's tests in a new, clean VM environment, with some changes to include more tests, and got similar results:

Compiler                                          Failures
gcc-4.9.4                                         39
gcc-7.1.0                                         78
gcc-7.1.0-patched (with -mcall-ms2sysv-xlogues)   40

The first error not present in the gcc-4.9.4 tests that I examined looked like a run-of-the-mill race condition in Wine that just happened not to crash when built with 4.9.4. So I'm going to guess that the disappearance of these failures with -mcall-ms2sysv-xlogues is just incidental. I think we're in good condition with this patch set.

Daniel
[PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
Ping? I have posted revisions of the following patches in the set:

05/12 - https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01442.html
09/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00348.html
11/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00350.html

I have retested them on Linux x86-64, in addition to a Wine testsuite comparison, which resulted in fewer failed tests (31) than when using unpatched 7.1.0 (78) and 5.4.0 (78). From a cursory examination, the tests that fail with 7.1.0 but now pass seem to have failed because of race conditions in Wine that are incidentally hidden after the patches.

Is there anything else needed before we can commit these? They still rebase cleanly onto the HEAD, but I can repost as "v5" if you prefer.

Thanks,
Daniel
Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 05/13/2017 11:52 AM, Uros Bizjak wrote:
> On Sat, May 13, 2017 at 1:01 AM, Daniel Santos wrote:
>> Ping? I have posted revisions of the following in patch set:
>>
>> 05/12 - https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01442.html
>> 09/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00348.html
>> 11/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00350.html
>>
>> I have retested them on Linux x86-64 in addition a Wine testsuite comparison resulting in fewer failed tests (31) than when using unpatched 7.1.0 (78) and 5.4.0 (78). A cursory examination of the now working failures with 7.1.0 seemed to be to be due to race conditions in Wine that are incidentally hidden after the patches.
>>
>> Is there anything else needed before we can commit these? They still rebase cleanly onto the HEAD, but I can repost as "v5" if you prefer.
>
> Please go ahead and commit the patches.
>
> However, please stay around to fix possible fallout. As said - you are touching quite complex part of the compiler ...
>
> Thanks,
> Uros.

Thanks! I'll definitely be around, I have a lot more that I'm working on with C generics/pseudo-templates (all middle-end stuff). I also want to examine more ways that SSE saves/restores can be omitted in these ms to sysv calls through static analysis and such.

Anyway, I don't yet have SVN write access, will you sponsor my request?

Thanks,
Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/14/2017 02:42 AM, Bernd Edlinger wrote:
> Hi,
>
> this patch uses the new TARGET_COMPUTE_FRAME_LAYOUT hook in the i386 backend to avoid re-computing the frame layout when not really necessary. It simplifies the logic in ix86_compute_frame_layout by removing the use_fast_prologue_epilogue_nregs, which is no longer necessary, because the frame layout can no longer change spontaneously.
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
>
> Thanks
> Bernd.

I think Uros is about to commit my improvements to ms to sysv abi calls, which is a large change and will conflict with your patch. I've added several new fields to struct ix86_frame that will need to be merged (and moved to i386.h). I believe that my only explicit check of crtl->stack_realign_finalized is during pro/epilogue expand, and not in ix86_compute_frame_layout. A former incarnation of my patches needed ix86_compute_frame_layout to be called *after* it was set, but I believe that is no longer the case, and so shouldn't conflict, but retesting should certainly be done.

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01338.html

Thanks,
Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/14/2017 11:31 AM, Bernd Edlinger wrote: Hi Daniel, there is one thing I don't understand in your patch: That is, it introduces a static value: /* Registers who's save & restore will be managed by stubs called from pro/epilogue. */ static HARD_REG_SET GTY(()) stub_managed_regs; This seems to be set as a side effect of ix86_compute_frame_layout, and depends on the register usage of the current function. But values that depend on the current function need usually be attached to cfun->machine, because the passes can run in parallel unless I am completely mistaken, and the stub_managed_regs may therefore be computed from a different function. Bernd. I'm relatively new to GCC and still learning. However, there are quite a lot of static TU variables in i386.c like this. I am not aware of gcc having parallelism support, but if it were to be added then all of these TU variables should probably be moved to some class or struct (like cfun->machine) to reduce the number of TLS lookups required (which I presume is a little more expensive than a this/offset calculation). Having this (as well as other variables) in such a struct is better design IMO, but as I said, I'm still learning GCC's architecture, idioms and patterns. (I should add that I don't really understand the GTY memory management either. :) To be clear on class xlogue_layout, the only instances of this class are const and could be shared across multiple threads. It is dependent upon the cfun->machine as well as the global struct rtl_data crtl, but is not so entangled that were these proper C++ classes (with private data) that it would need to be a friend -- it only needs read-access to their data members. To be honest, it's a strange feeling programming in a mixture of C and C++ idioms, but I know it was only recently converted to C++ so I think it's better to try to use only one or the other in a given function. 
But if I were going to do this all OO, then ix86_compute_frame_layout would be a member function of ix86_frame (which would be a specialization of some generic "frame" class), machine_function would be class ix86_machine_function with its own compute_frame_layout that called ix86_frame::compute_frame_layout, etc. If I really wanted to go nuts, I would consider making class function, et al. template classes with machine_function and machine_function_state part of the object instead of pointers to separate objects to reduce accesses down to a single this/offset, but now I'm *really* digressing...

Please feel free to move it.

Thanks,
Daniel
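Bernd's concern about the file-scope stub_managed_regs can be shown with a toy model. This is only a sketch: `function_ctx`, `machine_state`, `STUB_REG_MASK` and both `compute_layout_*` functions are hypothetical and are not taken from i386.c; registers are modelled as plain bits in an unsigned value.

```c
#include <assert.h>

#define STUB_REG_MASK 0xffu   /* hypothetical: bits the stubs could manage */

/* Hypothetical per-function state, playing the role of cfun->machine. */
struct machine_state { unsigned managed_regs; };

struct function_ctx {
    unsigned used_regs;             /* registers this function uses */
    struct machine_state machine;   /* state owned by this function */
};

/* File-scope variant: a single copy shared by every function processed. */
static unsigned shared_managed_regs;

static void compute_layout_shared(const struct function_ctx *fn)
{
    assert(fn);
    shared_managed_regs = fn->used_regs & STUB_REG_MASK;
}

/* Per-function variant: the result travels with the function it describes. */
static void compute_layout_per_function(struct function_ctx *fn)
{
    assert(fn);
    fn->machine.managed_regs = fn->used_regs & STUB_REG_MASK;
}
```

If the layout is computed for one function and then for another, the shared variable only describes whichever function was processed last, so a later query about the first function reads a stale value. The per-function variant has no such ordering dependency, which is the reason for attaching the data to cfun->machine.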
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/14/2017 11:31 AM, Bernd Edlinger wrote: Hi Daniel, there is one thing I don't understand in your patch: That is, it introduces a static value: /* Registers who's save & restore will be managed by stubs called from pro/epilogue. */ static HARD_REG_SET GTY(()) stub_managed_regs; This seems to be set as a side effect of ix86_compute_frame_layout, and depends on the register usage of the current function. But values that depend on the current function need usually be attached to cfun->machine, because the passes can run in parallel unless I am completely mistaken, and the stub_managed_regs may therefore be computed from a different function. Bernd. I should add that if you want to run faster tests just on the ms to sysv abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if that succeeds run the full testsuite. Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/15/2017 03:39 PM, Bernd Edlinger wrote: On 05/15/17 03:39, Daniel Santos wrote: On 05/14/2017 11:31 AM, Bernd Edlinger wrote: Hi Daniel, there is one thing I don't understand in your patch: That is, it introduces a static value: /* Registers who's save & restore will be managed by stubs called from pro/epilogue. */ static HARD_REG_SET GTY(()) stub_managed_regs; This seems to be set as a side effect of ix86_compute_frame_layout, and depends on the register usage of the current function. But values that depend on the current function need usually be attached to cfun->machine, because the passes can run in parallel unless I am completely mistaken, and the stub_managed_regs may therefore be computed from a different function. Bernd. I should add that if you want to run faster tests just on the ms to sysv abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if that succeeds run the full testsuite. Daniel Unfortunately I encounter a serious problem when my patch is used ontop of your patch, Yes, the test suite ran without error, but then I tried to trigger the warning and that tripped an ICE. The reason is that cfun->machine->call_ms2sysv can be set to true *after* reload_completed, which can be seen using the following patch: Index: i386.c === --- i386.c (revision 248031) +++ i386.c (working copy) @@ -29320,7 +29320,10 @@ /* Set here, but it may get cleared later. 
*/ if (TARGET_CALL_MS2SYSV_XLOGUES) + { + gcc_assert(!reload_completed); cfun->machine->call_ms2sysv = true; + } } if (vec_len > 1) That assertion is triggered in this test case: cat test.c int test() { __builtin_printf("test\n"); return 0; } gcc -mabi=ms -mcall-ms2sysv-xlogues -fsplit-stack -c test.c test.c: In function 'test': test.c:5:1: internal compiler error: in ix86_expand_call, at config/i386/i386.c:29324 } ^ 0x13390a4 ix86_expand_call(rtx_def*, rtx_def*, rtx_def*, rtx_def*, rtx_def*, bool) ../../gcc-trunk/gcc/config/i386/i386.c:29324 0x1317494 ix86_expand_split_stack_prologue() ../../gcc-trunk/gcc/config/i386/i386.c:15920 0x162ba21 gen_split_stack_prologue() ../../gcc-trunk/gcc/config/i386/i386.md:12556 0x12f3f30 target_gen_split_stack_prologue ../../gcc-trunk/gcc/config/i386/i386.md:12325 0xb237b3 make_split_prologue_seq ../../gcc-trunk/gcc/function.c:5822 0xb23a08 thread_prologue_and_epilogue_insns() ../../gcc-trunk/gcc/function.c:5958 0xb24840 rest_of_handle_thread_prologue_and_epilogue ../../gcc-trunk/gcc/function.c:6428 0xb248c0 execute ../../gcc-trunk/gcc/function.c:6470 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. so, in ix86_expand_split_stack_prologue we first call: ix86_finalize_stack_realign_flags (); ix86_compute_frame_layout (&frame); and later: call_insn = ix86_expand_call (NULL_RTX, gen_rtx_MEM (QImode, fn), GEN_INT (UNITS_PER_WORD), constm1_rtx, pop, false); which changes a flag with a huge impact on the frame layout, but there is no absolutely no way how the frame layout can change once it is finalized. Any Thoughts? Bernd. Well, my intention was actually to punt on those cases, but I hadn't actually tested with -fsplit-stack. 
It looks like ix86_expand_split_stack_prologue calls ix86_expand_call, and I hadn't anticipated it getting called after the last call to ix86_compute_frame_layout(), which your patch has probably eliminated. In the case of -fsplit-stack, I'm testing the macro flag_split_stack which (currently) just expands to check the global flag, so this could be done in ix86_option_override_internal () instead, but I think it highlights a somewhat deeper problem.

Whether or not m->call_ms2sysv is set determines which stack layout is used when ix86_compute_frame_layout() runs. But if we can run expand_call after the final call to ix86_compute_frame_layout(), then we have a problem. It looks like ix86_expand_split_stack_prologue is the only function that manually calls ix86_expand_call, but maybe it would be better to modify the test to something like this:

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a78819d6b3f..c36383f6962 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -29325,7 +29325,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
 }
 /* Set here, but it may get cleared later. */
- if (TARGET_CALL_MS2SYSV_XLOGUES)
+ if (TARGET_CALL_MS2SYSV_XLOGUES && !reload_completed)
 cfun->machine->call_ms2sysv = true;
 }

Or eve
Re: [PATCH] [i386] Recompute the frame layout less often
Ian, would you mind looking at this please? A combination of my -mcall-ms2sysv-xlogues patch with Bernd's patch is causing problems when ix86_expand_split_stack_prologue() calls ix86_expand_call().

On 05/15/2017 06:46 PM, Daniel Santos wrote:
> Rather or not m->call_ms2sysv is set determines which stack layout is used when ix86_compute_frame_layout() runs. But if we can run expand_call after the final time ix86_compute_frame_layout() then we have a problem. It looks like ix86_expand_split_stack_prologue is the only function that manually calls ix86_expand_call, but maybe it would be better to modify the test to something like this:
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index a78819d6b3f..c36383f6962 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -29325,7 +29325,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
>  }
>  /* Set here, but it may get cleared later. */
> - if (TARGET_CALL_MS2SYSV_XLOGUES)
> + if (TARGET_CALL_MS2SYSV_XLOGUES && !reload_completed)
>  cfun->machine->call_ms2sysv = true;
>  }

Actually, I think this is wrong. I happened to recall looking at the morestack code last year and remembered that it was all assembly. I looked at it again and I don't see that it calls anything outside of its implementation file (libgcc/config/i386/morestack.S) except for _Unwind_Resume and the calling function itself (I think it calls its caller). It saves and restores rsi and rdi and doesn't use any sse registers, so it doesn't need to clobber all of the regs in the x86_64_ms_sysv_extra_clobbered_registers array.

I'm guessing that this should have its own pattern instead of calling ix86_expand_call in the first place. Of course, I'm the new guy here, so please enlighten me if I'm wrong.

Thanks,
Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/16/2017 03:34 AM, Bernd Edlinger wrote: It would be good to have test cases for each of the not-supported warnings that can happen, so far I only managed to get a test case for -fsplit-stack. Yes, I'm inclined to agree. I'll try to get this done today or tomorrow. I've also put in a limiter of one warning per TU. One problem is that there isn't a way to disable the warning, so I may want to add that. Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/16/2017 12:19 PM, Ian Lance Taylor wrote:
> On Mon, May 15, 2017 at 10:00 PM, Daniel Santos wrote:
>> Ian, would you mind looking at this please? A combination of my -mcall-ms2sysv-xlogues patch with Bernd's patch is causing problems when ix86_expand_split_stack_prologue() calls ix86_expand_call().
>
> I don't have a lot of context here. I assume that ms2sysv is going to be used on Windows systems, where -fsplit-stack isn't really going to work anyhow, so I think it would probably be OK that reject that combination if it causes trouble.

Sorry I wasn't more specific. This -mcall-ms2sysv-xlogues actually targets Wine, although they don't use -fsplit-stack. As it stands, my patch set disables the feature when -fsplit-stack is used, but it does so during ix86_compute_frame_layout, which is too late in the case of -fsplit-stack. I think I should just change this to a sorry() in ix86_option_override_internal.

> Also, it's overkill for ix86_expand_split_stack_prologue to call ix86_expand_call. The call is always to __morestack, and __morestack is written in assembler, so we could use a simpler version of ix86_expand_call if that helps. In particular we can decide that __morestack doesn't clobber any unusual registers, if that is what is causing the problem.
>
> Ian

Well, aside from the conflict of the two patches, it just looks like it has the potential to generate clobbers where none are needed, but I'm having trouble actually *proving* that, so maybe I'm just wrong.

Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/16/2017 02:52 PM, Bernd Edlinger wrote: I think I solved the problem with -fsplit-stack, I am not sure if ix86_static_chain_on_stack might change after reload due to final.c possibly calling targetm.calls.static_chain, but if that is the case, that is an already pre-existing problem. The goal of this patch is to make all decisions regarding the frame layout before the reload pass, and to make sure that the frame layout does not change unexpectedly it asserts that the data that goes into the decision does not change after reload_completed. With the attached patch -fsplit-stack and the attribute ms_hook_prologue is handed directly at the ix86_expand_call, because that data is already known before expansion. The calls_eh_return and ix86_static_chain_on_stack may become known at a later time, but after reload it should not change any more. To be sure, I added an assertion at ix86_static_chain, which the regression test did not trigger, neither with -m64 nor with -m32. I have bootstrapped the patch several times, and a few times I encounterd a segfault in the garbage collection, but it did not happen every time. Currently I think that is unrelated to this patch. Bootstrapped and reg-tested on x86_64-pc-linux-gnu with -m64/-m32. Is it OK for trunk? Thanks Bernd. With as many formatting errors as I seem to have had, I would like to fix those then you patch on top of that if you wouldn't mind terribly. While gcc uses subversion, git-blame is still very helpful (then again, since Uros committed it for me, I guess that's already off). Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c(revision 248031) +++ gcc/config/i386/i386.c(working copy) @@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] = /* Additional registers that are clobbered by SYSV calls. 
*/ -unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] = +#define NUM_X86_64_MS_CLOBBERED_REGS 12 +static int const x86_64_ms_sysv_extra_clobbered_registers + [NUM_X86_64_MS_CLOBBERED_REGS] = Is there a reason you're changing this unsigned to signed int? While AX_REG and such are just preprocessor macros, everywhere else it seems that register numbers are dealt with as unsigned ints. @@ -2484,13 +2486,13 @@ class xlogue_layout { needs to store registers based upon data in the machine_function. */ HOST_WIDE_INT get_stack_space_used () const { -const struct machine_function &m = *cfun->machine; -unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1; +const struct machine_function *m = cfun->machine; +unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1; What is the reason for this change? -gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS); +gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS); return m_regs[last_reg].offset -+ (m.call_ms2sysv_pad_out ? 8 : 0) -+ STUB_INDEX_OFFSET; + + (m->call_ms2sysv_pad_out ? 8 : 0) + + STUB_INDEX_OFFSET; } /* Returns the offset for the base pointer used by the stub. */ @@ -2532,7 +2534,7 @@ class xlogue_layout { /* Lazy-inited cache of symbol names for stubs. */ char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN]; - static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT]; + static const struct GTY(()) xlogue_layout s_instances[XLOGUE_SET_COUNT]; Hmm, during development I originally had C-style xlogue_layout as a struct and later decided to make it a class and apparently forgot to remove the "struct" here. None the less, it's bazaar that the GTY() would go in between the "struct" and the "xlogue_layout." As I said before, I don't fully understand how this GTY works. Can we just remove the "struct" keyword? 
Also, if the way I had it was wrong (and resulted in garbage collection not working right), then perhaps it was the cause of a problem I had with caching symbol rtx objects. I could not get this to work because my cached objects would somehow become stale and I've since removed that code (from xlogue_layout::get_stub_rtx). (i.e., does GTY affect the lifespan of globals, TU statics and static C++ data members?) /* Constructor for xlogue_layout. */ @@ -2639,11 +2643,11 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_ : m_hfp (hfp) , m_nregs (hfp ? 17 : 18), m_stack_align_off_in (stack_align_off_in) { + HOST_WIDE_INT offset = stack_align_off_in; + unsigned i, j; + memset (m_regs, 0, sizeof (m_regs)); memset (m_stub_names, 0, sizeof (m_stub_names)); - - HOST_WIDE_INT offset = stack_align_off_in; - unsigned i, j; for (i = j = 0; i < MAX_REGS; ++i) { unsigned regno = REG_ORDER[i]; @@ -2662,11 +2666,12 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_ m_regs[j].regno= regno; m_regs[j++].offset = offset - STUB_INDEX_OFFSET; } -gcc_assert (
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/17/2017 12:41 PM, Bernd Edlinger wrote: Apologies if I ruined your patch... As I said before, I'm the new guy here. :) So when this is done I'll rebase my changes. I have some test stuff to fix and some refactoring and refinements to xlogue_layout::compute_stub_managed_regs(). And then I'll find a solution to the stub_managed_regs after that. Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c(revision 248031) +++ gcc/config/i386/i386.c(working copy) @@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] = /* Additional registers that are clobbered by SYSV calls. */ -unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] = +#define NUM_X86_64_MS_CLOBBERED_REGS 12 +static int const x86_64_ms_sysv_extra_clobbered_registers + [NUM_X86_64_MS_CLOBBERED_REGS] = Is there a reason you're changing this unsigned to signed int? While AX_REG and such are just preprocessor macros, everywhere else it seems that register numbers are dealt with as unsigned ints. I actually there seems to be confusion about "int" vs. "unsigned int" for regno, the advantage of int, is that it can contain -1 as a exceptional value. Furthermore there are 3 similar arrays just above that also use int: static int const x86_64_int_parameter_registers[6] = { DI_REG, SI_REG, DX_REG, CX_REG, R8_REG, R9_REG }; static int const x86_64_ms_abi_int_parameter_registers[4] = { CX_REG, DX_REG, R8_REG, R9_REG }; static int const x86_64_int_return_registers[4] = { AX_REG, DX_REG, DI_REG, SI_REG }; /* Additional registers that are clobbered by SYSV calls. */ #define NUM_X86_64_MS_CLOBBERED_REGS 12 static int const x86_64_ms_sysv_extra_clobbered_registers [NUM_X86_64_MS_CLOBBERED_REGS] = { SI_REG, DI_REG, XMM6_REG, XMM7_REG, XMM8_REG, XMM9_REG, XMM10_REG, XMM11_REG, XMM12_REG, XMM13_REG, XMM14_REG, XMM15_REG }; So IMHO it looked odd to have one array use a different type in the first place. OK. 
I think that when I originally started this I was using elements of this array in comparisons and got the signed/unsigned warning and changed them. None of the code gives that warning now however. @@ -2484,13 +2486,13 @@ class xlogue_layout { needs to store registers based upon data in the machine_function. */ HOST_WIDE_INT get_stack_space_used () const { -const struct machine_function &m = *cfun->machine; -unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1; +const struct machine_function *m = cfun->machine; +unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1; What is the reason for this change? Because a mixture of C and C++ (C wants "struct" machine_function) looks ugly, and everywhere else in this module, "m" is a pointer and no reference. I see, consistency with the rest of the file. -gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS); +gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS); return m_regs[last_reg].offset -+ (m.call_ms2sysv_pad_out ? 8 : 0) -+ STUB_INDEX_OFFSET; + + (m->call_ms2sysv_pad_out ? 8 : 0) + + STUB_INDEX_OFFSET; } /* Returns the offset for the base pointer used by the stub. */ @@ -2532,7 +2534,7 @@ class xlogue_layout { /* Lazy-inited cache of symbol names for stubs. */ char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN]; - static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT]; + static const struct GTY(()) xlogue_layout s_instances[XLOGUE_SET_COUNT]; Hmm, during development I originally had C-style xlogue_layout as a struct and later decided to make it a class and apparently forgot to remove the "struct" here. None the less, it's bazaar that the GTY() would go in between the "struct" and the "xlogue_layout." As I said before, I don't fully understand how this GTY works. Can we just remove the "struct" keyword? 
Also, if the way I had it was wrong (and resulted in garbage collection not working right), then perhaps it was the cause of a problem I had with caching symbol rtx objects. I could not get this to work because my cached objects would somehow become stale and I've since removed that code (from xlogue_layout::get_stub_rtx). (i.e., does GTY affect the lifespan of globals, TU statics and static C++ data members?) Yes, I had not noticed the "struct", and agree to remove it. I just saw that in every other place where GTY is used it comes directly after "struct" or "static", so my impulse was just to follow those examples. Yeah, and not understanding how it worked I was just trying to follow suit. But neither version actually makes the class GC-able. Apparently this class construct is too complicated for the gengtype machinery. So I am inclined to remove the GTY keyword completely as it gives you only false security in GC's ability to garbage collect anything in this class. Th
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/17/2017 01:39 PM, Bernd Edlinger wrote: On 05/15/17 03:39, Daniel Santos wrote: I should add that if you want to run faster tests just on the ms to sysv abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if that succeeds run the full testsuite. Daniel Hmm, that's funny... If I use "make check-c RUNTESTFLAGS="ms-sysv.exp" -j8" it seems to work, but if I omit the -j8 it fails: make check-c RUNTESTFLAGS="ms-sysv.exp" ...Test Run By ed on Wed May 17 20:38:24 2017 Native configuration is x86_64-pc-linux-gnu === gcc tests === Schedule of variations: unix Running target unix Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target. Using /usr/share/dejagnu/config/unix.exp as generic interface file for target. Using /home/ed/gnu/gcc-trunk/gcc/testsuite/config/default.exp as tool-and-target-specific interface file. Running /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp ... ERROR: tcl error sourcing /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp. ERROR: no such variable (read trace on "env(GCC_RUNTEST_PARALLELIZE_DIR)") invoked from within "set parallel_dir "$env(GCC_RUNTEST_PARALLELIZE_DIR)/abi-ms-sysv"" (file "/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp" line 154) invoked from within "source /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp" ("uplevel" body line 1) invoked from within "uplevel #0 source /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp" invoked from within "catch "uplevel #0 source $test_file_name"" === gcc Summary === /home/ed/gnu/gcc-build/gcc/xgcc version 8.0.0 20170514 (experimental) (GCC) make[2]: Leaving directory `/home/ed/gnu/gcc-build/gcc' make[1]: Leaving directory `/home/ed/gnu/gcc-build/gcc' Hmm, that might be something I hadn't actually tried. 
And if I run it in a directory where I had previously run a multi-job check it doesn't blow up (maybe because the directory is already there?) Due to the nature of my test program, I had to break with tradition and implement something akin to the test that generates random structs (I forgot what that one is called). It ended up breaking the bastardized parallelization scheme, so I had to implement my own re-bastardized scheme. Looks like I can just skip parallelization if GCC_RUNTEST_PARALLELIZE_DIR isn't defined. I have another Solaris test issue on PR 80759 so I'll fix that along with it. Thanks, Daniel PS: Oh! it might be due to the difference between -j1 and no -j argument.
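The claim-by-file scheme discussed above (each job tries to create a marker file exclusively, and runs serially when no shared directory is configured) can be sketched in C++ rather than Tcl. This is only an illustration under stated assumptions: the function name, the enum and the marker-file naming are hypothetical and are not taken from the testsuite code.

```cpp
#include <cerrno>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Possible answers to "should this job run test number N?" (hypothetical).
enum class claim_result { run_unclaimed, run_claimed, skip, error };

// parallel_dir stands in for the value of GCC_RUNTEST_PARALLELIZE_DIR;
// pass nullptr (or "") when the variable is not set.
claim_result try_claim_test (const char *parallel_dir, unsigned test_index)
{
  // No shared directory: there is nothing to coordinate, so just run.
  if (parallel_dir == nullptr || *parallel_dir == '\0')
    return claim_result::run_unclaimed;

  const std::string marker
    = std::string (parallel_dir) + "/" + std::to_string (test_index);

  // O_CREAT | O_EXCL makes creation atomic: exactly one job succeeds.
  const int fd = open (marker.c_str (), O_RDWR | O_CREAT | O_EXCL, 0644);
  if (fd >= 0)
    {
      close (fd);
      return claim_result::run_claimed;
    }

  // EEXIST means another job already claimed this test.
  return errno == EEXIST ? claim_result::skip : claim_result::error;
}
```

The first branch is the case that was missing: when the directory variable is unset, the test simply runs instead of raising an error.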
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/18/2017 08:37 AM, Bernd Edlinger wrote:
> On 05/17/17 04:01, Daniel Santos wrote:
>>> -  if (ignore_outlined && cfun->machine->call_ms2sysv
>>> -      && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
>>> -    return false;
>>> +  if (ignore_outlined && cfun->machine->call_ms2sysv)
>>> +    {
>>> +      /* Registers whose save & restore will be managed by stubs called from
>>> +         pro/epilogue.  */
>>> +      HARD_REG_SET stub_managed_regs;
>>> +      xlogue_layout::compute_stub_managed_regs (stub_managed_regs);
>>> +      if (in_hard_reg_set_p (stub_managed_regs, DImode, regno))
>>> +        return false;
>>> +    }
>>> +
>>>    if (crtl->drap_reg
>>>        && regno == REGNO (crtl->drap_reg)
>>>        && !cfun->machine->no_drap_save_restore)
>>
>> This makes no sense. The entire purpose of stub_managed_regs is to cache the result of xlogue_layout::compute_stub_managed_regs(), and this would unnecessarily repeat that calculation each time ix86_save_reg() is called. Since xlogue_layout::compute_stub_managed_regs() calls ix86_save_reg many times, this makes it even worse. Which registers are being saved out-of-line and inline MUST be known at the time the stack layout is determined. So stub_managed_regs should either be left a TU static or just moved to struct machine_function.
>>
>> As an aside, I've noticed that xlogue_layout::compute_stub_managed_regs is calling ix86_save_reg (which isn't trivial) more often than it really has to, so I've refactored it.
>
> Well, meanwhile I think the stub_managed_regs contain zero information and need not be saved at all, because it can easily be reconstructed from m->call_ms2sysv_extra_regs.
>
> See the attached new version. Daniel, does it work for you?

No, I'm not at all comfortable with you making so many seemingly unnecessary changes to this code. (Although I wish I got this much feedback during my RFCs! :) I can accept the changes to is/count_stub_managed_reg (with some caveats), but I do not at all see the rationale for changing m_stub_names to a static and adding another dimension for the object instance; from an OO standpoint, that's just bad design. Can you please share your rationale for that?

Incidentally, half of the space in that array is wasted and can be trimmed, since a given instance of xlogue_layout either supports hard frame pointers or doesn't; I just never got around to splitting that apart. (The first three enum xlogue_stub values are for without HFP and the last three for with.) Also, if we wanted to further reduce the memory footprint of xlogue_layout objects, the offset field of struct reginfo could be changed to int, and if we really wanted to go nuts then 16 bits would work for both of its fields.

So for is/count_stub_managed_reg(s), you are obviously much more familiar with gcc, its passes and the i386 backend, but my knowledge level makes me very uncomfortable having the result of xlogue_layout::is_stub_managed_reg() determined in a way that has the (apparent) potential to differ from the state at the time the last call to ix86_compute_frame_layout() was made; I just don't understand well enough what all can change in between the last call to ix86_compute_frame_layout() and the last call to xlogue_layout::is_stub_managed_reg(). I like your count_stub_managed_regs() and is_stub_managed_regs() from a *performance* standpoint (and I know I get too uptight about that kind of thing, so I appreciate that), but as to the change in scheme, I would have to trust you if you assert that this will always behave consistently.

I also want to give you a little background on some of these seemingly repetitive computations. One of my design goals was for the code to be relatively easy to adapt to the management of out-of-line pro/epilogue stubs for other possible scenarios where there are a lot of clobbers and it could be useful. Granted, I don't have enough knowledge of x86 architectures to identify situations other than this one (in 64-bit Wine) where it could be helpful, and I know that x86 push/pops are really small. So theoretically, struct machine_function's "call_ms2sysv" could be changed to something like "outline_savres" and any combination of clobbered registers for which there is a decent stub could be used if it was a good choice. I also realize that nobody likes complexity that isn't being used, and I respect that.

So if you are comfortable with this change and you believe you understand how it works then I will agree to it, but I'll be trusting you well beyond my knowledge level and ability to confidently predict the outcome (probably what a programmer hates the most).

Thanks,
Daniel
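Bernd's claim that the stub-managed set carries no extra information can be illustrated with a small C++ sketch: if registers are always added in one fixed order, membership follows from the count of extra registers and the hard-frame-pointer flag alone. The register names, their order and the minimum count below are made up for illustration and are not the real xlogue_layout tables.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical register ids and a hypothetical fixed save order.
enum sketch_reg { REG_XMM6, REG_XMM7, REG_SI, REG_DI,
                  REG_BX, REG_BP, REG_R12, REG_R13 };

static const sketch_reg sketch_order[] = {
  REG_XMM6, REG_XMM7, REG_SI, REG_DI, REG_BX, REG_BP, REG_R12, REG_R13
};

constexpr std::size_t sketch_min_regs = 4;   // always handled by the stub
constexpr unsigned sketch_max_extra = 4;     // optional additional registers

// True if R is among the first (min + extra_regs) entries of the fixed
// order, skipping the frame pointer when a hard frame pointer is in use.
bool is_stub_managed (sketch_reg r, unsigned extra_regs, bool hfp)
{
  assert (extra_regs <= sketch_max_extra);
  const std::size_t managed = sketch_min_regs + extra_regs;
  std::size_t seen = 0;

  for (sketch_reg candidate : sketch_order)
    {
      if (hfp && candidate == REG_BP)
        continue;               // saved by the normal frame setup instead
      if (seen == managed)
        break;                  // everything past this point is inline
      if (candidate == r)
        return true;
      ++seen;
    }
  return false;
}
```

Whether the inputs to such a function can change between the last frame-layout computation and the last query is exactly the open question raised above; the sketch does not answer it.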
Re: [PATCH] [i386] Recompute the frame layout less often
PS: Oh! it might be due to the difference between -j1 and no -j argument. Yes, that's how I missed it. This flaw isn't exposed with make -j1, but is exposed with just make. Thanks for finding this! Daniel
[PATCH 0/2] [testsuite] PR80759 Fix test breakages on i386-pc-solaris2.*
There are a few issues with my ms-sysv.exp tests:

1. Use of gas extensions in do-test.S causes breakages on Solaris,
2. Parallelization breaks when no make -j flag is passed,
3. Builds aren't adding TEST_ALWAYS_FLAGS, so log files are filled with color escape codes, and
4. The "test unsupported" message is being spammed once for each -j job.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80759

I've broken this apart into two patches because I don't know if you'll agree with the first one. I fixed the make -j issue and moved the parallelization code into a new gcc/testsuite/lib/parallelize.exp in the first patch, and fixed all of the other issues in the second. I've removed all usage of gas .struct in my assembly file, hard-coded the offsets into the code and added asserts to main() in ms-sysv.c to make sure they don't change.

I've bootstrapped and retested on x86_64 Linux and have asked Rainer to retest on Solaris. Presuming that succeeds, are you OK with this change? (I have SVN write privs now, so I can even commit it myself.)

Thanks,
Daniel
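For illustration, the "hard-coded offsets guarded by asserts" idea can also be checked at compile time. The C++ sketch below assumes a simplified stand-in for struct test_data (three 224-byte register areas followed by two 8-byte fields), which matches the offsets 0, 224, 448, 672 and 680 used in the patch; the macro names and the struct itself are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names for the offsets the assembly would use.
#define TEST_DATA_OFFSET_SAVE    0
#define TEST_DATA_OFFSET_INPUT   224
#define TEST_DATA_OFFSET_OUTPUT  448
#define TEST_DATA_OFFSET_FN      672
#define TEST_DATA_OFFSET_RETADDR 680

enum reg_set { REG_SET_SAVE, REG_SET_INPUT, REG_SET_OUTPUT, REG_SET_COUNT };
constexpr std::size_t regset_bytes = 224;

// Simplified stand-in for the real struct test_data.
struct alignas (16) test_data_sketch
{
  unsigned char regdata[REG_SET_COUNT][regset_bytes];
  std::uint64_t fn;        // address of the function under test
  std::uint64_t retaddr;   // saved return address
};

// Offset of one register area, computed from the array's own offset.
constexpr std::size_t regdata_offset (reg_set s)
{
  return offsetof (test_data_sketch, regdata) + s * regset_bytes;
}

static_assert (regdata_offset (REG_SET_SAVE) == TEST_DATA_OFFSET_SAVE, "save");
static_assert (regdata_offset (REG_SET_INPUT) == TEST_DATA_OFFSET_INPUT, "input");
static_assert (regdata_offset (REG_SET_OUTPUT) == TEST_DATA_OFFSET_OUTPUT, "output");
static_assert (offsetof (test_data_sketch, fn) == TEST_DATA_OFFSET_FN, "fn");
static_assert (offsetof (test_data_sketch, retaddr) == TEST_DATA_OFFSET_RETADDR,
               "retaddr");

// Run-time counterpart of the alignment asserts in main ().
void verify_alignment (const test_data_sketch &d)
{
  assert (!(reinterpret_cast<std::uintptr_t> (d.regdata[REG_SET_INPUT]) & 15));
  assert (!(reinterpret_cast<std::uintptr_t> (d.regdata[REG_SET_OUTPUT]) & 15));
}
```

If the layout ever changes, the build fails instead of the test misbehaving at run time.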
[PATCH 2/2] [testsuite] PR 80759 Remove gas extensions from do-test.S, fix other problems
Use of .struct in do_test.S causes breakages when gas isn't the assembler (e.g., Solaris). I also wasn't including TEST_ALWAYS_FLAGS in my CFLAGS resulting in super-ugly log files. Finally, this patch eliminates spam of "test unsupported" (limiting it to one printing). Signed-off-by: Daniel Santos --- .../gcc.target/x86_64/abi/ms-sysv/do-test.S| 26 +- .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c| 7 ++ .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp | 24 3 files changed, 27 insertions(+), 30 deletions(-) diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S index 1395235fd1e..967eb959fbc 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S @@ -46,22 +46,6 @@ fn: # define MOVAPS movaps # endif -/* TODO: Is there a cleaner way to provide these offsets? */ - .struct 0 -test_data_save: - - .struct test_data_save + 224 -test_data_input: - - .struct test_data_save + 448 -test_data_output: - - .struct test_data_save + 672 -test_data_fn: - - .struct test_data_save + 680 -test_data_retaddr: - .text regs_to_mem: @@ -132,23 +116,23 @@ L0: callregs_to_mem # Load register with random data - lea test_data + test_data_input(%rip), %rax + lea test_data + 224(%rip), %rax callmem_to_regs # Save original return address pop %rax - movq%rax, test_data + test_data_retaddr(%rip) + movq%rax, test_data + 680(%rip) # Call the test function - call*test_data + test_data_fn(%rip) + call*test_data + 672(%rip) # Restore the original return address - movqtest_data + test_data_retaddr(%rip), %rcx + movqtest_data + 680(%rip), %rcx push%rcx # Save test function return value and store resulting register values push%rax - lea test_data + test_data_output(%rip), %rax + lea test_data + 448(%rip), %rax callregs_to_mem # Restore registers diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c index 
2a011f5103d..7cec312c386 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c @@ -346,6 +346,13 @@ int main (int argc, char *argv[]) assert (!((long)&test_data.regdata[REG_SET_INPUT] & 15)); assert (!((long)&test_data.regdata[REG_SET_OUTPUT] & 15)); + /* Verify offsets hard-coded into assembly. */ + assert (__builtin_offsetof (struct test_data, regdata[REG_SET_SAVE]) == 0); + assert (__builtin_offsetof (struct test_data, regdata[REG_SET_INPUT]) == 224); + assert (__builtin_offsetof (struct test_data, regdata[REG_SET_OUTPUT]) == 448); + assert (__builtin_offsetof (struct test_data, fn) == 672); + assert (__builtin_offsetof (struct test_data, retaddr) == 680); + while ((c = getopt (argc, argv, "s:f")) != -1) { switch (c) diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp index 77c40dbf349..a9571f194b1 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp @@ -23,18 +23,12 @@ # see the files COPYING3 and COPYING.RUNTIME respectively. If not, see # <http://www.gnu.org/licenses/>. -# Exit immediately if this isn't a native x86_64 target. 
-if { (![istarget x86_64-*-*] && ![istarget i?86-*-*]) - || ![is-effective-target lp64] || ![isnative] } then { -unsupported "$subdir" -return -} - load_lib gcc-dg.exp load_lib parallelize.exp proc runtest_ms_sysv { cflags generator_args } { -global GCC_UNDER_TEST HOSTCXX HOSTCXXFLAGS tmpdir srcdir subdir +global GCC_UNDER_TEST HOSTCXX HOSTCXXFLAGS tmpdir srcdir subdir \ + TEST_ALWAYS_FLAGS set objdir "$tmpdir/ms-sysv" set generator "$tmpdir/ms-sysv-generate.exe" @@ -93,7 +87,7 @@ proc runtest_ms_sysv { cflags generator_args } { } } -set cc "$GCC_UNDER_TEST -I$objdir -I$srcdir/$subdir $cflags $warn_flags" +set cc "$GCC_UNDER_TEST -I$objdir -I$srcdir/$subdir $TEST_ALWAYS_FLAGS $cflags $warn_flags" # Assemble do-test.S set src "$srcdir/$subdir/do-test.S" @@ -142,6 +136,18 @@ if { [parallel-init "ms2sysv"] != 0 } then { return; } +# Exit if this isn't a native x86_64 target. +if { (![istarget x86_64-*-*] && ![istarget i?86-*-*]) + || ![is-effective-target lp64] || ![isnative] } then { + +# The first call to parallel-should-run-test is used so we only print the +#
[PATCH 1/2] [testsuite] Move non-standard parallelization support into new lib and fix flaw
This fixes a flaw in my parallelization code that caused it to fail when GCC_RUNTEST_PARALLELIZE_DIR wasn't set. It worked fine with make -j1, but failed with just make. As there could be other tests that might need to do their own parallelization, I'm moving that code into its own file under gcc/testsuite/lib. Signed-off-by: Daniel Santos --- .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp | 48 gcc/testsuite/lib/parallelize.exp | 88 ++ 2 files changed, 103 insertions(+), 33 deletions(-) create mode 100644 gcc/testsuite/lib/parallelize.exp diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp index e317af9bd85..77c40dbf349 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp @@ -30,13 +30,11 @@ if { (![istarget x86_64-*-*] && ![istarget i?86-*-*]) return } -global GCC_RUNTEST_PARALLELIZE_DIR - load_lib gcc-dg.exp +load_lib parallelize.exp proc runtest_ms_sysv { cflags generator_args } { -global GCC_UNDER_TEST HOSTCXX HOSTCXXFLAGS tmpdir srcdir subdir \ - parallel_dir next_test +global GCC_UNDER_TEST HOSTCXX HOSTCXXFLAGS tmpdir srcdir subdir set objdir "$tmpdir/ms-sysv" set generator "$tmpdir/ms-sysv-generate.exe" @@ -46,22 +44,6 @@ proc runtest_ms_sysv { cflags generator_args } { set ms_sysv_exe "$objdir/ms-sysv.exe" set status 0 set warn_flags "-Wall" -set this_test $next_test -incr next_test - -# Do parallelization here -if [catch {set fd [open "$parallel_dir/$this_test" \ - [list RDWR CREAT EXCL]]} ] { - if { [lindex $::errorCode 1] eq "EEXIST" } then { - # Another job is running this test - return - } else { - error "Failed to open $parallel_dir/$this_test: $::errorCode" - set status 1 - } -} else { - close $fd -} # Detect when hard frame pointers are enabled (or required) so we know not # to generate bp clobbers.
@@ -73,9 +55,17 @@ proc runtest_ms_sysv { cflags generator_args } { set descr "$subdir CFLAGS=\"$cflags\" generator_args=\"$generator_args\"" verbose "$tmpdir: Running test $descr" 1 -# Cleanup any previous test in objdir -file delete -force $objdir -file mkdir $objdir +set status [parallel-should-run-test] + +if { $status == 1 } then { + return +} + +if { $status == 0 } then { + # Cleanup any previous test in objdir + file delete -force $objdir + file mkdir $objdir +} # Build the generator (only needs to be done once). set src "$srcdir/$subdir/gen.cc" @@ -148,16 +138,8 @@ proc runtest_ms_sysv { cflags generator_args } { } dg-init - -# Setup parallelization -set next_test 0 -set parallel_dir "$env(GCC_RUNTEST_PARALLELIZE_DIR)/abi-ms-sysv" -file mkdir "$env(GCC_RUNTEST_PARALLELIZE_DIR)" -file mkdir "$parallel_dir" - -if { ![file isdirectory "$parallel_dir"] } then { -error "Failed to create directory $parallel_dir: $::errorCode" -return +if { [parallel-init "ms2sysv"] != 0 } then { +return; } set gen_opts "-p0-5" diff --git a/gcc/testsuite/lib/parallelize.exp b/gcc/testsuite/lib/parallelize.exp new file mode 100644 index 000..346a06f0fa0 --- /dev/null +++ b/gcc/testsuite/lib/parallelize.exp @@ -0,0 +1,88 @@ +# Functions for parallelizing tests that cannot use the standard dg-run, +# dg-runtest or gcc-dg-runtest for some reason. +# +# Copyright (C) 2017 Free Software Foundation, Inc. +# Contributed by Daniel Santos +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+# +# Under Section 7 of GPL version 3, you are granted additional +# permissions described in the GCC Runtime Library Exception, version +# 3.1, as published by the Free Software Foundation. +# +# You should have received a copy of the GNU General Public License and +# a copy of the GCC Runtime Library Exception along with this program; +# see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +# <http://www.gnu.org/licenses/>. + +set is_parallel_build 0 +set parallel_next_test 0 +set parallel_dir "" + +# Setup parallelization directory and variabl
Re: [PATCH 2/2] [testsuite] PR 80759 Remove gas extensions from do-test.S, fix other problems
Thank you for your assistance Rainer! On 05/19/2017 04:03 AM, Rainer Orth wrote: unfortunately, it still doesn't, as explained in the PR. The multilib support is still wrong/nonexistent. I guess I thought for some reason that would magically appear in TEST_ALWAYS_FLAGS. :) I've explicitly added it for now, but I haven't yet found where -m64 gets fed in the normal flow of things and I would rather know I'm doing things as closely as possible to how the rest of the test harness does it. (I have SVN write privs now, so I can even commit it myself). Please always include ChangeLog entries with your patch submissions so one can easily see what you've changed (cf. https://gcc.gnu.org/contribute.html). Thanks. Rainer I hate when I forget that! I'll be sure to remember when I resubmit. Use of .struct in do_test.S causes breakages when gas isn't the assembler (e.g., Solaris). I also wasn't including TEST_ALWAYS_FLAGS in my CFLAGS resulting in super-ugly log files. Finally, this patch eliminates spam of "test unsupported" (limiting it to one printing). Signed-off-by: Daniel Santos --- .../gcc.target/x86_64/abi/ms-sysv/do-test.S| 26 +- .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c| 7 ++ .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp | 24 3 files changed, 27 insertions(+), 30 deletions(-) diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S index 1395235fd1e..967eb959fbc 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S @@ -46,22 +46,6 @@ fn: # define MOVAPS movaps # endif -/* TODO: Is there a cleaner way to provide these offsets?
*/ - .struct 0 -test_data_save: - - .struct test_data_save + 224 -test_data_input: - - .struct test_data_save + 448 -test_data_output: - - .struct test_data_save + 672 -test_data_fn: - - .struct test_data_save + 680 -test_data_retaddr: - .text regs_to_mem: @@ -132,23 +116,23 @@ L0: callregs_to_mem # Load register with random data - lea test_data + test_data_input(%rip), %rax + lea test_data + 224(%rip), %rax callmem_to_regs # Save original return address pop %rax - movq%rax, test_data + test_data_retaddr(%rip) + movq%rax, test_data + 680(%rip) # Call the test function - call*test_data + test_data_fn(%rip) + call*test_data + 672(%rip) # Restore the original return address - movqtest_data + test_data_retaddr(%rip), %rcx + movqtest_data + 680(%rip), %rcx push%rcx # Save test function return value and store resulting register values push%rax - lea test_data + test_data_output(%rip), %rax + lea test_data + 448(%rip), %rax callregs_to_mem # Restore registers diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c index 2a011f5103d..7cec312c386 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c @@ -346,6 +346,13 @@ int main (int argc, char *argv[]) assert (!((long)&test_data.regdata[REG_SET_INPUT] & 15)); assert (!((long)&test_data.regdata[REG_SET_OUTPUT] & 15)); + /* Verify offsets hard-coded into assembly. 
*/ + assert (__builtin_offsetof (struct test_data, regdata[REG_SET_SAVE]) == 0); + assert (__builtin_offsetof (struct test_data, regdata[REG_SET_INPUT]) == 224); + assert (__builtin_offsetof (struct test_data, regdata[REG_SET_OUTPUT]) == 448); + assert (__builtin_offsetof (struct test_data, fn) == 672); + assert (__builtin_offsetof (struct test_data, retaddr) == 680); + while ((c = getopt (argc, argv, "s:f")) != -1) { switch (c) while .struct is a gas extension and doesn't work with the Solaris/x86 /bin/as, having the same (mostly unexplained) constants hardcoded in two places isn't exactly helpful. I'd suggest moving them to (say) ms-sysv.h and include that from both do-test.S (which is preprocessed assembler after all) and ms-sysv.c. Rainer Well, I don't have an ms-sysv.h, but I suppose I can add one. I'm starting to lean more towards the idea of plucking out the portion of asm that uses these offsets, moving that to an inline asm function and having the code in do-test.S just jmp to it. I wish there was some sort of "naked" attribute for x86 since I'm not well versed in every way that the compiler can change it in a way that wouldn't be friendly. void __attribute__((optimize ("-O0 -fno-split-stack"))) do_test_body (void) { __asm__ __volatile__ (
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/22/2017 01:32 PM, Bernd Edlinger wrote: On 05/19/17 05:17, Daniel Santos wrote: No, I'm not at all comfortable with you making so many seemingly unnecessary changes to this code. (Although I wish I got this much feedback during my RFCs! :) I can accept the changes to is/count_stub_managed_reg (with some caveats), but I do not at all see the rationale for changing m_stub_names to a static and adding another dimension for the object instance -- from an OO standpoint, that's just bad design. Can you please share your rationale for that? Hmm, sorry about that ... I just thought it would be nice to avoid the const-cast here. Well remember const-correctness isn't about an object's internal (bitwise) state, but it's externally visible (logical) state. So a const member function need not avoid altering it's internal state if the externally visible state remains unchanged, such as when caching some result or lazy initing. I have tended to prefer using const_cast for this, isolating its use to a single const accessor function (or if () block) to leave less room for the data members to be accidentally altered in another const member function. But mutable is generally preferred over const_cast, which opens up the danger of accidentally modifying an object's logical state (especially by a subsequent edit), so using mutable is probably a better practice anyway. However, ... This moved the m_stub_names from all 4 instances to one static array s_stub_names. But looking at it again, I think the extra dimension is not even necessary, because all instances share the same data, so removing that extra dimension again will be fine. You are correct! And I see that you're new patch has already changed get_stub_name to a static member function, so great! Incidentally, half of the space in that array is wasted and can be trimmed since a given instance of xlogue_layout either supports hard frame pointers or doesn't, I just never got around to splitting that apart. 
(The first three enum xlogue_stub values are for without HFP and the last three for with.) Also, if we wanted to further reduce the memory footprint of xlogue_layout objects, the offset field of struct reginfo could be changed to int, and if we really wanted to go nuts then 16 bits would work for both of its fields. So for is/count_stub_managed_reg(s), you are obviously much more familiar with gcc, its passes and the i386 backend, but my knowledge level makes me very uncomfortable having the result of xlogue_layout::is_stub_managed_reg() determined in a way that has the (apparent) potential to differ from from the state at the time the last call to ix86_compute_frame_layout() was made; I just don't understand I fund it technically difficult to add a HARD_REG_SET to struct machine_function, and tried to avoid the extra overhead of calling ix86_save_reg so often, which you also felt uncomfortable with. So, if you look at compute_stub_managed_regs I first saw that the first loop can never terminate thru the "return 0", because the registers in x86_64_ms_sysv_extra_clobbered_registers are guaranteed to be clobbered. Then I saw that the bits in stub_managed_regs are always added in the same sequence, thus the result depends only on the number call_ms2sysv_extra_regs and hfp so everything is already available in struct machine_function. Thanks Bernd. Yes, I agree with how you have refactored compute_stub_managed_regs given your rationale of not adding another header dependency to i386.h. It is only the overall scheme of calculating this outside of ix86_compute_frame_layout that I cannot validate due to my not having a good understanding of what can and cannot change in between the time that ix86_compute_frame_layout is last called and the last call to is_stub_managed_regs(). 
As Uros said, my patch set touches a "delicate part of the compiler, where lots of code-paths cross each other (and we have had quite some hard-to-fix bugs in this area)" (https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01924.html). I wrote it the way I did with my understanding of what was safe to do and your alterations move its functionality outside of that understanding. So if you say that this won't break it, then I will have to trust you (and the testsuite) on that. On that note, the tests are undergoing some change and bug fixes and I'm planning on adding more tests to validate non-breakage with various other stack frame-related options and probably additional tests (and test options) triggered by GCC_TEST_RUN_EXPENSIVE or some such. Daniel
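As an illustration of the mutable-versus-const_cast point made in this reply, here is a minimal sketch. The class stub_namer and all of its members are hypothetical and are not taken from GCC's xlogue_layout; the sketch only shows a const accessor that caches a lazily built result without changing the object's logical state.

```cpp
#include <cassert>
#include <string>

// Hypothetical class (NOT GCC's xlogue_layout): it builds a stub name the
// first time a const accessor is called and caches it afterwards.
class stub_namer
{
public:
  explicit stub_namer (int regs)
    : m_regs (regs), m_cached (false), m_builds (0)
  {
    assert (regs >= 0);
  }

  // Logically const: the returned name depends only on m_regs, which never
  // changes.  Only the cache, which is internal state, is written here.
  const std::string &name () const
  {
    if (!m_cached)
      {
	m_name = "stub_" + std::to_string (m_regs);
	m_cached = true;
	++m_builds;
      }
    return m_name;
  }

  // How many times the name was actually built (for demonstration only).
  int build_count () const { return m_builds; }

private:
  int m_regs;			// logical state, never modified
  mutable bool m_cached;	// cache bookkeeping, writable in const methods
  mutable int m_builds;
  mutable std::string m_name;
};
```

Because only the cache members are declared mutable, a later edit to another const member function still cannot modify m_regs by accident, which is the risk that a const_cast on the whole object would reintroduce.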
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/23/2017 09:31 AM, Bernd Edlinger wrote: Hi, this is the latest version of my patch. As already said, it attempts to compute the frame layout only when relevant data have changed. Apologies for doing more clean-up on Daniel's patch than absolutely necessary, but ... Bootstrap and reg-tested successfully on x86_64-pc-linux-gnu with unix\{,-m32\}. Is it OK for trunk? Thanks Bernd. OK with me. Thanks, Daniel
Use aligned SSE movs for re-aligned MS ABI pro/epilogues
According to the Microsoft 64-bit ABI specification, registers RDI, RSI and XMM6-15 are non-volatile and the stack alignment is 16 bytes. In practice, the Windows implementation appears to not be so picky about the 16-byte alignment requirement, probably because it never saves SSE registers and instead just never uses them. This led to a large list (https://bugs.winehq.org/show_bug.cgi?id=27680) of Win64 programs violating the ABI with impunity, but crashing in Wine until force_align_arg_pointer was added to gcc and used in Wine.

Stack re-alignment was originally done prior to int register saves, but was moved to after SSE saves in 2010 to better facilitate parallelization. For simplicity's sake, the stack pointer was considered invalid after stack re-alignment and SSE movs were emitted unaligned relative to the frame pointer. But now that forced stack re-alignment is the new normal for Wine64, Wine always gets the unaligned movs.

This patch set fixes the problem while preserving the improved parallelization of int register saves of Richard Henderson's patch in 2010. This patchset is a prerequisite to another one I'm still refining, which moves these pro/epilogues out of line. I'm still pretty new to this project, so I hope I haven't missed anything. (No additional failures in tests.)

Daniel Santos

2016-12-21  Daniel Santos

	* config/i386/i386.h (struct machine_frame_state): New fields
	sp_realigned and sp_realigned_offset.
	* config/i386/i386.c (struct ix86_frame): New fields
	stack_realign_allocate_offset and stack_realign_offset.
	(ix86_compute_frame_layout): Modify re-alignment calculations.
	(sp_valid_at, fp_valid_at): New inline functions.
	(choose_basereg): New function.
	(choose_baseaddr): Add align parameter, use choose_basereg and
	modify all callers.
	(ix86_emit_save_reg_using_mov, ix86_emit_restore_sse_regs_using_mov):
	Use align parameter of choose_baseaddr to generate aligned SSE movs
	when possible.
	(pro_epilogue_adjust_stack): Modify to track
	machine_frame_state::sp_realigned.
	(ix86_expand_prologue): Modify stack re-alignment code.
	(ix86_emit_leave): Clear machine_frame_state::sp_realigned.
	(ix86_expand_epilogue): Modify validity checks of frame and stack
	pointers.
[PATCH 2/3] [i386] Keep stack pointer valid after re-alignment.
This stage adds the fields sp_realigned and sp_realigned_offset to struct machine_frame_state and adds the concept of the stack pointer being re-aligned rather than invalid. The inline functions sp_valid_at and fp_valid_at are added to test if a given location relative to the CFA can be accessed with the stack or frame pointer, respectively. Stack allocation prior to re-alignment is modified so that we allocate what is needed, but don't allocate unneeded space in the event that no SSE registers are saved, but frame.sse_reg_save_offset is increased for alignment. As this change only alters how SSE registers are saved, moving the re-alignment AND should not hinder parallelization of int register saves. --- gcc/config/i386/i386.c | 69 -- gcc/config/i386/i386.h | 12 + 2 files changed, 62 insertions(+), 19 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7f7389cbe31..b5f9f36094f 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12604,6 +12604,24 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset) return len; } +/* Determine if the stack pointer is valid for accessing the cfa_offset. */ + +static inline bool sp_valid_at (HOST_WIDE_INT cfa_offset) +{ + const struct machine_frame_state &fs = cfun->machine->fs; + return fs.sp_valid && !(fs.sp_realigned + && cfa_offset < fs.sp_realigned_offset); +} + +/* Determine if the frame pointer is valid for accessing the cfa_offset. */ + +static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset) +{ + const struct machine_frame_state &fs = cfun->machine->fs; + return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned + && cfa_offset >= fs.sp_realigned_offset); +} + /* Return an RTX that points to CFA_OFFSET within the stack frame. The valid base registers are taken from CFUN->MACHINE->FS. 
*/ @@ -12902,15 +12920,18 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset, { HOST_WIDE_INT ooffset = m->fs.sp_offset; bool valid = m->fs.sp_valid; + bool realigned = m->fs.sp_realigned; if (src == hard_frame_pointer_rtx) { valid = m->fs.fp_valid; + realigned = false; ooffset = m->fs.fp_offset; } else if (src == crtl->drap_reg) { valid = m->fs.drap_valid; + realigned = false; ooffset = 0; } else @@ -12924,6 +12945,7 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset, m->fs.sp_offset = ooffset - INTVAL (offset); m->fs.sp_valid = valid; + m->fs.sp_realigned = realigned; } } @@ -13673,6 +13695,7 @@ ix86_expand_prologue (void) this is fudged; we're interested to offsets within the local frame. */ m->fs.sp_offset = INCOMING_FRAME_SP_OFFSET; m->fs.sp_valid = true; + m->fs.sp_realigned = false; ix86_compute_frame_layout (&frame); @@ -13889,11 +13912,10 @@ ix86_expand_prologue (void) that we must allocate the size of the register save area before performing the actual alignment. Otherwise we cannot guarantee that there's enough storage above the realignment point. */ - if (m->fs.sp_offset != frame.sse_reg_save_offset) + allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset; + if (allocate) pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, - GEN_INT (m->fs.sp_offset - - frame.sse_reg_save_offset), - -1, false); + GEN_INT (-allocate), -1, false); /* Align the stack. */ insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx, @@ -13901,11 +13923,14 @@ ix86_expand_prologue (void) GEN_INT (-align_bytes))); /* For the purposes of register save area addressing, the stack - pointer is no longer valid. As for the value of sp_offset, -see ix86_compute_frame_layout, which we need to match in order -to pass verification of stack_pointer_offset at the end. */ +pointer can no longer be used to access anything in the frame +below m->fs.sp_realigned_offset and the frame pointer cannot be +used for anything at or above. 
*/ + gcc_assert (m->fs.sp_offset == frame.stack_realign_allocate_offset); m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes); - m->fs.sp_valid = false; + m->fs.sp_realigned = true; + m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16; + gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset); } allocate = frame.stack_pointer_offset - m->fs.sp_offset; @@ -14244,6 +14269,7 @@ ix86_emit_leave (void) gcc_assert (m->fs.fp_valid); m->fs.sp_valid = true; + m->fs.sp_realigned = false; m->fs.sp_offset =
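The validity rules introduced by this patch can be tried out in isolation. The following is a sketch only: frame_state is a simplified stand-in for struct machine_frame_state, the frame state is passed as a parameter rather than read from cfun->machine->fs, and long replaces HOST_WIDE_INT. The two predicates otherwise follow the logic shown in the diff above.

```cpp
#include <cassert>

// Simplified stand-in for GCC's struct machine_frame_state; only the fields
// used by the two predicates are modelled (an assumption of this sketch).
struct frame_state
{
  bool sp_valid;
  bool fp_valid;
  bool sp_realigned;
  long sp_realigned_offset;	// CFA offset at which the re-aligned area starts
};

// The stack pointer is usable unless it has been re-aligned and the
// requested CFA offset lies before the re-alignment point.
static inline bool
sp_valid_at (const frame_state &fs, long cfa_offset)
{
  return fs.sp_valid && !(fs.sp_realigned
			  && cfa_offset < fs.sp_realigned_offset);
}

// The frame pointer is usable unless a valid, re-aligned stack pointer
// already covers the requested CFA offset.
static inline bool
fp_valid_at (const frame_state &fs, long cfa_offset)
{
  return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
			  && cfa_offset >= fs.sp_realigned_offset);
}

// Small self-check with made-up offsets: after re-alignment at offset 64,
// exactly one of the two base registers is accepted for any given slot.
void
check_valid_at_example ()
{
  const frame_state fs = { true, true, true, 64 };
  assert (!sp_valid_at (fs, 48) && fp_valid_at (fs, 48));
  assert (sp_valid_at (fs, 64) && !fp_valid_at (fs, 64));
}
```

With sp_realigned clear, both predicates reduce to the plain sp_valid and fp_valid flags, so code paths that never re-align the stack behave as before.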
[PATCH 1/3] [i386] Move stack frame re-alignment to before SSE saves.
This step adds new fields to struct ix86_frame to track where we started the stack re-alignment and what we need to allocate prior to re-alignment. In ix86_compute_frame_layout, we do the stack frame re-alignment computation prior to computing the SSE save area so that we have an aligned SSE save area. This also ensures that the SSE save area is properly aligned when DRAP is used. --- gcc/config/i386/i386.c | 40 +--- 1 file changed, 25 insertions(+), 15 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 792e8ec232d..7f7389cbe31 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2453,7 +2453,7 @@ struct GTY(()) stack_local_entry { [saved regs] <- regs_save_offset [padding0] - + <- stack_realign_offset [saved SSE regs] <- sse_regs_save_offset [padding1] | @@ -2479,6 +2479,8 @@ struct ix86_frame HOST_WIDE_INT stack_pointer_offset; HOST_WIDE_INT hfp_save_offset; HOST_WIDE_INT reg_save_offset; + HOST_WIDE_INT stack_realign_allocate_offset; + HOST_WIDE_INT stack_realign_offset; HOST_WIDE_INT sse_reg_save_offset; /* When save_regs_using_mov is set, emit prologue using @@ -12457,28 +12459,36 @@ ix86_compute_frame_layout (struct ix86_frame *frame) if (TARGET_SEH) frame->hard_frame_pointer_offset = offset; + /* When re-aligning the stack frame, but not saving SSE registers, this + is the offset we want to allocate memory for. */ + frame->stack_realign_allocate_offset = offset; + + /* The re-aligned stack starts here. Values before this point are not + directly comparable with values below this point. Use sp_valid_at + to determine if the stack pointer is valid for a given offset and + fp_valid_at for the frame pointer. */ + if (stack_realign_fp) +offset = ROUND_UP (offset, stack_alignment_needed); + frame->stack_realign_offset = offset; + /* Align and set SSE register save area.
*/ if (frame->nsseregs) { /* The only ABI that has saved SSE registers (Win64) also has a -16-byte aligned default stack, and thus we don't need to be -within the re-aligned local stack frame to save them. In case -incoming stack boundary is aligned to less than 16 bytes, -unaligned move of SSE register will be emitted, so there is -no point to round up the SSE register save area outside the -re-aligned local stack frame to 16 bytes. */ - if (ix86_incoming_stack_boundary >= 128) +16-byte aligned default stack. However, many programs violate +the ABI, and Wine64 forces stack realignment to compensate. + +If the incoming stack boundary is at least 16 bytes, or DRAP is +required and the DRAP re-alignment boundary is at least 16 bytes, +then we want the SSE register save area properly aligned. */ + if (ix86_incoming_stack_boundary >= 128 + || (stack_realign_drap && stack_alignment_needed >= 16)) offset = ROUND_UP (offset, 16); offset += frame->nsseregs * 16; + frame->stack_realign_allocate_offset = offset; } - frame->sse_reg_save_offset = offset; - /* The re-aligned stack starts here. Values before this point are not - directly comparable with values below this point. In order to make - sure that no value happens to be the same before and after, force - the alignment computation below to add a non-zero value. */ - if (stack_realign_fp) -offset = ROUND_UP (offset, stack_alignment_needed); + frame->sse_reg_save_offset = offset; /* Va-arg area */ frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size; -- 2.11.0
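To make the new offset bookkeeping concrete, here is a small standalone sketch of the computation added to ix86_compute_frame_layout in this patch. The structs layout_in and layout_out and the function compute_realign_offsets are hypothetical names invented for this example; the sketch also assumes that stack_alignment_needed is expressed in bytes and the incoming stack boundary in bits, and uses long in place of HOST_WIDE_INT.

```cpp
#include <cassert>

// Hypothetical inputs, standing in for values GCC keeps in globals and in
// struct ix86_frame.
struct layout_in
{
  long offset;			// CFA offset after the integer register saves
  int nsseregs;			// number of SSE registers to save
  bool stack_realign_fp;
  bool stack_realign_drap;
  long stack_alignment_needed;	// bytes (assumption of this sketch)
  int incoming_stack_boundary;	// bits
};

// The three offsets this patch computes.
struct layout_out
{
  long stack_realign_allocate_offset;
  long stack_realign_offset;
  long sse_reg_save_offset;
};

// Round X up to a multiple of ALIGN, which must be a power of two.
static long
round_up (long x, long align)
{
  assert (align > 0 && (align & (align - 1)) == 0);
  return (x + align - 1) & ~(align - 1);
}

layout_out
compute_realign_offsets (const layout_in &in)
{
  layout_out out;
  long offset = in.offset;

  // Without SSE saves, this is all that must be allocated before the AND.
  out.stack_realign_allocate_offset = offset;

  // The re-aligned part of the frame starts here.
  if (in.stack_realign_fp)
    offset = round_up (offset, in.stack_alignment_needed);
  out.stack_realign_offset = offset;

  // Align and set the SSE register save area.
  if (in.nsseregs)
    {
      if (in.incoming_stack_boundary >= 128
	  || (in.stack_realign_drap && in.stack_alignment_needed >= 16))
	offset = round_up (offset, 16);
      offset += in.nsseregs * 16;
      out.stack_realign_allocate_offset = offset;
    }
  out.sse_reg_save_offset = offset;
  return out;
}
```

For example, with offset 40, ten SSE registers, frame-pointer re-alignment to 16 bytes and a 64-bit incoming boundary, the sketch yields stack_realign_offset 48 and both stack_realign_allocate_offset and sse_reg_save_offset equal to 208. With no SSE registers the allocate offset stays at 40, so no unneeded space is allocated before the re-alignment.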
[PATCH 3/3] [i386] Use re-aligned stack pointer for aligned SSE movs
This adds an optional `align' parameter to choose_baseaddr allowing the caller to request an address that is aligned to some boundary. Then ix86_emit_save_regs_using_mov and ix86_emit_restore_regs_using_mov are modified so that optimally aligned memory is used when such a base register is available. --- gcc/config/i386/i386.c | 110 ++--- 1 file changed, 87 insertions(+), 23 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b5f9f36094f..e60267a903d 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12622,15 +12622,40 @@ static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset) && cfa_offset >= fs.sp_realigned_offset); } -/* Return an RTX that points to CFA_OFFSET within the stack frame. - The valid base registers are taken from CFUN->MACHINE->FS. */ +/* Choose a base register based upon alignment requested, speed and/or + size. */ -static rtx -choose_baseaddr (HOST_WIDE_INT cfa_offset) +static void choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg, + HOST_WIDE_INT &base_offset, + unsigned int align_reqested, unsigned int *align) { const struct machine_function *m = cfun->machine; - rtx base_reg = NULL; - HOST_WIDE_INT base_offset = 0; + unsigned int hfp_align; + unsigned int drap_align; + unsigned int sp_align; + bool hfp_ok = fp_valid_at (cfa_offset); + bool drap_ok = m->fs.drap_valid; + bool sp_ok = sp_valid_at (cfa_offset); + + hfp_align = drap_align = sp_align = INCOMING_STACK_BOUNDARY; + + /* Filter out any registers that don't meet the requested alignment + criteria. */ + if (align_reqested) +{ + /* Make sure we weren't given a cfa_offset incongruent with the +align_reqested. 
*/ + gcc_assert (!(cfa_offset & (align_reqested / BITS_PER_UNIT - 1))); + + if (m->fs.realigned) + hfp_align = drap_align = sp_align = crtl->stack_alignment_needed; + else if (m->fs.sp_realigned) + sp_align = crtl->stack_alignment_needed; + + hfp_ok = hfp_ok && hfp_align >= align_reqested; + drap_ok = drap_ok && drap_align >= align_reqested; + sp_ok = sp_ok && sp_align >= align_reqested; +} if (m->use_fast_prologue_epilogue) { @@ -12639,17 +12664,17 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset) while DRAP must be reloaded within the epilogue. But choose either over the SP due to increased encoding size. */ - if (m->fs.fp_valid) + if (hfp_ok) { base_reg = hard_frame_pointer_rtx; base_offset = m->fs.fp_offset - cfa_offset; } - else if (m->fs.drap_valid) + else if (drap_ok) { base_reg = crtl->drap_reg; base_offset = 0 - cfa_offset; } - else if (m->fs.sp_valid) + else if (sp_ok) { base_reg = stack_pointer_rtx; base_offset = m->fs.sp_offset - cfa_offset; @@ -12662,13 +12687,13 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset) /* Choose the base register with the smallest address encoding. With a tie, choose FP > DRAP > SP. */ - if (m->fs.sp_valid) + if (sp_ok) { base_reg = stack_pointer_rtx; base_offset = m->fs.sp_offset - cfa_offset; len = choose_baseaddr_len (STACK_POINTER_REGNUM, base_offset); } - if (m->fs.drap_valid) + if (drap_ok) { toffset = 0 - cfa_offset; tlen = choose_baseaddr_len (REGNO (crtl->drap_reg), toffset); @@ -12679,7 +12704,7 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset) len = tlen; } } - if (m->fs.fp_valid) + if (hfp_ok) { toffset = m->fs.fp_offset - cfa_offset; tlen = choose_baseaddr_len (HARD_FRAME_POINTER_REGNUM, toffset); @@ -12691,8 +12716,40 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset) } } } - gcc_assert (base_reg != NULL); +/* Set the align return value. 
*/ +if (align) + { + if (base_reg == stack_pointer_rtx) + *align = sp_align; + else if (base_reg == crtl->drap_reg) + *align = drap_align; + else if (base_reg == hard_frame_pointer_rtx) + *align = hfp_align; + } +} + +/* Return an RTX that points to CFA_OFFSET within the stack frame and + the alignment of address. If align is non-null, it should point to + an alignment value (in bits) that is preferred or zero and will + recieve the alignment of the base register that was selected. The + valid base registers are taken from CFUN->MACHINE->FS. */ + +static rtx +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) +{ + rtx base_reg = NULL; + HOST_WIDE_INT base_offset = 0; + + /* If a specific alignment is requested, try to get a base register + with that alignment first. */ + if (align && *align) +choose_basereg (cfa_offset
Re: Use aligned SSE movs for re-aligned MS ABI pro/epilogues
On 12/27/2016 07:56 AM, Uros Bizjak wrote: Hello! According to the Microsoft 64-bit ABI specification, registers RDI, RSI and XMM6-15 are non-volatile and the stack alignment is 16 bytes. In practice, the Windows implementation appears to not be so picky about the 16-byte alignment requirement, probably because it never saves SSE registers and instead just never uses them. This led to a large list (https://bugs.winehq.org/show_bug.cgi?id=27680) of Win64 programs violating the ABI with impunity, but crashing in Wine until force_align_arg_pointer was added to gcc and used in Wine. Stack re-alignment was originally done prior to int register saves, but was moved to after SSE saves in 2010 to better facilitate parallelization, and for simplicity's sake, the stack pointer was considered invalid after stack re-alignment and SSE movs were emitted unaligned relative to the frame pointer. But now that forced stack re-alignment is the new normal for Wine64, it means that it always gets the unaligned movs in Wine. This patch set fixes the problem while preserving the improved parallelization of int register saves of Richard Henderson's patch in 2010. I have looked briefly through the patchset, and the approach looks good to me. However, this patch is touching a somewhat delicate part of the compiler, where lots of code-paths cross each other (and we have had quite some hard-to-fix bugs in this area). IMO, the patch is not appropriate for inclusion at the current stage of the compiler development, and should wait for early stage 1. Please resubmit it later for inclusion. Thanks, Uros. Thank you for the review. Yes, this is a very delicate code path indeed. For the purposes of the 64-bit Microsoft ABI function calling a System V function, I've written a fairly exhaustive test program (although probably not fully comprehensive) for testing pro/epilogues under various conditions. I'm completely new to dejagnu however, so I still need to figure out how to properly integrate it.
Maybe when I re-submit this patch set I can submit the new tests with it. Thanks, Daniel
Re: [RFC] [PATCH v3 0/8] [i386] Use out-of-line stubs for ms_abi pro/epilogues
I have finally completed all tests for Cygwin and MinGW both 32- & 64-bit with no additional test failures. There are still 567 tests failing both pre- and post-patch with error "error while loading shared libraries: cyggfortran-4.dll: cannot open shared object file: No such file or directory" in all 32-bit tests even after my (fairly crude) patch to address that problem. So as a separate issue, I don't yet have a clean patch set to resolve the windows dll search path issue (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79867). I had to change the test program, as I was dependent upon XSI extensions which aren't available on Cygwin, so I'll need to repost that. Also, I had to make one small change in the "aligned SSE MOVs" patch, disabling it on SEH targets since gcc/config/i386/winnt.c does not currently support the REG_CFA_EXPRESSION note in its unwind emit. This optimization primarily targets 64-bit Wine anyway, where stack realignment is now required. Daniel
Re: [PATCH 2/8] [i386] Add option -moutline-msabi-xlogues
Uros, can I please get your opinion on this? I have no objections to this, but I want to check with you first. On 02/10/2017 10:54 AM, Sandra Loosemore wrote: I'd like to re-iterate my previous request that the option be renamed -mno-inline-msabi-xlogues. No other option that controls inlining uses "outline" for the negative (disabling inlining). We have way too many options and the least we can do is try to make them use consistent conventions. -Sandra So the default would be -minline-msabi-xlogues and -mno-inline-msabi-xlogues would enable this optimization. Thanks, Daniel
Re: [RFC] [PATCH] [i386] Test program for ms_abi to sysv_abi function calls
I've had to make changes to the test program, as I was using XSI extensions which aren't implemented on Cygwin. But before I post the new patch, I noticed that it may be in the wrong directory. There is a gcc/testsuite/gcc.target/x86_64/abi directory and even a callabi subdirectory of that. For taxonomic accuracy, I would say it probably belongs as a subdirectory of .../abi or .../abi/callabi and renamed from "msabi" to "ms_sysv". Any objections? (It is currently in gcc/testsuite/gcc.target/i386/msabi.) Thanks, Daniel
[testsuite] Fix loading wrong DLLs on Windows, merge duplicate target-libpath.exp
We currently have two copies of target-libpath.exp in the tree under gcc/testsuite/lib and libffi/testsuite/lib. It was originally pulled into the libffi project from downstream gcc in 2009 (https://github.com/libffi/libffi/commit/5cbe2058c128e848446ae79fe15ee54260a90559). Then in 2012, Anthony Green (from libffi) modified it to correct this Windows problem (thank you! https://github.com/libffi/libffi/commit/bd78c9c3311244dd5f877c915b0dff91621dd253). In 2015, this file got pulled from upstream libffi back into gcc, thus beginning two separate development paths (https://github.com/gcc-mirror/gcc/commit/89d8a412de548b218cf7c967e65ad98bceb1ed4e). This patch merges the changes from libffi upstream which correctly solve the Windows DLL load path problem and removes the duplicate from libffi/testsuite/lib. This fixes most of bug #79867, implementing correct behaviour for set_ld_library_path_env_vars and restore_ld_library_path_env_vars. However, there is still incorrect behaviour in DejaGNU's unix_load that should eventually be addressed, although I cannot yet point to a specific failure that it is causing.

gcc/ChangeLog:

2017-04-03  Daniel Santos

	PR testsuite/79867
	* testsuite/lib/target-libpath.exp (set_ld_library_path_env_vars,
	restore_ld_library_path_env_vars): Merge changes from libffi upstream,
	correcting DLL load path problems on Windows.

libffi/ChangeLog:

2017-04-03  Daniel Santos

	PR testsuite/79867
	* testsuite/lib/target-libpath.exp: Remove.
	* testsuite/Makefile.in: Remove target-libpath.exp.
	* testsuite/Makefile.am: Regenerated.
Signed-off-by: Daniel Santos --- gcc/testsuite/lib/target-libpath.exp| 21 +++ libffi/testsuite/Makefile.am| 2 +- libffi/testsuite/Makefile.in| 2 +- libffi/testsuite/lib/target-libpath.exp | 283 4 files changed, 23 insertions(+), 285 deletions(-) delete mode 100644 libffi/testsuite/lib/target-libpath.exp diff --git a/gcc/testsuite/lib/target-libpath.exp b/gcc/testsuite/lib/target-libpath.exp index 9b3e201ed68..b6d01b31016 100644 --- a/gcc/testsuite/lib/target-libpath.exp +++ b/gcc/testsuite/lib/target-libpath.exp @@ -23,6 +23,7 @@ set orig_shlib_path_saved 0 set orig_ld_library_path_32_saved 0 set orig_ld_library_path_64_saved 0 set orig_dyld_library_path_saved 0 +set orig_path_saved 0 set orig_gcc_exec_prefix_saved 0 set orig_gcc_exec_prefix_checked 0 @@ -55,6 +56,7 @@ proc set_ld_library_path_env_vars { } { global orig_ld_library_path_32_saved global orig_ld_library_path_64_saved global orig_dyld_library_path_saved + global orig_path_saved global orig_gcc_exec_prefix_saved global orig_gcc_exec_prefix_checked global orig_ld_library_path @@ -63,6 +65,7 @@ proc set_ld_library_path_env_vars { } { global orig_ld_library_path_32 global orig_ld_library_path_64 global orig_dyld_library_path + global orig_path global orig_gcc_exec_prefix global env @@ -110,6 +113,10 @@ proc set_ld_library_path_env_vars { } { set orig_dyld_library_path "$env(DYLD_LIBRARY_PATH)" set orig_dyld_library_path_saved 1 } +if [info exists env(PATH)] { + set orig_path "$env(PATH)" + set orig_path_saved 1 +} } # We need to set ld library path in the environment. 
Currently, @@ -164,6 +171,13 @@ proc set_ld_library_path_env_vars { } { } else { setenv DYLD_LIBRARY_PATH "$ld_library_path" } + if { [istarget *-*-cygwin*] || [istarget *-*-mingw*] } { +if { $orig_path_saved } { + setenv PATH "$ld_library_path:$orig_path" +} else { + setenv PATH "$ld_library_path" +} + } verbose -log "LD_LIBRARY_PATH=[getenv LD_LIBRARY_PATH]" verbose -log "LD_RUN_PATH=[getenv LD_RUN_PATH]" @@ -201,12 +215,14 @@ proc restore_ld_library_path_env_vars { } { global orig_ld_library_path_32_saved global orig_ld_library_path_64_saved global orig_dyld_library_path_saved + global orig_path_saved global orig_ld_library_path global orig_ld_run_path global orig_shlib_path global orig_ld_library_path_32 global orig_ld_library_path_64 global orig_dyld_library_path + global orig_path global env restore_gcc_exec_prefix_env_var @@ -245,6 +261,11 @@ proc restore_ld_library_path_env_vars { } { } elseif [info exists env(DYLD_LIBRARY_PATH)] { unsetenv DYLD_LIBRARY_PATH } + if { $orig_path_saved } { +setenv PATH "$orig_path" + } elseif [info exists env(PATH)] { +unsetenv PATH + } } ### diff --git a/libffi/testsuite/Makefile.am b/libffi/testsuite/Makefile.am index 209e8976635..b4eb7c2bce9 100644 --- a/libffi/testsuite/Makefile.am +++ b/libffi/testsuite/Makefile.am @@ -82,7 +82,7 @@ libffi.call/cls_align_uint64.c libffi.call/cls_4byte.c \ libffi.call/cls_6
Re: [PATCH, testsuite] PR79867: Fix loading wrong DLLs on Windows, merge duplicate target-libpath.exp
I forgot to include PATCH and the PR in the subject line, sorry about that. Also, I have run a full bootstrap and testsuite to verify that I haven't missed any references to the extraneous copy of target-libpath.exp in libffi.
Re: [testsuite] Fix loading wrong DLLs on Windows, merge duplicate target-libpath.exp
On 04/05/2017 12:35 PM, Mike Stump wrote: libffi/ChangeLog: 2017-04-03 Daniel Santos PR testsuite/79867 * testsuite/lib/target-libpath.exp: Remove. * testsuite/Makefile.in: Remove target-libpath.exp. * testsuite/Makefile.am: Regenerated. I don't think the libffi project wants to remove that file. There is little point being different from them in this regard. The dup should not hurt. Hmm. There have been many changes to target-libpath.exp under gcc/testsuite/lib since libffi copied it. I have attached a diff of them. I'm not proposing removing target-libpath.exp from libffi upstream, but from the gcc tree. I'm having trouble seeing how having two different copies evolving independently can be a good thing. Daniel --- target-libpath.exp 2017-04-05 16:39:38.939768810 -0500 +++ gcc/testsuite/lib/target-libpath.exp 2017-04-05 16:39:49.350768260 -0500 @@ -1,4 +1,4 @@ -# Copyright (C) 2004, 2005, 2007 Free Software Foundation, Inc. +# Copyright (C) 2004-2017 Free Software Foundation, Inc. # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by @@ -20,12 +20,28 @@ set orig_ld_library_path_saved 0 set orig_ld_run_path_saved 0 set orig_shlib_path_saved 0 -set orig_ld_libraryn32_path_saved 0 -set orig_ld_library64_path_saved 0 set orig_ld_library_path_32_saved 0 set orig_ld_library_path_64_saved 0 set orig_dyld_library_path_saved 0 set orig_path_saved 0 +set orig_gcc_exec_prefix_saved 0 +set orig_gcc_exec_prefix_checked 0 + + +### +# proc set_gcc_exec_prefix_env_var { } +### + +proc set_gcc_exec_prefix_env_var { } { + global TEST_GCC_EXEC_PREFIX + global env + + # Set GCC_EXEC_PREFIX for the compiler under test to pick up files not in + # the build tree from a specified location (normally the install tree). 
+ if [info exists TEST_GCC_EXEC_PREFIX] { +setenv GCC_EXEC_PREFIX "$TEST_GCC_EXEC_PREFIX" + } +} ### # proc set_ld_library_path_env_vars { } @@ -37,36 +53,39 @@ global orig_ld_library_path_saved global orig_ld_run_path_saved global orig_shlib_path_saved - global orig_ld_libraryn32_path_saved - global orig_ld_library64_path_saved global orig_ld_library_path_32_saved global orig_ld_library_path_64_saved global orig_dyld_library_path_saved global orig_path_saved + global orig_gcc_exec_prefix_saved + global orig_gcc_exec_prefix_checked global orig_ld_library_path global orig_ld_run_path global orig_shlib_path - global orig_ld_libraryn32_path - global orig_ld_library64_path global orig_ld_library_path_32 global orig_ld_library_path_64 global orig_dyld_library_path global orig_path - global GCC_EXEC_PREFIX + global orig_gcc_exec_prefix + global env - # Set the relocated compiler prefix, but only if the user hasn't specified one. - if { [info exists GCC_EXEC_PREFIX] && ![info exists env(GCC_EXEC_PREFIX)] } { -setenv GCC_EXEC_PREFIX "$GCC_EXEC_PREFIX" + # Save the original GCC_EXEC_PREFIX. + if { $orig_gcc_exec_prefix_checked == 0 } { +if [info exists env(GCC_EXEC_PREFIX)] { + set orig_gcc_exec_prefix "$env(GCC_EXEC_PREFIX)" + set orig_gcc_exec_prefix_saved 1 +} +set orig_gcc_exec_prefix_checked 1 } + set_gcc_exec_prefix_env_var + # Setting the ld library path causes trouble when testing cross-compilers. if { [is_remote target] } { return } if { $orig_environment_saved == 0 } { -global env - set orig_environment_saved 1 # Save the original environment. 
@@ -82,14 +101,6 @@ set orig_shlib_path "$env(SHLIB_PATH)" set orig_shlib_path_saved 1 } -if [info exists env(LD_LIBRARYN32_PATH)] { - set orig_ld_libraryn32_path "$env(LD_LIBRARYN32_PATH)" - set orig_ld_libraryn32_path_saved 1 -} -if [info exists env(LD_LIBRARY64_PATH)] { - set orig_ld_library64_path "$env(LD_LIBRARY64_PATH)" - set orig_ld_library64_path_saved 1 -} if [info exists env(LD_LIBRARY_PATH_32)] { set orig_ld_library_path_32 "$env(LD_LIBRARY_PATH_32)" set orig_ld_library_path_32_saved 1 @@ -113,12 +124,11 @@ # It only sets SHLIB_PATH and LD_LIBRARY_PATH when it executes a # program. We also need the environment set for compilations, etc. # - # On IRIX 6, we have to set variables akin to LD_LIBRARY_PATH, but - # called LD_LIBRARYN32_PATH (for the N32 ABI) and LD_LIBRARY64_PATH - # (for the 64-bit ABI). The same applies to Darwin (DYLD_LIBRARY_PATH), - # Solaris 32 bit (LD_LIBRARY_PATH_32), Solaris 64 bit (LD_LIBRARY_PATH_64), - # and HP-UX (SHLIB_PATH). In some cases, the variables are independent - # of LD_LIBRARY_PATH, and in other cases LD_LIBRARY_PATH is used if the + # On Darw
[PATCH v2,testsuite] PR79867: Merge fixes for windows DLL loading problem from libffi
We currently have two copies of target-libpath.exp in the tree under gcc/testsuite/lib and libffi/testsuite/lib. It was originally pulled into the libffi project (from downstream gcc) in 2009 (https://github.com/libffi/libffi/commit/5cbe2058c128e848446ae79fe15ee54260a90559). Then in 2012, Anthony Green (from libffi) modified it to correct this Windows problem (and thank you: https://github.com/libffi/libffi/commit/bd78c9c3311244dd5f877c915b0dff91621dd253). In 2015, this file got pulled from upstream libffi back into gcc, thus beginning two separate development paths (https://github.com/gcc-mirror/gcc/commit/89d8a412de548b218cf7c967e65ad98bceb1ed4e). This patch merges the changes from libffi upstream which correctly solve the Windows DLL load path problem in set_ld_library_path_env_vars and restore_ld_library_path_env_vars, thus fixing most of PR79867. However, there is still incorrect behaviour in DejaGNU's unix_load that should eventually be addressed, although I cannot yet point to a specific failure that it is causing. Ultimately, I think that this functionality should be moved upstream to DejaGNU where it can be managed more cleanly in board config files, but we'll have to keep this code in gcc for when DejaGNU doesn't have set/restore or push/pop libpath functionality.
Signed-off-by: Daniel Santos --- gcc/testsuite/lib/target-libpath.exp | 21 + 1 file changed, 21 insertions(+) diff --git a/gcc/testsuite/lib/target-libpath.exp b/gcc/testsuite/lib/target-libpath.exp index 9b3e201ed68..b6d01b31016 100644 --- a/gcc/testsuite/lib/target-libpath.exp +++ b/gcc/testsuite/lib/target-libpath.exp @@ -23,6 +23,7 @@ set orig_shlib_path_saved 0 set orig_ld_library_path_32_saved 0 set orig_ld_library_path_64_saved 0 set orig_dyld_library_path_saved 0 +set orig_path_saved 0 set orig_gcc_exec_prefix_saved 0 set orig_gcc_exec_prefix_checked 0 @@ -55,6 +56,7 @@ proc set_ld_library_path_env_vars { } { global orig_ld_library_path_32_saved global orig_ld_library_path_64_saved global orig_dyld_library_path_saved + global orig_path_saved global orig_gcc_exec_prefix_saved global orig_gcc_exec_prefix_checked global orig_ld_library_path @@ -63,6 +65,7 @@ proc set_ld_library_path_env_vars { } { global orig_ld_library_path_32 global orig_ld_library_path_64 global orig_dyld_library_path + global orig_path global orig_gcc_exec_prefix global env @@ -110,6 +113,10 @@ proc set_ld_library_path_env_vars { } { set orig_dyld_library_path "$env(DYLD_LIBRARY_PATH)" set orig_dyld_library_path_saved 1 } +if [info exists env(PATH)] { + set orig_path "$env(PATH)" + set orig_path_saved 1 +} } # We need to set ld library path in the environment. 
Currently, @@ -164,6 +171,13 @@ proc set_ld_library_path_env_vars { } { } else { setenv DYLD_LIBRARY_PATH "$ld_library_path" } + if { [istarget *-*-cygwin*] || [istarget *-*-mingw*] } { +if { $orig_path_saved } { + setenv PATH "$ld_library_path:$orig_path" +} else { + setenv PATH "$ld_library_path" +} + } verbose -log "LD_LIBRARY_PATH=[getenv LD_LIBRARY_PATH]" verbose -log "LD_RUN_PATH=[getenv LD_RUN_PATH]" @@ -201,12 +215,14 @@ proc restore_ld_library_path_env_vars { } { global orig_ld_library_path_32_saved global orig_ld_library_path_64_saved global orig_dyld_library_path_saved + global orig_path_saved global orig_ld_library_path global orig_ld_run_path global orig_shlib_path global orig_ld_library_path_32 global orig_ld_library_path_64 global orig_dyld_library_path + global orig_path global env restore_gcc_exec_prefix_env_var @@ -245,6 +261,11 @@ proc restore_ld_library_path_env_vars { } { } elseif [info exists env(DYLD_LIBRARY_PATH)] { unsetenv DYLD_LIBRARY_PATH } + if { $orig_path_saved } { +setenv PATH "$orig_path" + } elseif [info exists env(PATH)] { +unsetenv PATH + } } ### -- 2.11.0
[PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
All of these patches are concerned with 64-bit Microsoft ABI functions that call System V ABI functions, which clobber RSI, RDI and XMM6-15, and are aimed at improving the performance and .text size of Wine 64. I had previously submitted these as separate patch sets, but have combined them for simplicity. (Does this make the ChangeLogs too big? Please let me know if you want me to break these back apart.) Below are the included patchsets and a summary of changes since the previous post(s):

1.) PR78962 Use aligned SSE movs for re-aligned MS ABI pro/epilogues.
https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01859.html
Changes:
* The SEH unwind emit code (in winnt.c) does not currently support REG_CFA_EXPRESSION, which is required to make this work, so I have disabled it on SEH targets.
* Updated comments on REG_CFA_EXPRESSION in winnt.c.

2.) Add option to call out-of-line stubs instead of emitting inline saves and restores.
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00548.html
Changes:
* Renamed option from -moutline-msabi-xlogues to -mcall-ms2sysv-xlogues
* Since this patch set depends upon aligned SSE MOVs after stack realignment, I have disabled it on SEH targets with a sorry().
* I was previously trying to cache the rtx for symbols to the libgcc stubs instead of creating new ones, but this caused problems in subsequent passes and it was disabled with a "TODO" comment. I have removed this code, as well as the rtx cache that was just wasting memory in class xlogue_layout.
* Improved comment documentation.

3.) A comprehensive test program to validate correct behavior in these pro- and epilogues.
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00542.html
Changes:
* The previous version repeated all tests for each -j instead of running in parallel. I have fixed this by implementing a primitive but effective file-based parallelization scheme.
* I noticed that there was gcc/testsuite/gcc.target/x86_64/abi directory for tests specific to testing 64-bit abi issues, so I've moved my tests to an "ms-sysv" subdirectory of that (instead of gcc/testsuite/gcc.target/i386/msabi). * Fixed breakages on Cygwin. * Corrected a bad "_noinfo" optimization barrier (function call by volatile pointer). * Minor cleanup/improvements. gcc/Makefile.in| 2 + gcc/config/i386/i386.c | 916 +++-- gcc/config/i386/i386.h | 33 +- gcc/config/i386/i386.opt | 4 + gcc/config/i386/predicates.md | 155 gcc/config/i386/sse.md | 37 + gcc/config/i386/winnt.c| 3 +- gcc/doc/invoke.texi| 13 +- .../gcc.target/x86_64/abi/ms-sysv/do-test.S| 163 gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc | 807 ++ .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c| 373 + .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp | 178 libgcc/config.host | 2 +- libgcc/config/i386/i386-asm.h | 82 ++ libgcc/config/i386/resms64.S | 57 ++ libgcc/config/i386/resms64f.S | 55 ++ libgcc/config/i386/resms64fx.S | 57 ++ libgcc/config/i386/resms64x.S | 59 ++ libgcc/config/i386/savms64.S | 57 ++ libgcc/config/i386/savms64f.S | 55 ++ libgcc/config/i386/t-msabi | 7 + 21 files changed, 3020 insertions(+), 95 deletions(-) gcc/ChangeLog: 2017-04-25 Daniel Santos * config/i386/i386.opt: Add option -mcall-ms2sysv-xlogues. * config/i386/i386.h (x86_64_ms_sysv_extra_clobbered_registers): Change type to unsigned. (NUM_X86_64_MS_CLOBBERED_REGS): New macro. (struct machine_function): Add new members call_ms2sysv, call_ms2sysv_pad_in, call_ms2sysv_pad_out and call_ms2sysv_extra_regs. (struct machine_frame_state): New fields sp_realigned and sp_realigned_offset. * config/i386/i386.c (enum xlogue_stub): New enum. (enum xlogue_stub_sets): New enum. (class xlogue_layout): New class. (struct ix86_frame): New fields stack_realign_allocate_offset, stack_realign_offset and outlined_save_offset. Modify comments to detail stack layout when using out-of-line stubs. 
(ix86_target_string): Add -mcall-ms2sysv-xlogues option. (ix86_option_override_internal): Add sorry() for TARGET_SEH and -mcall-ms2sysv-xlogues. (stub_managed_regs): New static variable. (ix86_save_reg): Add new parameter ignore_outlined to optionally omit registers managed by out-of-line stub. (disable_call_ms2sysv_xlogues): New function. (ix
[PATCH 02/12] [i386] Keep stack pointer valid after re-alignment.
Add the fields sp_realigned and sp_realigned_offset to struct machine_frame_state. We now have the concept of the stack pointer being re-aligned rather than invalid. The inline functions sp_valid_at and fp_valid_at are added to test if a given location relative to the CFA can be accessed with the stack or frame pointer, respectively. Stack allocation prior to re-alignment is modified so that we allocate what is needed, but don't allocate unneeded space in the event that no SSE registers are saved, but frame.sse_reg_save_offset is increased for alignment. As this change only alters how SSE registers are saved, moving the re-alignment AND should not hinder parallelization of int register saves. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 74 +- gcc/config/i386/i386.h | 11 2 files changed, 66 insertions(+), 19 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 31f69c92968..7923486157d 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12783,6 +12783,24 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset) return len; } +/* Determine if the stack pointer is valid for accessing the cfa_offset. */ + +static inline bool sp_valid_at (HOST_WIDE_INT cfa_offset) +{ + const struct machine_frame_state &fs = cfun->machine->fs; + return fs.sp_valid && !(fs.sp_realigned + && cfa_offset < fs.sp_realigned_offset); +} + +/* Determine if the frame pointer is valid for accessing the cfa_offset. */ + +static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset) +{ + const struct machine_frame_state &fs = cfun->machine->fs; + return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned + && cfa_offset >= fs.sp_realigned_offset); +} + /* Return an RTX that points to CFA_OFFSET within the stack frame. The valid base registers are taken from CFUN->MACHINE->FS. 
*/ @@ -13081,15 +13099,18 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset, { HOST_WIDE_INT ooffset = m->fs.sp_offset; bool valid = m->fs.sp_valid; + bool realigned = m->fs.sp_realigned; if (src == hard_frame_pointer_rtx) { valid = m->fs.fp_valid; + realigned = false; ooffset = m->fs.fp_offset; } else if (src == crtl->drap_reg) { valid = m->fs.drap_valid; + realigned = false; ooffset = 0; } else @@ -13103,6 +13124,7 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset, m->fs.sp_offset = ooffset - INTVAL (offset); m->fs.sp_valid = valid; + m->fs.sp_realigned = realigned; } } @@ -13852,6 +13874,7 @@ ix86_expand_prologue (void) this is fudged; we're interested to offsets within the local frame. */ m->fs.sp_offset = INCOMING_FRAME_SP_OFFSET; m->fs.sp_valid = true; + m->fs.sp_realigned = false; ix86_compute_frame_layout (&frame); @@ -14068,11 +14091,10 @@ ix86_expand_prologue (void) that we must allocate the size of the register save area before performing the actual alignment. Otherwise we cannot guarantee that there's enough storage above the realignment point. */ - if (m->fs.sp_offset != frame.sse_reg_save_offset) + allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset; + if (allocate) pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, - GEN_INT (m->fs.sp_offset - - frame.sse_reg_save_offset), - -1, false); + GEN_INT (-allocate), -1, false); /* Align the stack. */ insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx, @@ -14080,11 +14102,19 @@ ix86_expand_prologue (void) GEN_INT (-align_bytes))); /* For the purposes of register save area addressing, the stack - pointer is no longer valid. As for the value of sp_offset, -see ix86_compute_frame_layout, which we need to match in order -to pass verification of stack_pointer_offset at the end. */ +pointer can no longer be used to access anything in the frame +below m->fs.sp_realigned_offset and the frame pointer cannot be +used for anything at or above. 
*/ m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes); - m->fs.sp_valid = false; + m->fs.sp_realigned = true; + m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16; + gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset); + /* SEH unwind emit doesn't currently support REG_CFA_EXPRESSION, which +is needed to des
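The two predicates added by this patch can be tried outside of GCC. What follows is only a minimal sketch: FrameState is a simplified stand-in for the handful of machine_frame_state fields that the predicates read (not the GCC definition), the frame state is passed explicitly instead of coming from cfun, and example_realigned_frame is a hypothetical helper added for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the fields of struct machine_frame_state that the
// predicates read. Illustrative model only, not the GCC definition.
struct FrameState {
    bool sp_valid;
    bool fp_valid;
    bool sp_realigned;
    std::int64_t sp_realigned_offset;  // CFA offset where re-alignment happened
};

// Same condition as sp_valid_at in the patch: a re-aligned stack pointer is
// only usable for CFA offsets at or beyond the re-alignment point.
inline bool sp_valid_at(const FrameState& fs, std::int64_t cfa_offset) {
    return fs.sp_valid
           && !(fs.sp_realigned && cfa_offset < fs.sp_realigned_offset);
}

// Same condition as fp_valid_at in the patch: while a re-aligned stack
// pointer is valid, the frame pointer is not used for CFA offsets at or
// beyond the re-alignment point.
inline bool fp_valid_at(const FrameState& fs, std::int64_t cfa_offset) {
    return fs.fp_valid
           && !(fs.sp_valid && fs.sp_realigned
                && cfa_offset >= fs.sp_realigned_offset);
}

// Hypothetical usage: both pointers valid, stack re-aligned at CFA offset 48.
void example_realigned_frame() {
    FrameState fs{true, true, true, 48};
    assert(!sp_valid_at(fs, 16));  // before the re-alignment point: FP only
    assert(fp_valid_at(fs, 16));
    assert(sp_valid_at(fs, 48));   // at or past the re-alignment point: SP only
    assert(!fp_valid_at(fs, 48));
}
```

With sp_realigned false, both predicates reduce to the plain sp_valid and fp_valid flags, which matches the behaviour before this patch.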
[PATCH 01/12] [i386] Re-align stack frame prior to SSE saves.
Add new fields to struct ix86_frame to track where we started the stack re-alignment and what we need to allocate prior to re-alignment. In ix86_compute_frame_layout, we do the stack frame re-alignment computation prior to computing the SSE save area so that we have an aligned SSE save area. This also assures that the SSE save area is properly aligned when DRAP is used. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 40 +--- 1 file changed, 25 insertions(+), 15 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index d9856573db7..31f69c92968 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2455,7 +2455,7 @@ struct GTY(()) stack_local_entry { [saved regs] <- regs_save_offset [padding0] - + <- stack_realign_offset [saved SSE regs] <- sse_regs_save_offset [padding1] | @@ -2481,6 +2481,8 @@ struct ix86_frame HOST_WIDE_INT stack_pointer_offset; HOST_WIDE_INT hfp_save_offset; HOST_WIDE_INT reg_save_offset; + HOST_WIDE_INT stack_realign_allocate_offset; + HOST_WIDE_INT stack_realign_offset; HOST_WIDE_INT sse_reg_save_offset; /* When save_regs_using_mov is set, emit prologue using @@ -12636,28 +12638,36 @@ ix86_compute_frame_layout (struct ix86_frame *frame) if (TARGET_SEH) frame->hard_frame_pointer_offset = offset; + /* When re-aligning the stack frame, but not saving SSE registers, this + is the offset we want adjust the stack pointer to. */ + frame->stack_realign_allocate_offset = offset; + + /* The re-aligned stack starts here. Values before this point are not + directly comparable with values below this point. Use sp_valid_at + to determine if the stack pointer is valid for a given offset and + fp_valid_at for the frame pointer. */ + if (stack_realign_fp) +offset = ROUND_UP (offset, stack_alignment_needed); + frame->stack_realign_offset = offset; + /* Align and set SSE register save area.
*/ if (frame->nsseregs) { /* The only ABI that has saved SSE registers (Win64) also has a -16-byte aligned default stack, and thus we don't need to be -within the re-aligned local stack frame to save them. In case -incoming stack boundary is aligned to less than 16 bytes, -unaligned move of SSE register will be emitted, so there is -no point to round up the SSE register save area outside the -re-aligned local stack frame to 16 bytes. */ - if (ix86_incoming_stack_boundary >= 128) +16-byte aligned default stack. However, many programs violate +the ABI, and Wine64 forces stack realignment to compensate. + +If the incoming stack boundary is at least 16 bytes, or DRAP is +required and the DRAP re-alignment boundary is at least 16 bytes, +then we want the SSE register save area properly aligned. */ + if (ix86_incoming_stack_boundary >= 128 + || (stack_realign_drap && stack_alignment_needed >= 16)) offset = ROUND_UP (offset, 16); offset += frame->nsseregs * 16; + frame->stack_realign_allocate_offset = offset; } - frame->sse_reg_save_offset = offset; - /* The re-aligned stack starts here. Values before this point are not - directly comparable with values below this point. In order to make - sure that no value happens to be the same before and after, force - the alignment computation below to add a non-zero value. */ - if (stack_realign_fp) -offset = ROUND_UP (offset, stack_alignment_needed); + frame->sse_reg_save_offset = offset; /* Va-arg area */ frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size; -- 2.11.0
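To make the new ordering of the offset computation easier to follow, here is a small standalone sketch of just that part of ix86_compute_frame_layout. It is an illustration under assumptions: LayoutInput, LayoutOffsets, compute_offsets and example_layout are hypothetical names, the inputs are passed in rather than read from GCC's globals, stack_alignment_needed is taken to be in bytes and the incoming stack boundary in bits, and round_up only approximates what ROUND_UP does in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Round x up to the next multiple of align (align > 0).
constexpr std::int64_t round_up(std::int64_t x, std::int64_t align) {
    return ((x + align - 1) / align) * align;
}

// Hypothetical stand-ins for the values the real function reads.
struct LayoutInput {
    std::int64_t offset;                  // running offset after the int register saves
    int nsseregs;                         // number of SSE registers to save
    bool stack_realign_fp;
    bool stack_realign_drap;
    unsigned incoming_stack_boundary;     // assumed to be in bits
    std::int64_t stack_alignment_needed;  // assumed to be in bytes
};

struct LayoutOffsets {
    std::int64_t stack_realign_allocate_offset;
    std::int64_t stack_realign_offset;
    std::int64_t sse_reg_save_offset;
};

// Follows the order of operations in the patched code: record the allocation
// offset, re-align, then lay out the SSE save area inside the aligned region.
LayoutOffsets compute_offsets(const LayoutInput& in) {
    LayoutOffsets out{};
    std::int64_t offset = in.offset;

    out.stack_realign_allocate_offset = offset;
    if (in.stack_realign_fp)
        offset = round_up(offset, in.stack_alignment_needed);
    out.stack_realign_offset = offset;

    if (in.nsseregs) {
        if (in.incoming_stack_boundary >= 128
            || (in.stack_realign_drap && in.stack_alignment_needed >= 16))
            offset = round_up(offset, 16);
        offset += in.nsseregs * 16;
        out.stack_realign_allocate_offset = offset;
    }
    out.sse_reg_save_offset = offset;
    return out;
}

// Hypothetical usage: 40 bytes of int saves, ten SSE registers, FP re-alignment to 16.
void example_layout() {
    LayoutOffsets o = compute_offsets({40, 10, true, false, 64, 16});
    assert(o.stack_realign_offset == 48);            // 40 rounded up to 16
    assert(o.sse_reg_save_offset == 208);            // 48 + 10 * 16
    assert(o.stack_realign_allocate_offset == 208);  // SSE area is allocated up front
}
```

When nsseregs is zero, stack_realign_allocate_offset stays at the unaligned offset, which is the "don't allocate unneeded space" case described in patch 02.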
[PATCH 04/12] [i386] Minor refactoring
For the sake of clarity, I've separated out these minor refactoring changes from the remainder of this patch set. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 21 ++--- gcc/config/i386/i386.h | 4 +++- 2 files changed, 13 insertions(+), 12 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index e8a4ba6fe8d..113f83742c2 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2424,7 +2424,7 @@ static int const x86_64_int_return_registers[4] = /* Additional registers that are clobbered by SYSV calls. */ -int const x86_64_ms_sysv_extra_clobbered_registers[12] = +unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] = { SI_REG, DI_REG, XMM6_REG, XMM7_REG, @@ -12539,6 +12539,7 @@ ix86_builtin_setjmp_frame_value (void) static void ix86_compute_frame_layout (struct ix86_frame *frame) { + struct machine_function *m = cfun->machine; unsigned HOST_WIDE_INT stack_alignment_needed; HOST_WIDE_INT offset; unsigned HOST_WIDE_INT preferred_alignment; @@ -12573,19 +12574,19 @@ ix86_compute_frame_layout (struct ix86_frame *frame) scheduling that can be done, which means that there's very little point in doing anything except PUSHs. */ if (TARGET_SEH) -cfun->machine->use_fast_prologue_epilogue = false; +m->use_fast_prologue_epilogue = false; /* During reload iteration the amount of registers saved can change. Recompute the value as needed. Do not recompute when amount of registers didn't change as reload does multiple calls to the function and does not expect the decision to change within single iteration. 
*/ else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun)) - && cfun->machine->use_fast_prologue_epilogue_nregs != frame->nregs) + && m->use_fast_prologue_epilogue_nregs != frame->nregs) { int count = frame->nregs; struct cgraph_node *node = cgraph_node::get (current_function_decl); - cfun->machine->use_fast_prologue_epilogue_nregs = count; + m->use_fast_prologue_epilogue_nregs = count; /* The fast prologue uses move instead of push to save registers. This is significantly longer, but also executes faster as modern hardware @@ -12602,14 +12603,14 @@ ix86_compute_frame_layout (struct ix86_frame *frame) if (node->frequency < NODE_FREQUENCY_NORMAL || (flag_branch_probabilities && node->frequency < NODE_FREQUENCY_HOT)) -cfun->machine->use_fast_prologue_epilogue = false; + m->use_fast_prologue_epilogue = false; else -cfun->machine->use_fast_prologue_epilogue + m->use_fast_prologue_epilogue = !expensive_function_p (count); } frame->save_regs_using_mov -= (TARGET_PROLOGUE_USING_MOVE && cfun->machine->use_fast_prologue_epilogue += (TARGET_PROLOGUE_USING_MOVE && m->use_fast_prologue_epilogue /* If static stack checking is enabled and done with probes, the registers need to be saved before allocating the frame. */ && flag_stack_check != STATIC_BUILTIN_STACK_CHECK); @@ -28683,11 +28684,9 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, else if (TARGET_64BIT_MS_ABI && (!callarg2 || INTVAL (callarg2) != -2)) { - int const cregs_size - = ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers); - int i; + unsigned i; - for (i = 0; i < cregs_size; i++) + for (i = 0; i < NUM_X86_64_MS_CLOBBERED_REGS; i++) { int regno = x86_64_ms_sysv_extra_clobbered_registers[i]; machine_mode mode = SSE_REGNO_P (regno) ? 
TImode : DImode; diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 4e4cb7ca7e3..645b239a768 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -2163,7 +2163,9 @@ extern int const dbx_register_map[FIRST_PSEUDO_REGISTER]; extern int const dbx64_register_map[FIRST_PSEUDO_REGISTER]; extern int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER]; -extern int const x86_64_ms_sysv_extra_clobbered_registers[12]; +extern unsigned const x86_64_ms_sysv_extra_clobbered_registers[12]; +#define NUM_X86_64_MS_CLOBBERED_REGS \ + (ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers)) /* Before the prologue, RA is at 0(%esp). */ #define INCOMING_RETURN_ADDR_RTX \ -- 2.11.0
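As a quick illustration of the array-size macro and the unsigned loop introduced here, the following sketch walks a table shaped like x86_64_ms_sysv_extra_clobbered_registers. The register numbers are hypothetical placeholders (GCC's real values come from the machine description), and save_area_bytes and example_clobbers are helpers invented for this example. It relies only on TImode being 16 bytes wide and DImode 8.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder register numbers, for illustration only.
enum Reg : unsigned {
    SI_REG, DI_REG,
    XMM6_REG, XMM7_REG, XMM8_REG, XMM9_REG, XMM10_REG,
    XMM11_REG, XMM12_REG, XMM13_REG, XMM14_REG, XMM15_REG
};

// Same shape as the table in the patch: RSI, RDI and XMM6-15.
const unsigned extra_clobbered_registers[12] = {
    SI_REG, DI_REG,
    XMM6_REG, XMM7_REG, XMM8_REG, XMM9_REG, XMM10_REG,
    XMM11_REG, XMM12_REG, XMM13_REG, XMM14_REG, XMM15_REG
};

// Same idea as NUM_X86_64_MS_CLOBBERED_REGS: derive the count from the array.
constexpr std::size_t num_clobbered_regs =
    sizeof extra_clobbered_registers / sizeof extra_clobbered_registers[0];

constexpr bool is_sse_reg(unsigned regno) {
    return regno >= XMM6_REG && regno <= XMM15_REG;
}

// Sum the width of the mode used for each register: 16 bytes (TImode) for
// SSE registers, 8 bytes (DImode) for the rest.
unsigned save_area_bytes() {
    unsigned total = 0;
    for (unsigned i = 0; i < num_clobbered_regs; i++) {
        unsigned regno = extra_clobbered_registers[i];
        total += is_sse_reg(regno) ? 16 : 8;
    }
    return total;
}

// Hypothetical usage.
void example_clobbers() {
    assert(num_clobbered_regs == 12);
    assert(save_area_bytes() == 2 * 8 + 10 * 16);
}
```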
[PATCH 03/12] [i386] Use re-aligned stack pointer for aligned SSE movs
Add an optional `align' parameter to choose_baseaddr, allowing the caller to request an address that is aligned to some boundary. Modify ix86_emit_save_regs_using_mov and ix86_emit_restore_regs_using_mov to use optimally aligned memory when such a base register is available. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 111 ++-- gcc/config/i386/winnt.c | 3 +- 2 files changed, 90 insertions(+), 24 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7923486157d..e8a4ba6fe8d 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12801,15 +12801,39 @@ static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset) && cfa_offset >= fs.sp_realigned_offset); } -/* Return an RTX that points to CFA_OFFSET within the stack frame. - The valid base registers are taken from CFUN->MACHINE->FS. */ +/* Choose a base register based upon alignment requested, speed and/or + size. */ -static rtx -choose_baseaddr (HOST_WIDE_INT cfa_offset) +static void choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg, + HOST_WIDE_INT &base_offset, + unsigned int align_reqested, unsigned int *align) { const struct machine_function *m = cfun->machine; - rtx base_reg = NULL; - HOST_WIDE_INT base_offset = 0; + unsigned int hfp_align; + unsigned int drap_align; + unsigned int sp_align; + bool hfp_ok = fp_valid_at (cfa_offset); + bool drap_ok = m->fs.drap_valid; + bool sp_ok = sp_valid_at (cfa_offset); + + hfp_align = drap_align = sp_align = INCOMING_STACK_BOUNDARY; + + /* Filter out any registers that don't meet the requested alignment + criteria. */ + if (align_reqested) +{ + if (m->fs.realigned) + hfp_align = drap_align = sp_align = crtl->stack_alignment_needed; + /* SEH unwind code does do not currently support REG_CFA_EXPRESSION +notes (which we would need to use a realigned stack pointer), +so disable on SEH targets.
*/ + else if (m->fs.sp_realigned) + sp_align = crtl->stack_alignment_needed; + + hfp_ok = hfp_ok && hfp_align >= align_reqested; + drap_ok = drap_ok && drap_align >= align_reqested; + sp_ok = sp_ok && sp_align >= align_reqested; +} if (m->use_fast_prologue_epilogue) { @@ -12818,17 +12842,17 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset) while DRAP must be reloaded within the epilogue. But choose either over the SP due to increased encoding size. */ - if (m->fs.fp_valid) + if (hfp_ok) { base_reg = hard_frame_pointer_rtx; base_offset = m->fs.fp_offset - cfa_offset; } - else if (m->fs.drap_valid) + else if (drap_ok) { base_reg = crtl->drap_reg; base_offset = 0 - cfa_offset; } - else if (m->fs.sp_valid) + else if (sp_ok) { base_reg = stack_pointer_rtx; base_offset = m->fs.sp_offset - cfa_offset; @@ -12841,13 +12865,13 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset) /* Choose the base register with the smallest address encoding. With a tie, choose FP > DRAP > SP. */ - if (m->fs.sp_valid) + if (sp_ok) { base_reg = stack_pointer_rtx; base_offset = m->fs.sp_offset - cfa_offset; len = choose_baseaddr_len (STACK_POINTER_REGNUM, base_offset); } - if (m->fs.drap_valid) + if (drap_ok) { toffset = 0 - cfa_offset; tlen = choose_baseaddr_len (REGNO (crtl->drap_reg), toffset); @@ -12858,7 +12882,7 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset) len = tlen; } } - if (m->fs.fp_valid) + if (hfp_ok) { toffset = m->fs.fp_offset - cfa_offset; tlen = choose_baseaddr_len (HARD_FRAME_POINTER_REGNUM, toffset); @@ -12870,8 +12894,40 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset) } } } - gcc_assert (base_reg != NULL); +/* Set the align return value. */ +if (align) + { + if (base_reg == stack_pointer_rtx) + *align = sp_align; + else if (base_reg == crtl->drap_reg) + *align = drap_align; + else if (base_reg == hard_frame_pointer_rtx) + *align = hfp_align; + } +} + +/* Return an RTX that points to CFA_OFFSET within the stack frame and + the alignment of address. 
If align is non-null, it should point to + an alignment value (in bits) that is preferred or zero and will + recieve the alignment of the base register that was selected. The + valid base registers are taken from CFUN->MACHINE->FS. */ + +static rtx +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) +{ + rtx base_reg = NULL; + HOST_W
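The filtering step in choose_basereg can be modeled in isolation. The sketch below covers only the fast-prologue branch (prefer FP, then DRAP, then SP) and is an illustration under assumptions: BaseReg, Candidates, choose_basereg_fast and example_choose are hypothetical names, the validity flags and alignments are supplied by the caller instead of being derived from the frame state, and what happens when no candidate qualifies is left out because that part of the patch is not shown above.

```cpp
#include <cassert>

enum class BaseReg { None, FramePointer, Drap, StackPointer };

// Hypothetical summary of what the real function derives from the frame state.
struct Candidates {
    bool hfp_ok, drap_ok, sp_ok;               // usable for the requested CFA offset
    unsigned hfp_align, drap_align, sp_align;  // known alignment, in bits
};

// Drop candidates that do not meet the requested alignment (0 means "no
// requirement"), then pick in the order FP, DRAP, SP. If align_out is
// non-null and a register was chosen, it receives that register's alignment.
BaseReg choose_basereg_fast(Candidates c, unsigned align_requested,
                            unsigned* align_out) {
    if (align_requested) {
        c.hfp_ok = c.hfp_ok && c.hfp_align >= align_requested;
        c.drap_ok = c.drap_ok && c.drap_align >= align_requested;
        c.sp_ok = c.sp_ok && c.sp_align >= align_requested;
    }

    BaseReg chosen = BaseReg::None;
    unsigned chosen_align = 0;
    if (c.hfp_ok) {
        chosen = BaseReg::FramePointer;
        chosen_align = c.hfp_align;
    } else if (c.drap_ok) {
        chosen = BaseReg::Drap;
        chosen_align = c.drap_align;
    } else if (c.sp_ok) {
        chosen = BaseReg::StackPointer;
        chosen_align = c.sp_align;
    }

    if (align_out && chosen != BaseReg::None)
        *align_out = chosen_align;
    return chosen;
}

// Hypothetical usage: FP is only 64-bit aligned, the re-aligned SP is 128-bit aligned.
void example_choose() {
    Candidates c{true, false, true, 64, 64, 128};
    unsigned align = 0;
    assert(choose_basereg_fast(c, 0, &align) == BaseReg::FramePointer);
    assert(align == 64);
    assert(choose_basereg_fast(c, 128, &align) == BaseReg::StackPointer);
    assert(align == 128);
}
```

The point of the example is that a caller asking for 128-bit alignment ends up on the re-aligned stack pointer even though the frame pointer would otherwise be preferred.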
[PATCH 05/12] [i386] Add option -mcall-ms2sysv-xlogues
Adds the option -mcall-ms2sysv-xlogues to i386.opt and i386.c, and its documentation to invoke.texi. Using -mcall-ms2sysv-xlogues on SEH targets is currently unsupported and will result in a sorry (). SEH targets can be supported, but it would require adding support for REG_CFA_EXPRESSION to the SEH unwind emit code in gcc/config/i386/winnt.c. The same is required for the use of aligned SSE MOVs after stack pointer re-alignment. Signed-off-by: Daniel Santos --- gcc/config/i386/i386.c | 6 +- gcc/config/i386/i386.opt | 4 gcc/doc/invoke.texi | 13 - 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 113f83742c2..521116195cb 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -4508,7 +4508,8 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2, { "-mstv", MASK_STV }, { "-mavx256-split-unaligned-load", MASK_AVX256_SPLIT_UNALIGNED_LOAD }, { "-mavx256-split-unaligned-store", MASK_AVX256_SPLIT_UNALIGNED_STORE }, -{ "-mprefer-avx128", MASK_PREFER_AVX128 } +{ "-mprefer-avx128", MASK_PREFER_AVX128 }, +{ "-mcall-ms2sysv-xlogues",MASK_CALL_MS2SYSV_XLOGUES } }; /* Additional flag options. */ @@ -6319,6 +6320,9 @@ ix86_option_override_internal (bool main_args_p, #endif } + if (TARGET_SEH && TARGET_CALL_MS2SYSV_XLOGUES) +sorry ("-mcall-ms2sysv-xlogues isn%'t currently supported with SEH"); + if (!(opts_set->x_target_flags & MASK_VZEROUPPER)) opts->x_target_flags |= MASK_VZEROUPPER; if (!(opts_set->x_target_flags & MASK_STV)) diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 9384e29b1de..65b228544a5 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -538,6 +538,10 @@ Enum(calling_abi) String(sysv) Value(SYSV_ABI) EnumValue Enum(calling_abi) String(ms) Value(MS_ABI) +mcall-ms2sysv-xlogues +Target Report Mask(CALL_MS2SYSV_XLOGUES) Save +Use libgcc stubs to save and restore registers clobbered by 64-bit Microsoft to System V ABI calls.
+ mveclibabi= Target RejectNegative Joined Var(ix86_veclibabi_type) Enum(ix86_veclibabi) Init(ix86_veclibabi_type_none) Vector library ABI to use. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 0eeea7b3b87..c9e565a9216 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1209,7 +1209,7 @@ See RS/6000 and PowerPC Options. -msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol -mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol -malign-data=@var{type} -mstack-protector-guard=@var{guard} @gol --mmitigate-rop -mgeneral-regs-only} +-mmitigate-rop -mgeneral-regs-only -mcall-ms2sysv-xlogues} @emph{x86 Windows Options} @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol @@ -25308,6 +25308,17 @@ You can control this behavior for specific functions by using the function attributes @code{ms_abi} and @code{sysv_abi}. @xref{Function Attributes}. +@item -mcall-ms2sysv-xlogues +@opindex mcall-ms2sysv-xlogues +@opindex mno-call-ms2sysv-xlogues +Due to differences in 64-bit ABIs, any Microsoft ABI function that calls a +System V ABI function must consider RSI, RDI and XMM6-15 as clobbered. By +default, the code for saving and restoring these registers is emitted inline, +resulting in fairly lengthy prologues and epilogues. Using +@option{-mcall-ms2sysv-xlogues} emits prologues and epilogues that +use stubs in the static portion of libgcc to perform these saves & restores, +thus reducing function size at the cost of a few extra instructions. + @item -mtls-dialect=@var{type} @opindex mtls-dialect Generate code to access thread-local storage using the @samp{gnu} or -- 2.11.0
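For readers who have not seen the situation this option targets, the following is a minimal sketch of a Microsoft ABI function calling a System V ABI function, using the ms_abi and sysv_abi attributes mentioned in the documentation above. The function names and the MS_ABI/SYSV_ABI macros are made up for the example, and the attributes are defined away on targets where they are assumed not to apply. The example shows only the source-level setup; it makes no claim about the exact code that is generated.

```cpp
#include <cassert>

// Use the attributes only where they are expected to exist (x86-64 with a
// GCC-compatible compiler); elsewhere the macros expand to nothing.
#if defined(__x86_64__) && defined(__GNUC__)
# define MS_ABI   __attribute__((ms_abi, noinline))
# define SYSV_ABI __attribute__((sysv_abi, noinline))
#else
# define MS_ABI
# define SYSV_ABI
#endif

// A System V ABI callee. Under that ABI it is free to clobber RSI, RDI and
// XMM6-15, all of which the Microsoft ABI treats as callee-saved.
SYSV_ABI long sysv_add(long a, long b) {
    return a + b;
}

// A Microsoft ABI function that calls it. Its prologue and epilogue must
// preserve the registers listed above, either inline (the default) or via
// the libgcc stubs when -mcall-ms2sysv-xlogues is in effect.
MS_ABI long ms_entry(long a, long b) {
    return sysv_add(a, b) + 1;
}

// Hypothetical usage.
void example_call() {
    assert(ms_entry(3, 4) == 8);
}
```

With a GCC build that includes this patch set, compiling such a file with and without -mcall-ms2sysv-xlogues and comparing the size of ms_entry is one way to observe the trade-off the documentation describes.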
[PATCH 09/12] [i386] Add patterns and predicates for -mcall-ms2sysv-xlogues
Adds the predicates save_multiple and restore_multiple to predicates.md, which are used by the following patterns in sse.md:
* save_multiple - insn that calls a save stub
* restore_multiple - call_insn that calls a restore stub and returns to the function to allow a sibling call (which should typically offer better optimization than the restore stub as the tail call)
* restore_multiple_and_return - a jump_insn that returns from the function as a tail-call.
* restore_multiple_leave_return - like the above, but restores the frame pointer before returning.
Signed-off-by: Daniel Santos --- gcc/config/i386/predicates.md | 155 ++ gcc/config/i386/sse.md| 37 ++ 2 files changed, 192 insertions(+) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 8f250a2e720..36fe8abc3f4 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -1657,3 +1657,158 @@ (ior (match_operand 0 "register_operand") (and (match_code "const_int") (match_test "op == constm1_rtx" + +;; Return true if: +;; 1. first op is a symbol reference, +;; 2. >= 13 operands, and +;; 3. operands 2 to end is one of: +;; a. save a register to a memory location, or +;; b. restore stack pointer. +(define_predicate "save_multiple" + (match_code "parallel") +{ + const unsigned nregs = XVECLEN (op, 0); + rtx head = XVECEXP (op, 0, 0); + unsigned i; + + if (GET_CODE (head) != USE) +return false; + else +{ + rtx op0 = XEXP (head, 0); + if (op0 == NULL_RTX || GET_CODE (op0) != SYMBOL_REF) + return false; +} + + if (nregs < 13) +return false; + + for (i = 2; i < nregs; i++) +{ + rtx e, src, dest; + + e = XVECEXP (op, 0, i); + + switch (GET_CODE (e)) + { + case SET: + src = SET_SRC (e); + dest = SET_DEST (e); + + /* storing a register to memory. */ + if (GET_CODE (src) == REG && GET_CODE (dest) == MEM) + { + rtx addr = XEXP (dest, 0); + + /* Good if dest address is in RAX.
*/ + if (GET_CODE (addr) == REG + && REGNO (addr) == AX_REG) + continue; + + /* Good if dest address is offset of RAX. */ + if (GET_CODE (addr) == PLUS + && GET_CODE (XEXP (addr, 0)) == REG + && REGNO (XEXP (addr, 0)) == AX_REG) + continue; + } + break; + + default: + break; + } + return false; +} + return true; +}) + +;; Return true if: +;; * first op is (return) or a a use (symbol reference), +;; * >= 14 operands, and +;; * operands 2 to end are one of: +;; - restoring a register from a memory location that's an offset of RSI. +;; - clobbering a reg +;; - adjusting SP +(define_predicate "restore_multiple" + (match_code "parallel") +{ + const unsigned nregs = XVECLEN (op, 0); + rtx head = XVECEXP (op, 0, 0); + unsigned i; + + switch (GET_CODE (head)) +{ + case RETURN: + i = 3; + break; + + case USE: + { + rtx op0 = XEXP (head, 0); + + if (op0 == NULL_RTX || GET_CODE (op0) != SYMBOL_REF) + return false; + + i = 1; + break; + } + + default: + return false; +} + + if (nregs < i + 12) +return false; + + for (; i < nregs; i++) +{ + rtx e, src, dest; + + e = XVECEXP (op, 0, i); + + switch (GET_CODE (e)) + { + case CLOBBER: + continue; + + case SET: + src = SET_SRC (e); + dest = SET_DEST (e); + + /* Restoring a register from memory. */ + if (GET_CODE (src) == MEM && GET_CODE (dest) == REG) + { + rtx addr = XEXP (src, 0); + + /* Good if src address is in RSI. */ + if (GET_CODE (addr) == REG + && REGNO (addr) == SI_REG) + continue; + + /* Good if src address is offset of RSI. */ + if (GET_CODE (addr) == PLUS + && GET_CODE (XEXP (addr, 0)) == REG + && REGNO (XEXP (addr, 0)) == SI_REG) + continue; + + /* Good if adjusting stack pointer. */ + if (GET_CODE (dest) == REG + && REGNO (dest) == SP_REG + && GET_CODE (src) == PLUS + && GET_CODE (XEXP (src, 0)) == REG + && REGNO (XEXP (src, 0)) == SP_REG) + continue; + } + + /* Restoring stack pointer from another register. */ + if (GET_CODE (dest) == REG && REGNO (dest) == SP_REG +
[PATCH 10/12] [i386] Add ms2sysv pro/epilogue stubs to libgcc
Add new header libgcc/config/i386/i386-asm.h to manage common cpp and gas macros. Add new stubs. Stubs use the following naming convention:

  __(sav|res)ms64[f][x]_N

  (sav|res)  Save or restore
  ms64       Avoid possible name collisions with future stubs (specific to 64-bit msabi --> sysv scenario)
  [f]        Variant for hard frame pointer (and stack realignment)
  [x]        Tail-call variant (is the return from function)
  N          The number of registers to save.

Signed-off-by: Daniel Santos --- libgcc/config.host | 2 +- libgcc/config/i386/i386-asm.h | 82 ++ libgcc/config/i386/resms64.S | 57 + libgcc/config/i386/resms64f.S | 55 libgcc/config/i386/resms64fx.S | 57 + libgcc/config/i386/resms64x.S | 59 ++ libgcc/config/i386/savms64.S | 57 + libgcc/config/i386/savms64f.S | 55 libgcc/config/i386/t-msabi | 7 9 files changed, 430 insertions(+), 1 deletion(-) create mode 100644 libgcc/config/i386/i386-asm.h create mode 100644 libgcc/config/i386/resms64.S create mode 100644 libgcc/config/i386/resms64f.S create mode 100644 libgcc/config/i386/resms64fx.S create mode 100644 libgcc/config/i386/resms64x.S create mode 100644 libgcc/config/i386/savms64.S create mode 100644 libgcc/config/i386/savms64f.S create mode 100644 libgcc/config/i386/t-msabi diff --git a/libgcc/config.host b/libgcc/config.host index b279a6458f9..b6d10951f3f 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1351,7 +1351,7 @@ case ${host} in i[34567]86-*-linux* | x86_64-*-linux* | \ i[34567]86-*-kfreebsd*-gnu | x86_64-*-kfreebsd*-gnu | \ i[34567]86-*-gnu*) - tmake_file="${tmake_file} t-tls i386/t-linux t-slibgcc-libgcc" + tmake_file="${tmake_file} t-tls i386/t-linux i386/t-msabi t-slibgcc-libgcc" if test "$libgcc_cv_cfi" = "yes"; then tmake_file="${tmake_file} t-stack i386/t-stack-i386" fi diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h new file mode 100644 index 000..c613e9fd83d --- /dev/null +++ b/libgcc/config/i386/i386-asm.h @@ -0,0 +1,82 @@ +/* Defines common perprocessor and assembly macros for use by various stubs.
+ Copyright (C) 2016-2017 Free Software Foundation, Inc. + Contributed by Daniel Santos + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + +#ifndef I386_ASM_H +#define I386_ASM_H + +#ifdef __ELF__ +# define ELFFN(fn) .type fn,@function +#else +# define ELFFN(fn) +#endif + +#define FUNC_START(fn) \ + .global fn; \ + ELFFN (fn); \ +fn: + +#define HIDDEN_FUNC(fn)\ + FUNC_START (fn) \ + .hidden fn; \ + +#define FUNC_END(fn) .size fn,.-fn + +#ifdef __SSE2__ +# ifdef __AVX__ +# define MOVAPS vmovaps +# else +# define MOVAPS movaps +# endif + +/* Save SSE registers 6-15. off is the offset of rax to get to xmm6. */ +.macro SSE_SAVE off=0 + MOVAPS %xmm15,(\off - 0x90)(%rax) + MOVAPS %xmm14,(\off - 0x80)(%rax) + MOVAPS %xmm13,(\off - 0x70)(%rax) + MOVAPS %xmm12,(\off - 0x60)(%rax) + MOVAPS %xmm11,(\off - 0x50)(%rax) + MOVAPS %xmm10,(\off - 0x40)(%rax) + MOVAPS %xmm9, (\off - 0x30)(%rax) + MOVAPS %xmm8, (\off - 0x20)(%rax) + MOVAPS %xmm7, (\off - 0x10)(%rax) + MOVAPS %xmm6, \off(%rax) +.endm + +/* Restore SSE registers 6-15. off is the offset of rsi to get to xmm6. 
*/ +.macro SSE_RESTORE off=0 + MOVAPS (\off - 0x90)(%rsi), %xmm15 + MOVAPS (\off - 0x80)(%rsi), %xmm14 + MOVAPS (\off - 0x70)(%rsi), %xmm13 + MOVAPS (\off - 0x60)(%rsi), %xmm12 + MOVAPS (\off - 0x50)(%rsi), %xmm11 + MOVAPS (\off - 0x40)(%rsi), %xmm10 + MOVAPS (\off - 0x30)(%rsi), %xmm9 + MOVAPS (\off - 0x20)(%rsi), %xmm8 + MOVAPS (\off - 0x10)(%rsi), %xmm7 + MOVAPS \off(%rsi), %xmm6 +.endm + +#endif /* __SSE2__ */ +#endif /* I386_
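
As an illustration of the stub naming convention described in this patch, here is a rough C++ sketch of how a symbol name might be assembled. It is not taken from the series: `StubKind`, `stub_symbol`, `kMinRegs` and `kMaxRegs` are hypothetical names, and the exact "__" + base + "_" + count format is an assumption inferred from the file names (savms64.S, resms64fx.S, ...) and the 12..18 register range used elsewhere in the series.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the six stub families, one per .S file in the patch.
enum StubKind {
  SAVE, RESTORE, RESTORE_TAIL, SAVE_HFP, RESTORE_HFP, RESTORE_HFP_TAIL
};

// Register-count range handled by the stubs: 12 (RSI, RDI, XMM6-15) up to 18.
constexpr unsigned kMinRegs = 12;
constexpr unsigned kMaxRegs = 18;

// Build a symbol of the ASSUMED form "__<base>_<nregs>".
inline std::string stub_symbol(StubKind kind, unsigned nregs)
{
  static const char *const base[] = {
    "savms64", "resms64", "resms64x", "savms64f", "resms64f", "resms64fx"
  };
  assert(nregs >= kMinRegs && nregs <= kMaxRegs);
  return std::string("__") + base[kind] + "_" + std::to_string(nregs);
}
```

Under those assumptions, `stub_symbol(SAVE_HFP, 14)` would name the hard-frame-pointer save stub that handles 14 registers.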
[PATCH 08/12] [i386] Modify ix86_compute_frame_layout for -mcall-ms2sysv-xlogues
ix86_compute_frame_layout will now populate the fields added to structs
machine_function and ix86_frame and modify the frame layout specifics to
facilitate the use of save & restore stubs.  This is also where we
initialize stub_managed_regs to track which register saves & restores are
managed by the out-of-line stub and which are managed inline.  Both kinds
can occur in the same function when inline asm explicitly clobbers a
register.

Signed-off-by: Daniel Santos
---
 gcc/config/i386/i386.c | 94 +++---
 1 file changed, 90 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4f0cb7dd6cc..debfe457d97 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2715,12 +2715,29 @@ struct GTY(()) stack_local_entry {
    saved frame pointer if frame_pointer_needed
                                               <- HARD_FRAME_POINTER
    [saved regs]
-                                              <- regs_save_offset
+                                              <- reg_save_offset
    [padding0]
                                               <- stack_realign_offset
    [saved SSE regs]
+        OR
+   [stub-saved registers for ms x64 --> sysv clobbers
+                        <- Start of out-of-line, stub-saved/restored regs
+                           (see libgcc/config/i386/(sav|res)ms64*.S)
+     [XMM6-15]
+     [RSI]
+     [RDI]
+     [?RBX]     only if RBX is clobbered
+     [?RBP]     only if RBP and RBX are clobbered
+     [?R12]     only if R12 and all previous regs are clobbered
+     [?R13]     only if R13 and all previous regs are clobbered
+     [?R14]     only if R14 and all previous regs are clobbered
+     [?R15]     only if R15 and all previous regs are clobbered
+                        <- end of stub-saved/restored regs
+     [padding1]
+   ]
+                                              <- outlined_save_offset
                                               <- sse_regs_save_offset
-   [padding1]          |
+   [padding2]
                        |                      <- FRAME_POINTER
    [va_arg registers]  |
                        |
@@ -2745,6 +2762,7 @@ struct ix86_frame
   HOST_WIDE_INT reg_save_offset;
   HOST_WIDE_INT stack_realign_allocate_offset;
   HOST_WIDE_INT stack_realign_offset;
+  HOST_WIDE_INT outlined_save_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
   /* When save_regs_using_mov is set, emit prologue using
@@ -12802,6 +12820,15 @@ ix86_builtin_setjmp_frame_value (void)
   return stack_realign_fp ? hard_frame_pointer_rtx : virtual_stack_vars_rtx;
 }
 
+/* Disables out-of-lined msabi to sysv pro/epilogues and emits a warning if
+   warn_once is null, or *warn_once is zero.  */
+static void disable_call_ms2sysv_xlogues (const char *feature)
+{
+  cfun->machine->call_ms2sysv = false;
+  warning (OPT_mcall_ms2sysv_xlogues, "not currently compatible with %s.",
+           feature);
+}
+
 /* When using -fsplit-stack, the allocation routines set a field in
    the TCB to the bottom of the stack plus this much space, measured
    in bytes.  */
@@ -12820,9 +12847,50 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   HOST_WIDE_INT size = get_frame_size ();
   HOST_WIDE_INT to_allocate;
 
+  CLEAR_HARD_REG_SET (stub_managed_regs);
+
+  /* m->call_ms2sysv is initially enabled in ix86_expand_call for all 64-bit
+   * ms_abi functions that call a sysv function.  We now need to prune away
+   * cases where it should be disabled.  */
+  if (TARGET_64BIT && m->call_ms2sysv)
+    {
+      gcc_assert (TARGET_64BIT_MS_ABI);
+      gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
+      gcc_assert (!TARGET_SEH);
+
+      if (!TARGET_SSE)
+        m->call_ms2sysv = false;
+
+      /* Don't break hot-patched functions.  */
+      else if (ix86_function_ms_hook_prologue (current_function_decl))
+        m->call_ms2sysv = false;
+
+      /* TODO: Cases not yet examined.  */
+      else if (crtl->calls_eh_return)
+        disable_call_ms2sysv_xlogues ("__builtin_eh_return");
+
+      else if (ix86_static_chain_on_stack)
+        disable_call_ms2sysv_xlogues ("static call chains");
+
+      else if (ix86_using_red_zone ())
+        disable_call_ms2sysv_xlogues ("red zones");
+
+      else if (flag_split_stack)
+        disable_call_ms2sysv_xlogues ("split stack");
+
+      /* Finally, compute which registers the stub will manage.  */
+      else
+        {
+          unsigned count = xlogue_layout
+                           ::compute_stub_managed_regs (stub_managed_regs);
+          m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
+        }
+    }
+
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
-  CLEAR_HARD_REG_SET (stub_managed_regs);
+  m->call_ms2sysv_pad_in = 0;
+  m->call_ms2sysv_pad_out = 0;
 
   /* 64-bit MS ABI seem
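
The pruning logic in the ix86_compute_frame_layout hunk above can be modelled outside of GCC. The following C++ sketch is only an approximation for discussion: `FnState`, `Decision` and `prune_call_ms2sysv` are hypothetical names, and the boolean fields stand in for the target macros and crtl/cfun queries used in the patch. It preserves the order of the checks and the distinction between silently disabling the stubs and disabling them with a warning.

```cpp
#include <cassert>
#include <string>

// Hypothetical snapshot of the conditions the patch tests (not GCC types).
struct FnState {
  bool has_sse;                // TARGET_SSE
  bool ms_hook_prologue;       // ix86_function_ms_hook_prologue (...)
  bool calls_eh_return;        // crtl->calls_eh_return
  bool static_chain_on_stack;  // ix86_static_chain_on_stack
  bool uses_red_zone;          // ix86_using_red_zone ()
  bool split_stack;            // flag_split_stack
};

// Outcome: whether stubs stay enabled, and which feature (if any) is warned on.
struct Decision {
  bool enabled;
  std::string warned_feature;  // empty when no warning is emitted
};

// Same ordering as the if/else-if chain in the patch.
inline Decision prune_call_ms2sysv(const FnState &s)
{
  if (!s.has_sse)              return {false, ""};  // silent
  if (s.ms_hook_prologue)      return {false, ""};  // silent: hot-patched
  if (s.calls_eh_return)       return {false, "__builtin_eh_return"};
  if (s.static_chain_on_stack) return {false, "static call chains"};
  if (s.uses_red_zone)         return {false, "red zones"};
  if (s.split_stack)           return {false, "split stack"};
  return {true, ""};           // stubs stay enabled
}
```

Because the checks are chained, a function that both lacks SSE and uses split stacks is disabled silently; only the first matching condition decides whether a warning is produced.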
[PATCH 06/12] [i386] Add class xlogue_layout and new fields to struct machine_function
Of the new fields added to struct machine_function, call_ms2sysv is
initially set in ix86_expand_call, but may later be cleared when
ix86_compute_frame_layout is called (both of these changes are in
subsequent patches).  If it is not cleared, then the remaining new fields
will be set in ix86_compute_frame_layout (also a subsequent patch).

The new class xlogue_layout manages the layout of the stack area used by
the out-of-line save & restore stubs as well as any padding needed before
and after the save area.  It also provides the proper symbol rtx for the
requested stub based upon values of the new fields in struct
machine_function, which specify how many registers are being saved, what
padding is needed, etc.

xlogue_layout cannot be used until stack realign flags are finalized and
ix86_compute_frame_layout is called, at which point
xlogue_layout::get_instance may be used to retrieve the appropriate
(constant) instance of xlogue_layout.

Signed-off-by: Daniel Santos
---
 gcc/config/i386/i386.c | 262 +
 gcc/config/i386/i386.h | 18
 2 files changed, 280 insertions(+)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 521116195cb..2da3da1f97a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -93,6 +93,7 @@ static rtx legitimize_dllimport_symbol (rtx, bool);
 static rtx legitimize_pe_coff_extern_decl (rtx, bool);
 static rtx legitimize_pe_coff_symbol (rtx, bool);
 static void ix86_print_operand_address_as (FILE *, rtx, addr_space_t, bool);
+static bool ix86_save_reg (unsigned int, bool, bool);
 
 #ifndef CHECK_STACK_LIMIT
 #define CHECK_STACK_LIMIT (-1)
@@ -2432,6 +2433,267 @@ unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
   XMM12_REG, XMM13_REG, XMM14_REG, XMM15_REG
 };
 
+enum xlogue_stub {
+  XLOGUE_STUB_SAVE,
+  XLOGUE_STUB_RESTORE,
+  XLOGUE_STUB_RESTORE_TAIL,
+  XLOGUE_STUB_SAVE_HFP,
+  XLOGUE_STUB_RESTORE_HFP,
+  XLOGUE_STUB_RESTORE_HFP_TAIL,
+
+  XLOGUE_STUB_COUNT
+};
+
+enum xlogue_stub_sets {
+  XLOGUE_SET_ALIGNED,
+  XLOGUE_SET_ALIGNED_PLUS_8,
+  XLOGUE_SET_HFP_ALIGNED_OR_REALIGN,
+  XLOGUE_SET_HFP_ALIGNED_PLUS_8,
+
+  XLOGUE_SET_COUNT
+};
+
+/* Register save/restore layout used by out-of-line stubs.  */
+class xlogue_layout {
+public:
+  struct reginfo
+  {
+    unsigned regno;
+    HOST_WIDE_INT offset;  /* Offset used by stub base pointer (rax or
+                              rsi) to where each register is stored.  */
+  };
+
+  unsigned get_nregs () const                    {return m_nregs;}
+  HOST_WIDE_INT get_stack_align_off_in () const  {return m_stack_align_off_in;}
+
+  const reginfo &get_reginfo (unsigned reg) const
+  {
+    gcc_assert (reg < m_nregs);
+    return m_regs[reg];
+  }
+
+  const char *get_stub_name (enum xlogue_stub stub,
+                             unsigned n_extra_args) const;
+  /* Returns an rtx for the stub's symbol based upon
+       1.) the specified stub (save, restore or restore_ret) and
+       2.) the value of cfun->machine->call_ms2sysv_extra_regs and
+       3.) whether or not stack alignment is being performed.  */
+  rtx get_stub_rtx (enum xlogue_stub stub) const;
+
+  /* Returns the amount of stack space (including padding) that the stub
+     needs to store registers based upon data in the machine_function.  */
+  HOST_WIDE_INT get_stack_space_used () const
+  {
+    const struct machine_function &m = *cfun->machine;
+    unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1;
+
+    gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
+    return m_regs[last_reg].offset
+           + (m.call_ms2sysv_pad_out ? 8 : 0)
+           + STUB_INDEX_OFFSET;
+  }
+
+  /* Returns the offset for the base pointer used by the stub.  */
+  HOST_WIDE_INT get_stub_ptr_offset () const
+  {
+    return STUB_INDEX_OFFSET + m_stack_align_off_in;
+  }
+
+  static const struct xlogue_layout &get_instance ();
+  static unsigned compute_stub_managed_regs (HARD_REG_SET &stub_managed_regs);
+
+  static const HOST_WIDE_INT STUB_INDEX_OFFSET = 0x70;
+  static const unsigned MIN_REGS = NUM_X86_64_MS_CLOBBERED_REGS;
+  static const unsigned MAX_REGS = 18;
+  static const unsigned MAX_EXTRA_REGS = MAX_REGS - MIN_REGS;
+  static const unsigned VARIANT_COUNT = MAX_EXTRA_REGS + 1;
+  static const unsigned STUB_NAME_MAX_LEN = 16;
+  static const char * const STUB_BASE_NAMES[XLOGUE_STUB_COUNT];
+  static const unsigned REG_ORDER[MAX_REGS];
+  static const unsigned REG_ORDER_REALIGN[MAX_REGS];
+
+private:
+  xlogue_layout ();
+  xlogue_layout (HOST_WIDE_INT stack_align_off_in, bool hfp);
+  xlogue_layout (const xlogue_layout &);
+
+  /* True if hard frame pointer is used.  */
+  bool m_hfp;
+
+  /* Max number of register this layout manages.  */
+  unsigned m_nregs;
+
+  /* Incoming offset from 16-byte alignment.  */
+  HOST_WIDE_INT m_stack_align_off_in;
+
+  /* Register order and offsets.  */
+  struct reginfo m_regs[MAX_REGS
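
To make the arithmetic in get_stack_space_used easier to follow, here is a small standalone C++ model. It rests on assumptions that are not visible in this excerpt: that the save area is filled in the order XMM6-15, RSI, RDI, then the optional general-purpose registers, that each SSE slot is 16 bytes and each general-purpose slot 8 bytes, and that each stored offset is the cumulative size minus STUB_INDEX_OFFSET. `slot_offset` and `stack_space_used` are hypothetical names, not functions from the patch.

```cpp
#include <cassert>

constexpr long kStubIndexOffset = 0x70;  // STUB_INDEX_OFFSET from the patch
constexpr unsigned kMinRegs = 12;        // RSI, RDI and XMM6-15
constexpr unsigned kMaxRegs = 18;        // MAX_REGS from the patch
constexpr unsigned kMaxExtraRegs = kMaxRegs - kMinRegs;
constexpr unsigned kSseRegs = 10;        // XMM6..XMM15, assumed to come first

// ASSUMED offset of slot `index` relative to the stub base pointer:
// cumulative size of slots 0..index, rebased by kStubIndexOffset.
inline long slot_offset(unsigned index)
{
  assert(index < kMaxRegs);
  long end = 0;
  for (unsigned i = 0; i <= index; ++i)
    end += (i < kSseRegs) ? 16 : 8;
  return end - kStubIndexOffset;
}

// Mirrors the expression in get_stack_space_used: offset of the last
// managed register, plus optional 8 bytes of outgoing padding, plus the
// base pointer bias.
inline long stack_space_used(unsigned extra_regs, bool pad_out)
{
  assert(extra_regs <= kMaxExtraRegs);
  unsigned last_reg = extra_regs + kMinRegs - 1;
  return slot_offset(last_reg) + (pad_out ? 8 : 0) + kStubIndexOffset;
}
```

With these assumptions the minimum save area (no extra registers, no padding) comes to 10 * 16 + 2 * 8 = 176 bytes, and each extra general-purpose register adds 8 bytes.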
[PATCH 07/12] [i386] Modify ix86_save_reg to optionally omit stub-managed registers
Add HARD_REG_SET stub_managed_regs to track which registers will be managed
by the pro/epilogue stubs for the function.

Add a third parameter, bool ignore_outlined, to ix86_save_reg to specify
whether or not the count should include registers marked in
stub_managed_regs.  All call sites are modified.

Signed-off-by: Daniel Santos
---
 gcc/config/i386/i386.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2da3da1f97a..4f0cb7dd6cc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12618,6 +12618,10 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
               && df_regs_ever_live_p (regno)));
 }
 
+/* Registers whose save & restore will be managed by stubs called from
+   pro/epilogue.  */
+static HARD_REG_SET GTY(()) stub_managed_regs;
+
 /* Return true if register class CL should be an additional allocno
    class.  */
 
@@ -12630,7 +12634,7 @@ ix86_additional_allocno_class_p (reg_class_t cl)
 /* Return TRUE if we need to save REGNO.  */
 
 static bool
-ix86_save_reg (unsigned int regno, bool maybe_eh_return)
+ix86_save_reg (unsigned int regno, bool maybe_eh_return, bool ignore_outlined)
 {
   /* If there are no caller-saved registers, we preserve all registers,
      except for MMX and x87 registers which aren't supported when saving
@@ -12698,6 +12702,10 @@ ix86_save_reg (unsigned int regno, bool maybe_eh_return)
         }
     }
 
+  if (ignore_outlined && cfun->machine->call_ms2sysv
+      && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
+    return false;
+
   if (crtl->drap_reg
       && regno == REGNO (crtl->drap_reg)
       && !cfun->machine->no_drap_save_restore)
@@ -12718,7 +12726,7 @@ ix86_nsaved_regs (void)
   int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-    if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+    if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
       nregs ++;
   return nregs;
 }
@@ -12734,7 +12742,7 @@ ix86_nsaved_sseregs (void)
   if (!TARGET_64BIT_MS_ABI)
     return 0;
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-    if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true))
+    if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
       nregs ++;
   return nregs;
 }
@@ -12814,6 +12822,7 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
+  CLEAR_HARD_REG_SET (stub_managed_regs);
 
   /* 64-bit MS ABI seem to require stack alignment to be always 16,
      except for function prologues, leaf functions and when the defult
@@ -13207,7 +13216,7 @@ ix86_emit_save_regs (void)
   rtx_insn *insn;
 
   for (regno = FIRST_PSEUDO_REGISTER - 1; regno-- > 0; )
-    if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+    if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
       {
         insn = emit_insn (gen_push (gen_rtx_REG (word_mode, regno)));
         RTX_FRAME_RELATED_P (insn) = 1;
@@ -13297,7 +13306,7 @@ ix86_emit_save_regs_using_mov (HOST_WIDE_INT cfa_offset)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-    if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+    if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
       {
         ix86_emit_save_reg_using_mov (word_mode, regno, cfa_offset);
         cfa_offset -= UNITS_PER_WORD;
@@ -13312,7 +13321,7 @@ ix86_emit_save_sse_regs_using_mov (HOST_WIDE_INT cfa_offset)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-    if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true))
+    if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
       {
         ix86_emit_save_reg_using_mov (V4SFmode, regno, cfa_offset);
         cfa_offset -= GET_MODE_SIZE (V4SFmode);
@@ -13696,13 +13705,13 @@ get_scratch_register_on_entry (struct scratch_reg *sr)
                && !static_chain_p
                && drap_regno != CX_REG)
         regno = CX_REG;
-      else if (ix86_save_reg (BX_REG, true))
+      else if (ix86_save_reg (BX_REG, true, false))
         regno = BX_REG;
       /* esi is the static chain register.  */
       else if (!(regparm == 3 && static_chain_p)
-               && ix86_save_reg (SI_REG, true))
+               && ix86_save_reg (SI_REG, true, false))
         regno = SI_REG;
-      else if (ix86_save_reg (DI_REG, true))
+      else if (ix86_save_reg (DI_REG, true, false))
         regno = DI_REG;
       else
         {
@@ -14812,7 +14821,7 @@ ix86_emit_restore_regs_using_mov (HOST_WIDE_INT cfa_offset,
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-    if (GENE
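
The effect of the new ignore_outlined parameter can be shown with a toy model. The C++ sketch below is not GCC code: `Fn`, `save_reg`, `count_inline_saves` and the 64-entry register file are hypothetical stand-ins, and `needs_save` abbreviates everything the real ix86_save_reg decides before it reaches the new check.

```cpp
#include <bitset>
#include <cassert>

constexpr unsigned kNumRegs = 64;  // hypothetical register-file size

// Hypothetical per-function state (stand-in for cfun->machine and friends).
struct Fn {
  bool call_ms2sysv = false;             // stubs enabled for this function
  std::bitset<kNumRegs> needs_save;      // result of the pre-existing checks
  std::bitset<kNumRegs> stub_managed;    // stand-in for stub_managed_regs
};

// Model of the patched predicate: a register that would otherwise be saved
// is skipped when the caller asks to ignore outlined (stub-managed) saves.
inline bool save_reg(const Fn &f, unsigned regno, bool ignore_outlined)
{
  assert(regno < kNumRegs);
  if (!f.needs_save[regno])
    return false;
  if (ignore_outlined && f.call_ms2sysv && f.stub_managed[regno])
    return false;
  return true;
}

// Counts registers saved inline, as the ix86_nsaved_regs-style loops do
// when they pass ignore_outlined = true.
inline unsigned count_inline_saves(const Fn &f)
{
  unsigned n = 0;
  for (unsigned regno = 0; regno < kNumRegs; ++regno)
    if (save_reg(f, regno, true))
      ++n;
  return n;
}
```

This matches the pattern in the diff: counting and inline emission pass `true` so stub-managed registers are left out, while get_scratch_register_on_entry passes `false` because it only needs to know whether a register is saved at all.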
[PATCH 11/12] [i386] Add remainder of -mcall-ms2sysv-xlogues implementation
Add functions emit_outlined_ms2sysv_save and emit_outlined_ms2sysv_restore,
which are called from ix86_expand_prologue and ix86_expand_epilogue,
respectively.  Also add the code to ix86_expand_call that enables the
optimization (setting the machine_function's call_ms2sysv field).

Signed-off-by: Daniel Santos
---
 gcc/config/i386/i386.c | 281 +++--
 1 file changed, 272 insertions(+), 9 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index debfe457d97..6a4e6f8e728 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14271,6 +14271,79 @@ ix86_elim_entry_set_got (rtx reg)
     }
 }
 
+static rtx
+gen_frame_set (rtx reg, rtx frame_reg, int offset, bool store)
+{
+  rtx addr, mem;
+
+  if (offset)
+    addr = gen_rtx_PLUS (Pmode, frame_reg, GEN_INT (offset));
+  mem = gen_frame_mem (GET_MODE (reg), offset ? addr : frame_reg);
+  return gen_rtx_SET (store ? mem : reg, store ? reg : mem);
+}
+
+static inline rtx
+gen_frame_load (rtx reg, rtx frame_reg, int offset)
+{
+  return gen_frame_set (reg, frame_reg, offset, false);
+}
+
+static inline rtx
+gen_frame_store (rtx reg, rtx frame_reg, int offset)
+{
+  return gen_frame_set (reg, frame_reg, offset, true);
+}
+
+static void
+ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame)
+{
+  struct machine_function *m = cfun->machine;
+  const unsigned ncregs = NUM_X86_64_MS_CLOBBERED_REGS
+                          + m->call_ms2sysv_extra_regs;
+  rtvec v = rtvec_alloc (ncregs - 1 + 3);
+  unsigned int align, i, vi = 0;
+  rtx_insn *insn;
+  rtx sym, addr;
+  rtx rax = gen_rtx_REG (word_mode, AX_REG);
+  const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
+  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset;
+  HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset;
+  HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in ();
+
+  /* Verify that the incoming stack 16-byte alignment offset matches the
+     layout we're using.  */
+  gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD));
+
+  /* Get the stub symbol.  */
+  sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
+                                                   : XLOGUE_STUB_SAVE);
+  RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym);
+  RTVEC_ELT (v, vi++) = const0_rtx;
+
+  /* Setup RAX as the stub's base pointer.  */
+  align = GET_MODE_ALIGNMENT (V4SFmode);
+  addr = choose_baseaddr (rax_offset, &align);
+  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
+  insn = emit_insn (gen_rtx_SET (rax, addr));
+
+  gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ());
+  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+                             GEN_INT (-stack_alloc_size), -1,
+                             m->fs.cfa_reg == stack_pointer_rtx);
+  for (i = 0; i < ncregs; ++i)
+    {
+      const xlogue_layout::reginfo &r = xlogue.get_reginfo (i);
+      rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode),
+                             r.regno);
+      RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);
+    }
+
+  gcc_assert (vi == (unsigned)GET_NUM_ELEM (v));
+
+  insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
+  RTX_FRAME_RELATED_P (insn) = true;
+}
+
 /* Expand the prologue into a bunch of separate insns.  */
 
 void
@@ -14518,7 +14591,7 @@ ix86_expand_prologue (void)
          performing the actual alignment.  Otherwise we cannot guarantee
          that there's enough storage above the realignment point.  */
       allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
-      if (allocate)
+      if (allocate && !m->call_ms2sysv)
         pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
                                    GEN_INT (-allocate), -1, false);
 
@@ -14526,7 +14599,6 @@ ix86_expand_prologue (void)
       insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
                                         stack_pointer_rtx,
                                         GEN_INT (-align_bytes)));
-
       /* For the purposes of register save area addressing, the stack
          pointer can no longer be used to access anything in the frame
          below m->fs.sp_realigned_offset and the frame pointer cannot be
@@ -14543,6 +14615,9 @@ ix86_expand_prologue (void)
       m->fs.sp_valid = false;
     }
 
+  if (m->call_ms2sysv)
+    ix86_emit_outlined_ms2sysv_save (frame);
+
   allocate = frame.stack_pointer_offset - m->fs.sp_offset;
 
   if (flag_stack_usage_info)
@@ -14863,17 +14938,19 @@ ix86_emit_restore_regs_using_pop (void)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-    if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, false))
+    if (GENERAL_REGNO_P (regno) &
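
Two pieces of bookkeeping in ix86_emit_outlined_ms2sysv_save are easy to check in isolation: the size passed to rtvec_alloc and the alignment assertion on the incoming stack pointer offset. The C++ sketch below is a model only: `save_parallel_length` and `incoming_align_offset` are hypothetical helpers, and the constants restate values that appear in the patches (12 minimum registers, at most 6 extra, 8-byte words).

```cpp
#include <cassert>

constexpr unsigned kMinRegs = 12;      // NUM_X86_64_MS_CLOBBERED_REGS
constexpr unsigned kMaxExtraRegs = 6;  // MAX_REGS - MIN_REGS
constexpr long kUnitsPerWord = 8;      // UNITS_PER_WORD on x86-64

// Length of the PARALLEL built by the save routine: the expression
// ncregs - 1 + 3 equals one USE of the stub symbol, one const0_rtx
// element, and one store per managed register.
inline unsigned save_parallel_length(unsigned extra_regs)
{
  assert(extra_regs <= kMaxExtraRegs);
  const unsigned ncregs = kMinRegs + extra_regs;
  const unsigned length = ncregs - 1 + 3;
  assert(length == 2 + ncregs);  // same invariant the final gcc_assert checks
  return length;
}

// Value compared against stack_align_off_in: 0 when sp_offset is a
// multiple of 16, 8 when it is only 8-byte aligned.
inline long incoming_align_offset(long sp_offset)
{
  return sp_offset & kUnitsPerWord;
}
```

So a function saving only the minimum set builds a 14-element PARALLEL, and an sp_offset of 8 or 24 selects the "aligned plus 8" layouts, while 16 or 32 selects the aligned ones.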