Uros or Jan,
Please take this as a ping; I never pinged after submitting v2 because I
found a few more issues with it. :) Although I realize this would be a
GCC 8 stage 1 item, I would like to get it finished up and tentatively
approved as soon as possible. I have tried to summarize this patch set
below as clearly and succinctly as possible. Thanks!
* This patch set depends upon the "Use aligned SSE movs for re-aligned
MS ABI pro/epilogues" patch set:
https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01859.html
* I have submitted a test program separately:
https://gcc.gnu.org/ml/gcc/2017-02/msg00009.html
Summary
=======
When a 64-bit Microsoft ABI function calls a System V function, ABI
differences require RSI, RDI and XMM6-15 to be considered clobbered.
Saving these registers inline can cost as much as 109 bytes, with a
similar amount for restoring. This patch set targets 64-bit Wine and
aims to mitigate some of these costs by adding ms/sysv save & restore
stubs to libgcc, which are called from pro/epilogues rather than
emitting the code inline. And since we're already tinkering with stubs,
they also manage the save/restore of all remaining registers where
possible. Analysis of building Wine 64 demonstrates a reduction of
.text by around 20%, which also translates into a reduction of Wine's
install size by 34MiB.
As there will usually only be 3 stubs in memory at any time, I'm using
the larger mov instructions instead of push/pop to facilitate better
parallelization. The basic theory is that the combination of better
parallelization and reduced I-cache misses will offset the extra
instructions required for implementation, although I have not produced
actual performance data yet.
For now, I have called this feature -moutline-msabi-xlogues, but Sandra
Loosemore has this suggestion:
(https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02670.html)
Just as a suggestion (I'm not an i386 maintainer), I'd recommend
spelling the name of this option -mno-inline-msabi-xlogues instead of
-moutline-msabi-xlogues, and making the default -minline-msabi-xlogues.
When enabled, the feature is activated when an ms_abi function calls a
sysv_abi function and all of the following are true (evaluated in
ix86_compute_frame_layout):
TARGET_SSE
&& !ix86_function_ms_hook_prologue (current_function_decl)
&& !SEH
&& !crtl->calls_eh_return
&& !ix86_static_chain_on_stack
&& !ix86_using_red_zone ()
&& !flag_split_stack
Some of these, like __builtin_eh_return, might be easy to add but I
don't have a test for them.
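The condition above can be read as a single predicate. The following is only an illustrative sketch: struct fn_state and can_use_outlined_xlogues are hypothetical names that mirror the terms of the condition one-to-one, and they are not the data structures used in i386.c.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state the condition looks at.  Each
 * field corresponds to one term of the condition in the text. */
struct fn_state {
    bool target_sse;             /* TARGET_SSE */
    bool ms_hook_prologue;       /* ix86_function_ms_hook_prologue () */
    bool seh;                    /* SEH */
    bool calls_eh_return;        /* crtl->calls_eh_return */
    bool static_chain_on_stack;  /* ix86_static_chain_on_stack */
    bool using_red_zone;         /* ix86_using_red_zone () */
    bool split_stack;            /* flag_split_stack */
};

/* Returns true only when SSE is available and none of the
 * disqualifying conditions hold. */
bool can_use_outlined_xlogues(const struct fn_state *s)
{
    return s->target_sse
           && !s->ms_hook_prologue
           && !s->seh
           && !s->calls_eh_return
           && !s->static_chain_on_stack
           && !s->using_red_zone
           && !s->split_stack;
}
```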
Stack Layout
============
When active, registers are saved on the stack differently. Note that
when not active, stack layout is *unchanged*.
[arguments]
<- ARG_POINTER
saved pc
saved frame pointer if frame_pointer_needed
<- HARD_FRAME_POINTER
[saved regs] if not managed by stub (e.g., explicitly clobbered)
<- reg_save_offset
[padding0]
<- stack_realign_offset
<- Start of out-of-line, stub-managed regs
XMM6-15
RSI
RDI
[RBX] if RBX is clobbered
[RBP] if RBP and RBX are clobbered and HFP not used.
[R12] if R12 and all previous regs are clobbered
[R13] if R13 and all previous regs are clobbered
[R14] if R14 and all previous regs are clobbered
[R15] if R15 and all previous regs are clobbered
<- end of stub-saved/restored regs
[padding1]
<- outlined_save_offset
<- sse_regs_save_offset
[padding2]
<- FRAME_POINTER
[va_arg registers]
[frame]
... etc.
Stubs
=====
There are two sets of stubs for use with and without hard frame
pointers. Each set has a save, a restore and a restore-as-tail-call
that performs the function's return. Each stub has entry points for the
number of registers it's saving. The non-tail-call restore is used when
a sibling call is the tail. If a normal register is explicitly
clobbered out of the order that hard registers are usually assigned in
(e.g., __asm__ __volatile__ ("":::"r15")), then that register will be
saved and restored as normal and not by the stub.
Stub names:
__savms64_(12-18)
__resms64_(12-18)
__resms64x_(12-18)
__savms64f_(12-17)
__resms64f_(12-17)
__resms64fx_(12-17)
Save stubs use RAX as a base register and restore stubs use RSI, the
latter of which is overwritten before returning. Restore-as-tail-call for
the non-HFP case uses R10 to restore the stack pointer before returning.
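As a rough illustration of how the entry-point number in a stub name relates to the clobbered registers, here is a sketch for the non-HFP stubs only. It assumes the numeric suffix is the count of stub-managed registers: XMM6-15, RSI and RDI (12) plus however many of RBX, RBP, R12, R13, R14, R15 are clobbered contiguously in that order. The function names are hypothetical and this is not the code in i386.c; the HFP variants are deliberately not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* XMM6-15 (10 registers) plus RSI and RDI are always stub-managed. */
enum { BASE_REGS = 12, OPTIONAL_REGS = 6 };

/* clobbered[] is indexed in stack-layout order:
 * RBX, RBP, R12, R13, R14, R15.  A register is stub-managed only if
 * it and every earlier register in this order are clobbered; anything
 * after the first gap is assumed to be saved the normal way. */
int stub_reg_count(const bool clobbered[OPTIONAL_REGS])
{
    int n = BASE_REGS;
    for (int i = 0; i < OPTIONAL_REGS && clobbered[i]; i++)
        n++;
    return n;
}

/* Formats a name such as "__savms64_15" from a prefix such as
 * "__savms64".  Returns snprintf's result. */
int stub_name(char *buf, size_t len, const char *prefix,
              const bool clobbered[OPTIONAL_REGS])
{
    return snprintf(buf, len, "%s_%d", prefix, stub_reg_count(clobbered));
}
```

With RBX, RBP and R12 clobbered this yields 15. If only RBX and R12 are clobbered, the count stops at 13 because RBP breaks the sequence, and under the assumption above R12 would then be saved and restored outside the stub.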
Samples
=======
Standard case with RBX, RBP and R12 also being used in function:
Prologue:
lea -0x78(%rsp),%rax
sub $0x108,%rsp
callq 5874b <__savms64_15>
Epilogue (r10 stores the value to restore the stack pointer to):
lea 0x90(%rsp),%rsi
lea 0x78(%rsi),%r10
jmpq 587eb <__resms64x_15>
Stack pointer realignment case (same clobbers):
Prologue, stack realignment case:
push %rbp
mov %rsp,%rbp
and $0xfffffffffffffff0,%rsp
lea -0x70(%rsp),%rax
sub $0x100,%rsp
callq 57fc7 <__savms64f_15>
Epilogue, stack realignment case:
lea 0x90(%rsp),%rsi
jmpq 58013 <__resms64fx_15>
Testing
=======
A comprehensive test program has been submitted separately, and no
additional tests fail. I have also run Wine's tests with no additional
failures (although a few very minor tweaks have gone in since I last ran
Wine's tests). I have not run -flto tests on Wine as I haven't yet found
a way to get Wine to build with -flto; maybe I'm just doing it wrong.
gcc/config/i386/i386.c | 691 ++++++++++++++++++++++++++++++++++++++---
gcc/config/i386/i386.h | 22 +-
gcc/config/i386/i386.opt | 5 +
gcc/config/i386/predicates.md | 155 +++++++++
gcc/config/i386/sse.md | 37 +++
gcc/doc/invoke.texi | 11 +-
libgcc/config.host | 2 +-
libgcc/config/i386/i386-asm.h | 82 +++++
libgcc/config/i386/resms64.S | 57 ++++
libgcc/config/i386/resms64f.S | 55 ++++
libgcc/config/i386/resms64fx.S | 57 ++++
libgcc/config/i386/resms64x.S | 59 ++++
libgcc/config/i386/savms64.S | 57 ++++
libgcc/config/i386/savms64f.S | 55 ++++
libgcc/config/i386/t-msabi | 7 +
15 files changed, 1314 insertions(+), 38 deletions(-)
Daniel Santos