Uros or Jan,
Please take this as a ping; I never pinged after submitting v2 because I found a few more issues with it. :) Although I realize this would be a GCC 8 stage 1 item, I would like to get it finished up and tentatively approved asap. I have tried to summarize this patch set below as clearly and succinctly as possible. Thanks!

 * This patch set depends upon the "Use aligned SSE movs for re-aligned
   MS ABI pro/epilogues" patch set:
   https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01859.html
 * I have submitted a test program separately:
   https://gcc.gnu.org/ml/gcc/2017-02/msg00009.html


Summary
=======

When a 64-bit Microsoft function calls a System V function, ABI differences require RSI, RDI and XMM6-15 to be considered clobbered. Saving these registers inline can cost as much as 109 bytes, with a similar amount for restoring them. This patch set targets 64-bit Wine and aims to mitigate some of these costs by adding ms/sysv save and restore stubs to libgcc, which are called from pro/epilogues rather than emitting the code inline. Since we're already tinkering with stubs, they also manage the save/restore of all remaining registers when possible. Analysis of a 64-bit Wine build shows a reduction in .text of around 20%, which also translates into a 34MiB reduction in Wine's install size.
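For readers less familiar with the cross-ABI case, here is a minimal C sketch of the situation described above. It uses GCC's existing ms_abi and sysv_abi function attributes; the function names are hypothetical, and the macros expand to nothing on non-x86-64 targets so the file still compiles there. On x86-64 the caller's prologue must preserve RSI, RDI and XMM6-15, and with this patch set applied and -moutline-msabi-xlogues enabled, those saves would be candidates for the out-of-line stubs (subject to the conditions listed below).

```c
/* ms_abi/sysv_abi are only meaningful on x86-64; elsewhere the macros
   expand to nothing so this sketch still compiles. */
#if defined(__x86_64__)
#define MS_ABI   __attribute__((ms_abi))
#define SYSV_ABI __attribute__((sysv_abi))
#else
#define MS_ABI
#define SYSV_ABI
#endif

/* Hypothetical System V callee.  Seen from an MS ABI caller, it may
   clobber RSI, RDI and XMM6-15, which the MS ABI treats as callee-saved. */
SYSV_ABI __attribute__((noinline)) long sysv_add(long a, long b)
{
    return a + b;
}

/* Hypothetical MS ABI caller.  Because it calls a sysv_abi function, its
   prologue/epilogue has to save and restore RSI, RDI and XMM6-15. */
MS_ABI __attribute__((noinline)) long ms_caller(long x)
{
    return sysv_add(x, 1) * 2;
}
```

Calling ms_caller(20) returns 42 on any target; only the generated pro/epilogue of ms_caller differs between the inline and out-of-line approaches.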

As there will usually only be 3 stubs in memory at any time, I'm using the larger mov instructions instead of push/pop to facilitate better parallelization. The basic theory is that the combination of better parallelization and reduced I-cache misses will offset the extra instructions required for implementation, although I have not produced actual performance data yet.

For now, I have called this feature -moutline-msabi-xlogues, but Sandra Loosemore made this suggestion (https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02670.html):

> Just as a suggestion (I'm not an i386 maintainer), I'd recommend
> spelling the name of this option -mno-inline-msabi-xlogues instead of
> -moutline-msabi-xlogues, and making the default -minline-msabi-xlogues.

When enabled, the feature is activated when an ms_abi function calls a sysv_abi function and all of the following are true (evaluated in ix86_compute_frame_layout):

    TARGET_SSE
    && !ix86_function_ms_hook_prologue (current_function_decl)
    && !SEH
    && !crtl->calls_eh_return
    && !ix86_static_chain_on_stack
    && !ix86_using_red_zone ()
    && !flag_split_stack

Some of these, like __builtin_eh_return, might be easy to support, but I don't have tests for them.
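The activation test above can be modelled as a plain predicate. This is only a sketch: struct xlogue_inputs and xlogues_eligible are hypothetical names, and each field merely stands in for the GCC macro, function or flag named in its comment rather than reading real compiler state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of what ix86_compute_frame_layout consults.
   Field names are illustrative, not GCC internals. */
struct xlogue_inputs {
    bool option_enabled;        /* -moutline-msabi-xlogues given */
    bool ms_abi_calls_sysv;     /* ms_abi function calls a sysv_abi one */
    bool target_sse;            /* TARGET_SSE */
    bool ms_hook_prologue;      /* ix86_function_ms_hook_prologue (...) */
    bool seh;                   /* SEH in use */
    bool calls_eh_return;       /* crtl->calls_eh_return */
    bool static_chain_on_stack; /* ix86_static_chain_on_stack */
    bool using_red_zone;        /* ix86_using_red_zone () */
    bool split_stack;           /* flag_split_stack */
};

/* True when the out-of-line save/restore stubs may be used. */
static bool xlogues_eligible(const struct xlogue_inputs *in)
{
    assert(in);
    return in->option_enabled
        && in->ms_abi_calls_sysv
        && in->target_sse
        && !in->ms_hook_prologue
        && !in->seh
        && !in->calls_eh_return
        && !in->static_chain_on_stack
        && !in->using_red_zone
        && !in->split_stack;
}
```

Any single blocking condition, such as a function that calls __builtin_eh_return, is enough to fall back to the normal inline pro/epilogue.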


Stack Layout
============

When active, registers are saved on the stack differently. Note that when not active, stack layout is *unchanged*.

    [arguments]
                            <- ARG_POINTER
    saved pc

    saved frame pointer     if frame_pointer_needed
                            <- HARD_FRAME_POINTER
    [saved regs]            if not managed by stub (e.g., explicitly clobbered)
                            <- reg_save_offset
    [padding0]
                            <- stack_realign_offset
                            <- Start of out-of-line, stub-managed regs
    XMM6-15
    RSI
    RDI
    [RBX]                   if RBX is clobbered
    [RBP]                   if RBP and RBX are clobbered and HFP is not used
    [R12]                   if R12 and all previous regs are clobbered
    [R13]                   if R13 and all previous regs are clobbered
    [R14]                   if R14 and all previous regs are clobbered
    [R15]                   if R15 and all previous regs are clobbered
                            <- end of stub-saved/restored regs
    [padding1]
                            <- outlined_save_offset
                            <- sse_regs_save_offset
    [padding2]
                            <- FRAME_POINTER
    [va_arg registers]

    [frame]
    ... etc.
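As a rough size check on the layout above, the stub-managed block can be computed from the number of registers a stub handles (12 to 18). This sketch assumes 16-byte slots for XMM6-15 and 8-byte slots for each general-purpose register, and it ignores padding0/padding1; stub_area_bytes is a hypothetical helper, not code from the patch.

```c
#include <assert.h>

/* Bytes occupied by the stub-managed registers for a stub handling nregs
   registers.  Assumes 16-byte XMM slots and 8-byte GPR slots; padding is
   not included. */
static unsigned stub_area_bytes(unsigned nregs)
{
    const unsigned nxmm = 10;              /* XMM6 through XMM15 */
    assert(nregs >= 12 && nregs <= 18);    /* always XMM6-15, RSI, RDI */
    return nxmm * 16 + (nregs - nxmm) * 8;
}
```

For example, the 15-register case used in the samples (RBX, RBP and R12 clobbered) gives 160 + 5 * 8 = 200 bytes, and the minimum of 12 registers gives 176 bytes.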


Stubs
=====

There are two sets of stubs: one for use with hard frame pointers and one for use without. Each set has a save stub, a restore stub and a restore-as-tail-call stub that performs the function's return. Each stub has one entry point per supported number of saved registers. The non-tail-call restore is used when the function ends in a sibling call. If a general-purpose register is explicitly clobbered out of the order in which hard registers are usually assigned (e.g., __asm__ __volatile__ ("":::"r15")), then that register is saved and restored normally rather than by the stub.

Stub names:
__savms64_(12-18)
__resms64_(12-18)
__resms64x_(12-18)

__savms64f_(12-17)
__resms64f_(12-17)
__resms64fx_(12-17)

Save stubs use RAX as a base register and restore stubs use RSI, the latter of which is overwritten before returning. Restore-as-tail-call for the non-HFP case uses R10 to restore the stack pointer before returning.
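The stub naming scheme and the "all previous regs are clobbered" rule can be expressed as a small helper. It models only the ordering rules stated in the stack layout (XMM6-15, RSI and RDI always; then RBX, RBP, R12-R15 as an unbroken prefix, with RBP skipped when a hard frame pointer is used). The mask bits and function names are hypothetical; the real selection logic lives in the patch's i386.c changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit assignments for a mask of clobbered registers. */
enum { REG_RBX = 1 << 0, REG_RBP = 1 << 1, REG_R12 = 1 << 2,
       REG_R13 = 1 << 3, REG_R14 = 1 << 4, REG_R15 = 1 << 5 };

/* XMM6-15, RSI and RDI (12 registers) plus the longest clobbered prefix
   of RBX, RBP, R12..R15.  With a hard frame pointer, RBP is skipped, so
   the maximum is 17 instead of 18. */
static unsigned stub_reg_count(unsigned clobbered, bool hfp)
{
    static const unsigned order[] = { REG_RBX, REG_RBP, REG_R12,
                                      REG_R13, REG_R14, REG_R15 };
    unsigned n = 12;
    for (unsigned i = 0; i < sizeof order / sizeof order[0]; i++) {
        if (hfp && order[i] == REG_RBP)
            continue;               /* RBP is the hard frame pointer */
        if (!(clobbered & order[i]))
            break;                  /* first gap ends the prefix */
        n++;
    }
    return n;
}

/* Formats a stub name such as "__savms64_15" or "__resms64fx_17".
   kind is "sav" or "res"; only restore stubs have an "x" variant. */
static void stub_name(char *buf, size_t len, const char *kind,
                      bool hfp, bool tail, unsigned nregs)
{
    if (strcmp(kind, "sav") == 0)
        tail = false;
    assert(nregs >= 12 && nregs <= (hfp ? 17u : 18u));
    snprintf(buf, len, "__%sms64%s%s_%u", kind, hfp ? "f" : "",
             tail ? "x" : "", nregs);
}
```

With RBX, RBP and R12 clobbered and no hard frame pointer, stub_reg_count returns 15 and stub_name(buf, sizeof buf, "sav", false, false, 15) produces __savms64_15, matching the first sample. A mask containing only R15 yields 12, since R15 breaks the ordering and would be saved inline instead.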

Samples
=======

Standard case with RBX, RBP and R12 also being used in function:

  Prologue:
    lea    -0x78(%rsp),%rax
    sub    $0x108,%rsp
    callq  5874b <__savms64_15>

  Epilogue (R10 holds the value the stack pointer is restored to):
    lea    0x90(%rsp),%rsi
    lea    0x78(%rsi),%r10
    jmpq   587eb <__resms64x_15>

Stack pointer realignment case (same clobbers):

  Prologue, stack realignment case:
    push   %rbp
    mov    %rsp,%rbp
    and    $0xfffffffffffffff0,%rsp
    lea    -0x70(%rsp),%rax
    sub    $0x100,%rsp
    callq  57fc7 <__savms64f_15>

  Epilogue, stack realignment case:
    lea    0x90(%rsp),%rsi
    jmpq   58013 <__resms64fx_15>
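The address arithmetic in both samples can be checked with plain integers. This sketch, with hypothetical function names, replays each lea/sub/and step from the listings and assumes RSP is back at its post-prologue value when the epilogue runs. It shows that the restore stub's RSI lands on the same base the save stub received in RAX, and that in the non-realigned case R10 ends up holding the stack pointer from before the prologue's sub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First sample (no realignment).  sp0 is RSP when the prologue starts.
   Returns the value the epilogue leaves in R10. */
static uint64_t no_realign_epilogue_r10(uint64_t sp0)
{
    uint64_t rax = sp0 - 0x78;      /* lea  -0x78(%rsp),%rax */
    uint64_t rsp = sp0 - 0x108;     /* sub  $0x108,%rsp      */
    uint64_t rsi = rsp + 0x90;      /* lea  0x90(%rsp),%rsi  */
    assert(rsi == rax);             /* restore base == save base */
    return rsi + 0x78;              /* lea  0x78(%rsi),%r10  */
}

/* Second sample (realignment).  sp is RSP after "mov %rsp,%rbp".
   Returns true if the epilogue's RSI equals the prologue's RAX. */
static bool realign_bases_match(uint64_t sp)
{
    uint64_t aligned = sp & ~(uint64_t)0xf; /* and $0xff..f0,%rsp    */
    uint64_t rax = aligned - 0x70;          /* lea  -0x70(%rsp),%rax */
    uint64_t rsp = aligned - 0x100;         /* sub  $0x100,%rsp      */
    uint64_t rsi = rsp + 0x90;              /* lea  0x90(%rsp),%rsi  */
    return rsi == rax;
}
```

In both cases the shared base sits partway into the saved area rather than at its edge, which keeps the stubs' offsets relative to RAX/RSI small.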


Testing
=======

A comprehensive test program is submitted separately, and no additional tests fail. I have also run Wine's tests with no additional failures (although a few very minor tweaks have gone in since I last ran them). I have not run -flto tests on Wine as I haven't yet found a way to get Wine to build with -flto; maybe I'm just doing it wrong.

 gcc/config/i386/i386.c         | 691 ++++++++++++++++++++++++++++++++++++++---
 gcc/config/i386/i386.h         |  22 +-
 gcc/config/i386/i386.opt       |   5 +
 gcc/config/i386/predicates.md  | 155 +++++++++
 gcc/config/i386/sse.md         |  37 +++
 gcc/doc/invoke.texi            |  11 +-
 libgcc/config.host             |   2 +-
 libgcc/config/i386/i386-asm.h  |  82 +++++
 libgcc/config/i386/resms64.S   |  57 ++++
 libgcc/config/i386/resms64f.S  |  55 ++++
 libgcc/config/i386/resms64fx.S |  57 ++++
 libgcc/config/i386/resms64x.S  |  59 ++++
 libgcc/config/i386/savms64.S   |  57 ++++
 libgcc/config/i386/savms64f.S  |  55 ++++
 libgcc/config/i386/t-msabi     |   7 +
 15 files changed, 1314 insertions(+), 38 deletions(-)


Daniel Santos
