Hi,
currently we have a somewhat nonsensical setting for accumulate-outgoing-args.
It is disabled for Intel chips because recent chips have stack engines that make
push/pop instructions cheap; it is, however, still enabled for AMD chips and
Generic.

Originally accumulation was enabled since push/pop instructions were expensive
on PentiumPro-Pentium4 and K6-K8 CPUs that did not have useful stack engines.
This reason is now gone.

There are still pros and cons of arg accumulation.  I did quite extensive
testing on AMD chips and found it performance neutral.  On 32-bit code,
disabling accumulation saves about 4% of code size, but with the frame pointer
disabled it expands the unwind info quite a lot, so the resulting binary is
about 8% bigger.  (This is also the current default for -Os.)
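
For anyone who wants to look at the effect on a small case, a sketch along the
following lines could be used.  The file and function names are hypothetical;
this is not part of the patch or of the testing described above.  It only
provides calls whose arguments are passed on the stack in 32-bit code, so that
the two argument setup strategies produce visibly different assembly.

```c
/* Hypothetical test file (say, args.c); not part of the patch.  */

/* noinline keeps a real call, so the outgoing arguments must be set up.  */
__attribute__ ((noinline)) static int
sum6 (int a, int b, int c, int d, int e, int f)
{
  return a + b + c + d + e + f;
}

int
call_twice (int x)
{
  /* Two calls; on 32-bit x86 all six arguments go on the stack.  */
  return sum6 (x, 1, 2, 3, 4, 5) + sum6 (x, 6, 7, 8, 9, 10);
}
```

Assuming a compiler with 32-bit multilib support, compiling this once with
-m32 -O2 -fomit-frame-pointer -maccumulate-outgoing-args and once with
-mno-accumulate-outgoing-args, then comparing the output of -S and of size on
the object files, should show the argument setup in the text section and the
difference in unwind info.  The exact numbers on such a tiny file will not
match the percentages above.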

I think we generally prefer code segment size reduction over EH frame, so we
should flip the default (or disable it for cores if we decide otherwise).

This patch disables accumulation by default.  I intend to commit it once the
bootstrap PR on unwind info is resolved, if there is no significant opposition
to doing so.  I will also update the release notes to explain the code size
effect.

It would be great to get a heuristic enabling the frame pointer for functions
where doing so reduces code size without performance regressions.  I think that
is quite commonly the case.

Bootstrapped/regtested x86_64-linux

Honza

        * config/i386/x86-tune.def: Disable X86_TUNE_ACCUMULATE_OUTGOING_ARGS
        for generic and recent AMD chips.
Index: config/i386/x86-tune.def
===================================================================
--- config/i386/x86-tune.def    (revision 206233)
+++ config/i386/x86-tune.def    (working copy)
@@ -143,7 +143,7 @@ DEF_TUNE (X86_TUNE_REASSOC_FP_TO_PARALLE
    regression on mgrid due to IRA limitation leading to unecessary
    use of the frame pointer in 32bit mode.  */
 DEF_TUNE (X86_TUNE_ACCUMULATE_OUTGOING_ARGS, "accumulate_outgoing_args",
-         m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_AMD_MULTIPLE | m_GENERIC)
+         m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_ATHLON_K8)
 
 /* X86_TUNE_PROLOGUE_USING_MOVE: Do not use push/pop in prologues that are
    considered on critical path.  */
