On Tue, Jun 17, 2025 at 10:33 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Tue, Jun 17, 2025 at 4:03 PM Cui, Lili <lili....@intel.com> wrote:
> >
> > From: Lili Cui <lili....@intel.com>
> >
> > Hi Uros,
> >
> > This is patch v3, the main changes are as follows.
> >
> > 1. Added a pro_epilogue_adjust_stack_add_nocc in i386.md to add memory 
> > clobber for lea/mov.
> > 2. Adjusted some formatting issues.
> > 3. Added scan-rtl-dumps for ia32 in shrink_wrap_separate.C.
> >
> > Collected spec2017 performance on ZNVER5, EMR and ICELAKE. No performance 
> > regression was observed.
> > For O2 multi-copy :
> > 511.povray_r improved by 2.8% on ZNVER5.
> > 511.povray_r improved by 4.2% on EMR
> >
> > Bootstrapped & regtested on x86-64-pc-linux-gnu.
> > Use this patch to build the latest Linux kernel and boot successfully.
> >
> > Thanks,
> > Lili.
> >
> >
> > This commit implements the target macros (TARGET_SHRINK_WRAP_*) that
> > enable separate shrink wrapping for function prologues/epilogues in
> > x86.
> >
> > When performing separate shrink wrapping, we choose to use mov instead
> > of push/pop, because using push/pop is more complicated to handle rsp
> > adjustment and may lose performance, so here we choose to use mov, which
> > has a small impact on code size, but guarantees performance.
> >
> > Using mov means we need to use sub/add to maintain the stack frame. In
> > some special cases, we need to use lea to prevent affecting EFlags.
> >
> > Avoid inserting sub between test-je-jle to change EFlags, lea should be
> > used here.
> >
> >     foo:
> >         xorl    %eax, %eax
> >         testl   %edi, %edi
> >         je      .L11
> >         sub     $16, %rsp  ------> leaq    -16(%rsp), %rsp
> >         movq    %r13, 8(%rsp)
> >         movl    $1, %r13d
> >         jle     .L4
> >
> > Tested against SPEC CPU 2017, this change always has a net-positive
> > effect on the dynamic instruction count.  See the following table for
> > the breakdown on how this reduces the number of dynamic instructions
> > per workload on a like-for-like (with/without this commit):
> >
> > instruction count       base            with commit (commit-base)/commit
> > 502.gcc_r               98666845943     96891561634     -1.80%
> > 526.blender_r           6.21226E+11     6.12992E+11     -1.33%
> > 520.omnetpp_r           1.1241E+11      1.11093E+11     -1.17%
> > 500.perlbench_r         1271558717      1263268350      -0.65%
> > 523.xalancbmk_r         2.20103E+11     2.18836E+11     -0.58%
> > 531.deepsjeng_r         2.73591E+11     2.72114E+11     -0.54%
> > 500.perlbench_r         64195557393     63881512409     -0.49%
> > 541.leela_r             2.99097E+11     2.98245E+11     -0.29%
> > 548.exchange2_r         1.27976E+11     1.27784E+11     -0.15%
> > 527.cam4_r              88981458425     88887334679     -0.11%
> > 554.roms_r              2.60072E+11     2.59809E+11     -0.10%
> >
> > Collected spec2017 performance on ZNVER5, EMR and ICELAKE. No performance 
> > regression was observed.
> >
> > For O2 multi-copy :
> > 511.povray_r improved by 2.8% on ZNVER5.
> > 511.povray_r improved by 4% on EMR
> > 511.povray_r improved by 3.3 % ~ 4.6% on ICELAKE.
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386-protos.h (ix86_get_separate_components):
> >         New function.
> >         (ix86_components_for_bb): Likewise.
> >         (ix86_disqualify_components): Likewise.
> >         (ix86_emit_prologue_components): Likewise.
> >         (ix86_emit_epilogue_components): Likewise.
> >         (ix86_set_handled_components): Likewise.
> >         * config/i386/i386.cc (save_regs_using_push_pop):
> >         Split from ix86_compute_frame_layout.
> >         (ix86_compute_frame_layout):
> >         Use save_regs_using_push_pop.
> >         (pro_epilogue_adjust_stack):
> >         Use gen_pro_epilogue_adjust_stack_add_nocc.
> >         (ix86_expand_prologue): Add some assertions and adjust
> >         the stack frame at the beginning of the prolog for shrink
> >         wrapping separate.
> >         (ix86_emit_save_regs_using_mov):
> >         Skip registers that are wrapped separately.
> >         (ix86_emit_restore_regs_using_mov): Likewise.
> >         (ix86_expand_epilogue): Add some assertions and set
> >         restore_regs_via_mov to true for shrink wrapping separate.
> >         (ix86_get_separate_components): New function.
> >         (ix86_components_for_bb): Likewise.
> >         (ix86_disqualify_components): Likewise.
> >         (ix86_emit_prologue_components): Likewise.
> >         (ix86_emit_epilogue_components): Likewise.
> >         (ix86_set_handled_components): Likewise.
> >         (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define.
> >         (TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise.
> >         (TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise.
> >         (TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise.
> >         (TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise.
> >         (TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise.
> >         * config/i386/i386.h (struct machine_function):Add
> >         reg_is_wrapped_separately array for register wrapping
> >         information.
> >         * config/i386/i386.md
> >         (@pro_epilogue_adjust_stack_add_nocc<mode>): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/x86_64/abi/callabi/leaf-2.c: Adjust the test.
> >         * gcc.target/i386/interrupt-16.c: Likewise.
> >         * gfortran.dg/guality/arg1.f90: Likewise.
> >         * gcc.target/i386/avx10_2-comibf-1.c: Likewise.
> >         * g++.target/i386/shrink_wrap_separate.C: New test.
> >         * gcc.target/i386/shrink_wrap_separate_check_lea.c: Likewise.
> >
> > Co-authored-by: Michael Matz <m...@suse.de>
>
> OK.
>

I checked in this patch to support x32.
From a0ee0607a6fc5b065c6472c978cc2bc90e7f07c3 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 30 Jun 2025 04:11:13 +0800
Subject: [PATCH] shrink_wrap_separate_check_lea.c: Scan lea(l|q)

Scan "lea(l|q)", instead of "leaq", to support x32.

	* gcc.target/i386/shrink_wrap_separate_check_lea.c: Scan
	"lea(l|q)", instead of "leaq".

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
---
 gcc/testsuite/gcc.target/i386/shrink_wrap_separate_check_lea.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/shrink_wrap_separate_check_lea.c b/gcc/testsuite/gcc.target/i386/shrink_wrap_separate_check_lea.c
index 0f2449f68b6..d61de57092c 100644
--- a/gcc/testsuite/gcc.target/i386/shrink_wrap_separate_check_lea.c
+++ b/gcc/testsuite/gcc.target/i386/shrink_wrap_separate_check_lea.c
@@ -26,4 +26,4 @@ int foo (int num)
   return 1;
 }
 /* { dg-final { scan-rtl-dump "The components we wrap separately are \\\[sep 40\\\]" "pro_and_epilogue" } } */
-/* { dg-final { scan-assembler "leaq.*(%rsp)" } } */
+/* { dg-final { scan-assembler "lea(l|q).*(%rsp)" } } */
-- 
2.50.0

Reply via email to