On Thu, Jun 19, 2025 at 9:37 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Wed, Jun 18, 2025 at 4:12 PM Cui, Lili <lili....@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Uros Bizjak <ubiz...@gmail.com>
> > > Sent: Wednesday, June 18, 2025 9:22 PM
> > > To: Cui, Lili <lili....@intel.com>
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao <hongtao....@intel.com>;
> > > hongjiu...@intel.com
> > > Subject: Re: [PATCH] x86: Fix shrink wrap separate ICE under 
> > > -fstack-clash-
> > > protection [PR120697]
> > >
> > > On Wed, Jun 18, 2025 at 3:11 PM Cui, Lili <lili....@intel.com> wrote:
> > > >
> > > > From: Lili Cui <lili....@intel.com>
> > > >
> > > > Hi Uros,
> > > >
> > > > An assertion I added in shrink wrap separate V2 reports ICE when 
> > > > -fstack-
> > > clash-protection is enabled. The assertion should not be added here.
> > > >
> > > > I created a patch to remove 3 assertions and their associated code.
> > > >
> > > > 1. Reproduced PR120697 issue and solved the issue with this patch, also
> > > added a new test case for -fstack-clash-protection.
> > > > 2. Recollected performance data with -fstack-clash-protection, there is 
> > > > no
> > > ICE and regressions.
> > > > 3. Use this patch to build the latest Linux kernel and boot 
> > > > successfully.
> > > > 4. Bootstrapped & regtested on x86-64-pc-linux-gnu.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Lili.
> > > >
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         PR target/120697
> > > >         * config/i386/i386.cc (ix86_expand_prologue):
> > > >         Remove 3 assertions and associated code.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         PR target/120697
> > > >         * gcc.target/i386/stack-clash-protection.c: New test.
> > >
> > > LGTM, but only in the sense that you are reverting parts of the original 
> > > patch.
> > >
> >
> > Sure, this fix simply removes some code in shrink-wrapped separate patch.  
> > Once Sam has verified the patch in his environment, I will commit the patch.
>
> There is a small inconsistency in your patch that escaped my review:
> you should set the type of the new instruction, something like:
>
> -  [(set (attr "length_immediate")
> +  [(set (attr "type")
> +    (cond [(match_operand:<MODE> 2 "const0_operand")
> +         (const_string "imov")
> +          ]
> +          (const_string "lea")))
> +   (set (attr "length_immediate")

I am testing the attached patch that fixes the above omission and goes
a couple of steps further. The patched compiler always emits
clobber-less stack adjustment, that is later conditionally peephole2'd
to a clobbered version (if flags reg is dead at the location).

Please note that according to several comments, sprinkled in i386.md,
ADD is faster than LEA for most processors. OTOH, TARGET_OPT_AGU
prefers LEA, and this is the reason existing
pro_epilogue_adjust_stack_add_<mode> avoids ALU insn type. The
attached patch does not change this tuning, even when clobber-less
insn is converted to a clobbered one.

There are also several peephole2 patterns available that convert
prologue esp subtractions to pushes (at the end of i386.md). These
process only patterns with flags reg clobber, so they are ineffective
with clobber-less stack adjustments. Luckily, peephole2s are
"sequential", they also process just peephole2'd patterns, so the
attached patch also restores previous functionality.

Uros.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 77853297a2f..191bed4f3ab 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -7764,12 +7764,8 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
 
   /*  Shrink wrap separate may insert prologue between TEST and JMP.  In order
       not to affect EFlags, emit add without reg clobbering.  */
-  if (crtl->shrink_wrapped_separate)
-    insn = emit_insn (gen_pro_epilogue_adjust_stack_add_nocc
-                     (Pmode, dest, src, addend));
-  else
-    insn = emit_insn (gen_pro_epilogue_adjust_stack_add
-                     (Pmode, dest, src, addend));
+  insn = emit_insn (gen_pro_epilogue_adjust_stack_add_nocc
+                   (Pmode, dest, src, addend));
 
   if (style >= 0)
     ix86_add_queued_cfa_restore_notes (insn);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6bd557431f5..4bdd07b2e2f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -27409,7 +27409,7 @@ (define_peephole2
 ;;
 ;; in proper program order.
 
-(define_insn "@pro_epilogue_adjust_stack_add_<mode>"
+(define_insn "*pro_epilogue_adjust_stack_add_<mode>"
   [(set (match_operand:P 0 "register_operand" "=r,r")
        (plus:P (match_operand:P 1 "register_operand" "0,r")
                (match_operand:P 2 "<nonmemory_operand>" "r<i>,l<i>")))
@@ -27467,13 +27467,31 @@ (define_insn 
"@pro_epilogue_adjust_stack_add_nocc<mode>"
       return "lea{<imodesuffix>}\t{%E2, %0|%0, %E2}";
     }
 }
-  [(set (attr "length_immediate")
+  [(set (attr "type")
+       (cond [(match_operand:<MODE> 2 "const0_operand")
+                (const_string "imov")
+             ]
+             (const_string "lea")))
+   (set (attr "length_immediate")
        (cond [(eq_attr "type" "imov")
                 (const_string "0")
              ]
              (const_string "*")))
    (set_attr "mode" "<MODE>")])
 
+(define_peephole2
+  [(parallel
+     [(set (match_operand:P 0 "register_operand")
+          (plus:P (match_dup 0)
+                  (match_operand:P 1 "<nonmemory_operand>")))
+      (clobber (mem:BLK (scratch)))])]
+  "peep2_regno_dead_p (0, FLAGS_REG)"
+  [(parallel
+     [(set (match_dup 0)
+          (plus:P (match_dup 0) (match_dup 1)))
+      (clobber (reg:CC FLAGS_REG))
+      (clobber (mem:BLK (scratch)))])])
+
 (define_insn "@pro_epilogue_adjust_stack_sub_<mode>"
   [(set (match_operand:P 0 "register_operand" "=r")
        (minus:P (match_operand:P 1 "register_operand" "0")

Reply via email to