> -----Original Message-----
> From: Segher Boessenkool <seg...@kernel.crashing.org>
> Sent: Friday, July 4, 2025 9:21 PM
> To: Cui, Lili <lili....@intel.com>
> Cc: ubiz...@gmail.com; gcc-patches@gcc.gnu.org; Liu, Hongtao
> <hongtao....@intel.com>; richard.guent...@gmail.com; Michael Matz
> <m...@suse.de>
> Subject: Re: [PATCH V3] x86: Enable separate shrink wrapping
> 
> Hi!
> 
> On Fri, Jul 04, 2025 at 07:23:23AM +0000, Cui, Lili wrote:
> > > > Initially, I looked at other architectures and disabled the hard
> > > > frame pointer,
> > >
> > > Like aarch?  Yeah I always wondered why they don't do it.  I decided
> > > that that is because of their ABI and architecture stuff they can
> > > save and restore their frame reg (r29) with the same insn as they
> > > use for the link reg (r30).  Of course they could do code to do
> > > tradeoffs there, but apparently they did not see the use for that, or
> > > perhaps from experience knew what way this would fall in the end.
> > >
> >
> > Loongarch/rs6000/riscv/aarch64 all disable
> > HARD_FRAME_POINTER_REGNUM.
> 
> rs6000 does not *have* a hard frame pointer!
> 

Oh, I see.  The handling of HARD_FRAME_POINTER_REGNUM seems redundant for 
rs6000.

rs6000_get_separate_components (void)
{
...

  /* Don't mess with the hard frame pointer.  */
  if (frame_pointer_needed)
    bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM);
...

> Generic parts of GCC require a frame pointer to exist, so when people require
> -fno-omit-frame-pointer we dedicate some GPR to it.  Gotta do something,
> eh!  It costs about 2% performance (on average, not worst case!)
> 
> The other archs you mention copied their code from aarch.
> 
> > > > but after reconsidering, I realized your point makes sense. If the
> > > > hard frame pointer were enabled,  we would typically emit push
> > > > %rbp and mov %rsp, %rbp at the first of prologue,  there is no
> > > > room for separate shrink wrap, but if the function itself also use
> > > > rbp, there might be room for optimization,
> > >
> > > Yup, when using a frame pointer (hard or otherwise, and a very bad
> > > plan nowadays, a 1970's thing) you typically get the frame pointer
> > > established very first thing, anything that touches the frame needs it
> > > after all!
> > >
> > > But not all code accesses the frame, many early-out paths do not for
> > > example.
> > >
> >
> > Yes, currently we do shrink-wrap for the entire prologue (including the
> > HARD_FRAME_POINTER), it can solve some early return issues. But we can't
> > do separate-shrink-wrap for HARD_FRAME_POINTER, because
> > HARD_FRAME_POINTER needs to record rsp before rsp points to the bottom
> > of stack. We have to put it at the beginning of the prologue, and we have
> > no chance to shrink it individually.
> 
> I'm not sure what things you mean here.
> 
> In most ABIs (yours as well I think?) the frame of a function is pointed to by
> the stack pointer at function entry, in normal functions.  "Happy
> functions" :-)
> 

Yes, in a normal function it would be placed in the entry bb, but shrink-wrap
might move the whole prologue past an early return. x86 emits the frame
pointer setup together with the rest of the prologue, so in some early-return
situations shrink-wrap avoids executing the frame pointer setup at all.
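
To make the early-return case concrete, here is a rough sketch (the function
names are made up for illustration and are not from the patch or the
testsuite). The early-out path touches neither the frame nor any callee-saved
register, so it is the kind of path where one would hope the prologue work can
be sunk below the check. What the compiler actually emits depends on the
target, options and GCC version.

```c
#include <stddef.h>

/* Hypothetical helper, kept out of line so the calls below survive
   optimisation.  */
__attribute__ ((noinline)) static long
slow_path (const long *v, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += v[i];
  return sum;
}

long
sum_twice (const long *v, size_t n)
{
  /* Early-out path: no calls and no frame access, so ideally no
     prologue work is executed when this branch is taken.  */
  if (v == NULL || n == 0)
    return 0;

  /* v and n are live across the first call, so they will likely be
     held in callee-saved registers (or spilled), which is what makes
     saves/restores necessary on this path.  */
  long a = slow_path (v, n);
  long b = slow_path (v, n);
  return a + b;
}
```

Comparing the assembly with and without -fno-omit-frame-pointer should show
whether the frame pointer setup stays ahead of the early return or moves with
the rest of the prologue.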

> You can set a pseudo to the stack pointer at function entry and then (either
> or not) copy that to the frame pointer later, or let things be optimised away.
> 
> > I removed these two lines of code and conducted a comparison test, and
> > found that the binary was unchanged. Unfortunately, I didn't identify any
> > opportunities for optimization, I think it's better to keep them. Not sure
> > if there might be any corner case issues.
> 
> For most archs and ABIs it is very beneficial to use -fomit-frame-pointer, I
> thought that was true for x86 even?  There is a special reg for it, sure, but
> you can use that reg as a general reg as well, and that is way useful on an arch
> with so few registers :-)

Yes, -fomit-frame-pointer does help performance.  Here is a small test case:
https://godbolt.org/z/5Tc3jM7qc . Do you mean to optimize the use of %rbp there?

Thanks,
Lili.
> 
> Segher
