https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

peterz at infradead dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peterz at infradead dot org

--- Comment #5 from peterz at infradead dot org ---

(In reply to H.J. Lu from comment #1)
> (In reply to Andrew Cooper from comment #0)
> > 
> > Finally, one minor code generation improvement.  When GCC emits a direct
> > call/jmp to an ENDBR'd symbol, it can actually use sym+4 as an optimisation
> > to skip the ENDBR instruction (not needed for direct call/jmp's) and save on
> > decode bandwidth.
> 
> It is only safe for calling a static function whose address has been taken.

This could be done more widely using LTO, or pushed into the linker if you're
willing to change the ELF format and augment it with STT_FUNC_BTI and
R_*_BTI32. The new relocation would then mean +4 (or rather, 'skip the landing
pad') when aimed at an STT_FUNC_BTI symbol, and be identical to _PC32 otherwise.

The ELF change doesn't help with reducing ENDBR emissions for global symbols,
since we can never tell who will take the address, but it does let direct calls
avoid the landing pad.

It also allows for a 'pseudo' LTO pass to 'seal' an executable, stripping the
ENDBRs from all symbols that have no _PC32 rela against them. We can use
EXPORT_SYMBOL*() to deliberately generate one such rela, so that exported
functions are not sealed.

Anyway, I think there's much value in reducing the number of ENDBR instructions
as much as possible and we should investigate how the toolchains can best help
there.
