[Bug middle-end/110724] Unnecessary alignment on branch to unconditional branch targets

javier.martinez.bugzilla at gmail dot com via Gcc-bugs Tue, 18 Jul 2023 12:07:45 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724


--- Comment #3 from Javier Martinez <javier.martinez.bugzilla at gmail dot com> ---
The generic tuning of 16:11:8 looks reasonable to me, I do not argue against
it.

  From Agner Fog’s "Optimizing subroutines in assembly language":
> Most microprocessors fetch code in aligned 16-byte or 32-byte blocks.
> If an important subroutine entry or jump label happens to be near the
> end of a 16-byte block then the microprocessor will only get a few 
> useful bytes of code when fetching that block of code. It may have
> to fetch the next 16 bytes too before it can decode the first instructions
> after the label. This can be avoided by aligning important subroutine
> entries and loop entries by 16. Aligning by 8 will assure that at least 8
> bytes of code can be loaded with the first instruction fetch, which may
> be sufficient if the instructions are small.
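To make the arithmetic concrete, here is a minimal sketch of my own (not code from GCC or from the book). The function names are hypothetical. The padding policy is modelled on the GCC documentation's description of n:m:n2 alignment values, with m2 taken to default to n2, and the 16-byte fetch block is the assumption from the quote above.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of code the front end gets from the fetch block containing `addr`,
// assuming aligned fetch blocks of `block` bytes (16 here, per the quote).
constexpr std::uint64_t useful_bytes(std::uint64_t addr, std::uint64_t block = 16) {
    return block - (addr % block);
}

// Padding needed to move `addr` up to the next multiple of `n`.
constexpr std::uint64_t padding_to(std::uint64_t addr, std::uint64_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);  // n must be a power of two
    return (n - addr % n) % n;
}

// Padding inserted under an n:m:n2 policy such as 16:11:8: align to n if
// that skips at most m-1 bytes, otherwise fall back to aligning to n2.
// (Assumption: m2 defaults to n2, so the fallback always applies.)
constexpr std::uint64_t padding_for(std::uint64_t addr,
                                    std::uint64_t n = 16,
                                    std::uint64_t m = 11,
                                    std::uint64_t n2 = 8) {
    const std::uint64_t primary = padding_to(addr, n);
    if (primary <= m - 1) {
        return primary;
    }
    return padding_to(addr, n2);
}

// A label 2 bytes before the end of a 16-byte block gets only 2 useful bytes,
// and 2 bytes of padding move it to the next block.
static_assert(useful_bytes(0x100E) == 2, "2 bytes left in the block");
static_assert(padding_for(0x100E) == 2, "cheap to align to 16");
// Aligning 0x1003 to 16 would cost 13 bytes (> 10), so only align to 8.
static_assert(padding_for(0x1003) == 5, "falls back to 8-byte alignment");
```

Under these assumptions a label never costs more than 10 bytes of padding, and it always ends up with at least 8 bytes of code in its first fetch block.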

  This looks like the reason behind the alignment. That section of the book
goes on to explain the cost of aligning labels that are also reachable by means
other than branching (the padding nops get executed), which I presume led to
the :m and :m2 tuning values, the distinction between -falign-labels and
-falign-jumps, and the removal of padding when my label is reachable by
fall-through with [[unlikely]].

  All this is fine. 

My thesis is that this alignment strategy is always unnecessary in one specific
circumstance - when the branch target is itself a 1-byte unconditional branch,
as in:

  .L1: 
      ret 

  A 1-byte ret can never cross a block boundary, and the instructions
following the ret must not execute, so there is no front-end stall for the
alignment to avoid.
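
For illustration, a function of roughly this shape is the kind of source that could give rise to such a label. This is a hypothetical sketch, not the test case attached to the bug: the function name is made up, and whether a given GCC version and flag set actually emits a padded, lone ret for it is an assumption to be checked by inspecting the assembly.

```cpp
// Hypothetical example: the early-exit path does nothing but return, so the
// compiler may emit it as a separate block that consists of a single `ret`.
[[gnu::noinline]] void store_doubled(int *dst, int value) {
    if (value <= 0) [[unlikely]]
        return;            // candidate for a ".L1: ret" branch target
    *dst = value * 2;      // hot path
}
```

Compiling with something like `g++ -std=c++20 -O2 -S` and looking for a `.p2align` directive ahead of a label that contains only `ret` would show whether the padding is present.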
