On Thu, Apr 16, 2026 at 12:58 PM Richard Biener <[email protected]> wrote:
>
> On Thu, 16 Apr 2026, Uros Bizjak wrote:
>
> > Hello!
> >
> > After pass_reorder_blocks, there remain some propagating opportunities
> > for late_combine. Looking at gcc.target/i386/pr90178.c, we get a
> > trivial sequence of:
> >
> > gcc -O2 -mavx -mvzeroupper -m32:
> >
> > .L5:
> > xorl %ecx, %ecx
> > ...
> > movl %ecx, %eax
> > ret
> >
> > Putting another instance of pass_late_combine after
> > pass_reorder_blocks improves the assembly in a non-trivial way:
> >
> > @@ -28,10 +28,8 @@
> > cmpl %edx, %ebx
> > je .L5
> > .L4:
> > - movl %eax, %ecx
> > cmpl %esi, (%eax)
> > jne .L11
> > - movl %ecx, %eax
> > popl %ebx
> > .cfi_remember_state
> > .cfi_restore 3
> > @@ -44,17 +42,16 @@
> > .p2align 3
> > .L5:
> > .cfi_restore_state
> > - xorl %ecx, %ecx
> > + xorl %eax, %eax
> > popl %ebx
> > .cfi_restore 3
> > .cfi_def_cfa_offset 8
> > popl %esi
> > .cfi_restore 6
> > .cfi_def_cfa_offset 4
> > - movl %ecx, %eax
> > ret
> > .cfi_endproc
> > .LFE0:
> > .size find_ptr, .-find_ptr
> >
> > which looks like it is worth putting a new pass here.
> >
> > A comparison of sizes of default x86_64 linux build shows noticeable
> > code size improvement:
> >
> > $ size vmlinux-old.o vmlinux-new.o
> > text data bss dec hex filename
> > 29432351 4932443 754228 35119022 217dfae vmlinux-old.o
> > 29415516 4932443 754228 35102187 2179deb vmlinux-new.o
> >
> > which shows a code size reduction of 16835 bytes.
> >
> > Any thoughts?
>
> Did you check other places to schedule the pass?

I was interested in exercising the opportunities exposed by the bbro
pass (as mentioned in [1]), so the natural place for the new pass is
right after bbro.

On x86_32, IRA zeroes %ecx, which is later copied to %eax in the
terminal basic block:

   12: NOTE_INSN_BASIC_BLOCK 3
    7: cx:SI=0
      REG_EQUAL 0
   45: pc=L36
...
   36: L36:
   39: NOTE_INSN_BASIC_BLOCK 7
   37: ax:SI=cx:SI
   38: use ax:SI

The bbro pass reorders this sequence to:

   28: L28:
   12: NOTE_INSN_BASIC_BLOCK 7
   69: {cx:SI=0;clobber flags:CC;}
      REG_UNUSED flags:CC
   71: ax:SI=cx:SI
      REG_DEAD cx:SI
   72: use ax:SI
   73: NOTE_INSN_EPILOGUE_BEG
   74: bx:SI=[sp:SI++]
      REG_CFA_ADJUST_CFA sp:SI=sp:SI+0x4
      REG_CFA_RESTORE bx:SI
   75: si:SI=[sp:SI++]
      REG_CFA_ADJUST_CFA sp:SI=sp:SI+0x4
      REG_CFA_RESTORE si:SI
   76: simple_return
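
For readers without the testcase at hand, the RTL above would
correspond to a function of roughly the following shape. This is a
sketch only: the name find_ptr is taken from the .size directive in the
quoted diff, while the signature and body are assumptions and not a
copy of the actual testcase. Under that assumption, the "not found"
path returns a null pointer, which would be the zero that is loaded
into %ecx and then copied to %eax.

```c
#include <stddef.h>

/* Assumed shape of the testcase: a linear search that returns a
   pointer to the first matching element, or NULL if none matches.  */
int *
find_ptr (int *mem, int sz, int val)
{
  for (int i = 0; i < sz; i++)
    if (mem[i] == val)
      return &mem[i];   /* Found: return the address of the match.  */

  return NULL;          /* Not found: the zero return value.  */
}
```
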
> Did you try moving the existing postreload late_combine later?

Moving the pass results in the same code for the above trivial
testcase, but it regresses Linux code size considerably:

$ size *.o
    text    data    bss      dec     hex filename
29483880 4932443 754228 35170551 218a8f7 vmlinux-moved.o
29415516 4932443 754228 35102187 2179deb vmlinux-new.o
29432351 4932443 754228 35119022 217dfae vmlinux-old.o

The postreload late_combine pass apparently frees some registers, so
the follow-up passes can use them. That benefit is lost when the pass
is simply moved to the new location: the instructions are still
combined after the bbro pass, but the registers involved stay occupied
and are effectively "dead" for the passes that now run before it.
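
The text-size differences between the three builds follow directly
from the size output above. A minimal sketch of that arithmetic, with
the figures copied from the table (the helper name text_delta and the
enumerator names are hypothetical, chosen only for illustration):

```c
/* Text-segment sizes in bytes, copied from the `size` output.  */
enum
{
  TEXT_OLD   = 29432351,  /* vmlinux-old.o: unpatched compiler.  */
  TEXT_NEW   = 29415516,  /* vmlinux-new.o: extra pass after bbro.  */
  TEXT_MOVED = 29483880   /* vmlinux-moved.o: existing pass moved.  */
};

/* Hypothetical helper: signed change in text size going from BASE to
   OTHER.  A negative result means OTHER is smaller.  */
static long
text_delta (long base, long other)
{
  return other - base;
}
```

With these values, text_delta (TEXT_OLD, TEXT_NEW) is -16835,
text_delta (TEXT_OLD, TEXT_MOVED) is 51529, and
text_delta (TEXT_NEW, TEXT_MOVED) is 68364, so the moved variant is
larger than both the unpatched build and the build with the extra pass.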

[1] https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713048.html

Uros.