Hello!
After pass_reorder_blocks, there remain some propagation opportunities
for late_combine. Looking at gcc.target/i386/pr90178.c, compiled with
gcc -O2 -mavx -mvzeroupper -m32, we get a trivial sequence of:
.L5:
        xorl    %ecx, %ecx
        ...
        movl    %ecx, %eax
        ret
Putting another instance of pass_late_combine after
pass_reorder_blocks improves the assembly in a non-trivial way:
@@ -28,10 +28,8 @@
         cmpl    %edx, %ebx
         je      .L5
 .L4:
-        movl    %eax, %ecx
         cmpl    %esi, (%eax)
         jne     .L11
-        movl    %ecx, %eax
         popl    %ebx
         .cfi_remember_state
         .cfi_restore 3
@@ -44,17 +42,16 @@
         .p2align 3
 .L5:
         .cfi_restore_state
-        xorl    %ecx, %ecx
+        xorl    %eax, %eax
         popl    %ebx
         .cfi_restore 3
         .cfi_def_cfa_offset 8
         popl    %esi
         .cfi_restore 6
         .cfi_def_cfa_offset 4
-        movl    %ecx, %eax
         ret
         .cfi_endproc
 .LFE0:
         .size   find_ptr, .-find_ptr
which suggests that adding a new pass instance here is worthwhile.
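For readers who want to reproduce the comparison without the testsuite
at hand: the testcase source is not quoted in this mail, so the function
below is only an assumed reconstruction based on the symbol name
find_ptr and the compare-and-return-pointer shape of the assembly. The
parameter names are hypothetical, and it is not a copy of pr90178.c.

```c
#include <stddef.h>

/* Hypothetical reconstruction, NOT the actual pr90178.c source:
   return a pointer to the first element of MEM equal to VAL,
   or NULL if none of the SZ elements matches.  */
int *
find_ptr (int *mem, int sz, int val)
{
  for (int i = 0; i < sz; i++)
    if (mem[i] == val)
      return &mem[i];
  return NULL;
}
```

Compiling a function like this with the options above, with and without
the extra pass instance, should allow a similar side-by-side comparison,
although the exact output may differ from the listing shown here.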
A size comparison of a default x86_64 Linux build shows a noticeable
code size improvement:
$ size vmlinux-old.o vmlinux-new.o
    text    data    bss      dec     hex filename
29432351 4932443 754228 35119022 217dfae vmlinux-old.o
29415516 4932443 754228 35102187 2179deb vmlinux-new.o
which shows a code size reduction of 16835 bytes (about 0.06% of text).
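The arithmetic behind that figure can be checked with a small helper.
This is only a sketch: the function names text_delta and
text_delta_percent are made up for illustration, and the inputs are the
text column values from the size output above.

```c
#include <assert.h>

/* Bytes saved between two text-section sizes reported by size(1).
   Hypothetical helper, not part of GCC or binutils.  */
long
text_delta (long old_text, long new_text)
{
  return old_text - new_text;
}

/* The same saving expressed as a percentage of the old text size.  */
double
text_delta_percent (long old_text, long new_text)
{
  assert (old_text > 0);
  return 100.0 * (double) (old_text - new_text) / (double) old_text;
}
```

With the numbers above, text_delta (29432351, 29415516) gives 16835,
and text_delta_percent gives a value slightly above 0.057.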
Any thoughts?
BR,
Uros.
diff --git a/gcc/passes.def b/gcc/passes.def
index cdddb87302f..a04414bbe31 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -543,6 +543,7 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_cprop_hardreg);
NEXT_PASS (pass_fast_rtl_dce);
NEXT_PASS (pass_reorder_blocks);
+ NEXT_PASS (pass_late_combine);
NEXT_PASS (pass_leaf_regs);
NEXT_PASS (pass_split_before_sched2);
NEXT_PASS (pass_sched2);