Hello!

After pass_reorder_blocks, some propagation opportunities remain for
late_combine.  Compiling gcc.target/i386/pr90178.c, we get a trivial
sequence of:

gcc -O2 -mavx -mvzeroupper -m32:

.L5:
    xorl    %ecx, %ecx
    ...
    movl    %ecx, %eax
    ret

Adding another instance of pass_late_combine after
pass_reorder_blocks improves the assembly in a non-trivial way:

@@ -28,10 +28,8 @@
     cmpl    %edx, %ebx
     je    .L5
 .L4:
-    movl    %eax, %ecx
     cmpl    %esi, (%eax)
     jne    .L11
-    movl    %ecx, %eax
     popl    %ebx
     .cfi_remember_state
     .cfi_restore 3
@@ -44,17 +42,16 @@
     .p2align 3
 .L5:
     .cfi_restore_state
-    xorl    %ecx, %ecx
+    xorl    %eax, %eax
     popl    %ebx
     .cfi_restore 3
     .cfi_def_cfa_offset 8
     popl    %esi
     .cfi_restore 6
     .cfi_def_cfa_offset 4
-    movl    %ecx, %eax
     ret
     .cfi_endproc
 .LFE0:
     .size    find_ptr, .-find_ptr

which suggests that adding another pass instance here is worthwhile.

A size comparison of a default x86_64 Linux kernel build shows a
noticeable code size improvement:

$ size vmlinux-old.o vmlinux-new.o
  text    data     bss     dec     hex filename
29432351        4932443  754228 35119022        217dfae vmlinux-old.o
29415516        4932443  754228 35102187        2179deb vmlinux-new.o

which shows a text size reduction of 16835 bytes (about 0.06%), with
data and bss unchanged.

Any thoughts?

BR,
Uros.
diff --git a/gcc/passes.def b/gcc/passes.def
index cdddb87302f..a04414bbe31 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -543,6 +543,7 @@ along with GCC; see the file COPYING3.  If not see
          NEXT_PASS (pass_cprop_hardreg);
          NEXT_PASS (pass_fast_rtl_dce);
          NEXT_PASS (pass_reorder_blocks);
+         NEXT_PASS (pass_late_combine);
          NEXT_PASS (pass_leaf_regs);
          NEXT_PASS (pass_split_before_sched2);
          NEXT_PASS (pass_sched2);
