Compiling the function below with -Os -march=i686 -mtune=pentiumpro generates bigger code with GCC 4.2 than with GCC 4.0. The reason seems to be that 4.2 peels off the first loop iteration.

typedef unsigned Tabs [10];

void TabZonk(Tabs tabs)
{
  int i;
  for (i = 0; i < 10; ++i)
    tabs[i] = 0;
}

sdiff gcc-4.0.s gcc-4.2.s

TabZonk:                              TabZonk:
        pushl %ebp                            pushl %ebp
        movl $1, %eax               |         movl $2, %eax
        movl %esp, %ebp                       movl %esp, %ebp
        movl 8(%ebp), %edx                    movl 8(%ebp), %edx
                                    >         movl $0, (%edx)
                                    >         .p2align 4,,15
.L2:                                  .L2:
        movl $0, -4(%edx,%eax,4)    |         xorl %ecx, %ecx
                                    >         movl %ecx, -4(%edx,%eax,4)
        incl %eax                             incl %eax
        cmpl $11, %eax                        cmpl $11, %eax
        jne .L2                               jne .L2
        popl %ebp                             popl %ebp
        ret                                   ret

-- 
           Summary: code size increase with -Os
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dann at godzilla dot ics dot uci dot edu
  GCC host triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26251
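
For illustration, the peeling visible in the 4.2 column (the store to (%edx) ahead of .L2, and the counter starting at 2 instead of 1) corresponds roughly to the source-level rewrite sketched here. This is a hand-written sketch, not compiler output: TabZonkPeeled and SameResult are hypothetical names introduced only for this example, and the sketch assumes the peeled iteration is the first one, as the assembly suggests.

```c
typedef unsigned Tabs[10];

/* Original form from the report: ten iterations, one store each. */
void TabZonk(Tabs tabs)
{
    int i;
    for (i = 0; i < 10; ++i)
        tabs[i] = 0;
}

/* Hypothetical hand-peeled equivalent: the first store is hoisted
   out of the loop, leaving nine iterations. Same result, but one
   extra store instruction, which costs space under -Os. */
void TabZonkPeeled(Tabs tabs)
{
    int i;
    tabs[0] = 0;                 /* peeled first iteration */
    for (i = 1; i < 10; ++i)
        tabs[i] = 0;
}

/* Hypothetical helper: returns 1 if both variants zero all ten entries. */
int SameResult(void)
{
    Tabs a, b;
    int i;
    for (i = 0; i < 10; ++i) {   /* fill with non-zero values first */
        a[i] = 0xdeadbeefu;
        b[i] = 0xdeadbeefu;
    }
    TabZonk(a);
    TabZonkPeeled(b);
    for (i = 0; i < 10; ++i)
        if (a[i] != 0 || b[i] != 0)
            return 0;
    return 1;
}
```

Both functions leave the array in the same state; the only difference is code shape, which is why peeling is normally a speed optimization and is questionable when optimizing for size.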