The testcase:

--cut here--
#define N 256
int b[N];

void test()
{
  int i;

  for (i = 0; i < N; i++)
    b[i] = 0;
}
--cut here--

compiles with '-O2 -msse2 -ftree-vectorize' into:

test:
        movl    $16, %eax
        pxor    %xmm0, %xmm0
        movdqa  %xmm0, b
        .p2align 4,,7
.L2:
        pxor    %xmm0, %xmm0
        movdqa  %xmm0, b(%eax)
        addl    $16, %eax
        cmpl    $1024, %eax
        jne     .L2
        rep ; ret

Please note the second pxor, which is _not_ needed: %xmm0 is already zero
when the loop is entered and nothing in the loop modifies it. At the very
least it should be moved out of the loop, as it is loop invariant.

For a slightly different testcase, where 'b[i] = 1' (or anything != 0),
we get optimized code:

test:
        movl    $16, %eax
        movdqa  .LC0, %xmm0
        movdqa  %xmm0, b
        .p2align 4,,7
.L2:
        movdqa  %xmm0, b(%eax)
        addl    $16, %eax
        cmpl    $1024, %eax
        jne     .L2
        rep ; ret

It looks like (g)cse doesn't know what 'xor N,N' means.

--
           Summary: Register zeroing by xor N,N should be moved out of loop
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ubizjak at gmail dot com
  GCC host triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30970
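
A minimal sketch of the expected code shape, written by hand with GCC's generic vector extension: the zero vector is created once before the loop and is only stored inside it. The names test_hoisted and v4si, and the alignment attribute on b, are assumptions made for illustration and are not part of the testcase above.

```c
#define N 256

/* Assumption: b is 16-byte aligned so that whole-vector stores are valid. */
int b[N] __attribute__((aligned(16)));

/* Hypothetical helper type: four ints in one 16-byte vector.
   may_alias lets us store to the int array through this type. */
typedef int v4si __attribute__((vector_size(16), may_alias));

void test_hoisted(void)
{
    /* Loop-invariant zero vector, set up once outside the loop. */
    const v4si zero = {0, 0, 0, 0};
    v4si *p = (v4si *)b;
    int i;

    /* Each iteration only stores; nothing in the body recomputes zero. */
    for (i = 0; i < N / 4; i++)
        p[i] = zero;
}
```

Compiling this with the same flags and comparing the loop body against the listing above may help show whether the zeroing still ends up inside the loop.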