gcc version 4.3.0 20070704 (experimental)

Test case derived from the Linux kernel:

#include <string.h>

struct x {
	long a,b,c,d,e,f;
	char array[32];
};

void f(struct x *p)
{
	p->a = 0;
	p->b = 0;
	p->c = 0;
	p->d = 0;
	p->e = 0;
	p->f = 0;
	memset(&p->array, 0, 32);
}

compiled with -O2 or -Os gives

	movq	$0, (%rdi)
	movq	$0, 8(%rdi)
	movl	$8, %ecx
	movq	$0, 16(%rdi)
	movq	$0, 24(%rdi)
	xorl	%eax, %eax
	movq	$0, 32(%rdi)
	movq	$0, 40(%rdi)
	addq	$48, %rdi
	rep
	stosl
	ret

This shows several problems:
- The zero in eax should have been used for all the initializations,
  replacing the immediate and giving shorter code [especially with -Os,
  but it would have been a win even with -O2, e.g. on decode-limited CPUs].
  In a more complex test case the xorl even came before all the field
  initializations.
- The rep ; stosl should be rep ; stosq, because 8-byte alignment is
  guaranteed by the type.


-- 
           Summary: missing CSE for constant in registers / inefficient memset
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ak at muc dot de
  GCC host triplet: x86_64-linux
GCC target triplet: x86_64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32629
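
The alignment argument behind the stosl/stosq point can be checked with a few lines of C. This is only a sketch: the helper names (array_offset, array_is_long_aligned, stores_needed) are made up for illustration and are not part of the report, _Alignof needs a C11 compiler, and the 48-byte offset assumes a target where long is 8 bytes, such as x86_64-linux.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Same layout as the struct in the test case. */
struct x {
    long a, b, c, d, e, f;
    char array[32];
};

/* Hypothetical helper: byte offset of `array` inside struct x.
 * This is 6 * sizeof(long), i.e. 48 when long is 8 bytes. */
size_t array_offset(void)
{
    return offsetof(struct x, array);
}

/* Hypothetical helper: 1 if `array` starts on a long-aligned boundary
 * whenever the enclosing struct is itself correctly aligned. */
int array_is_long_aligned(void)
{
    return offsetof(struct x, array) % _Alignof(long) == 0;
}

/* Hypothetical helper: number of stores of `width` bytes needed to
 * clear the 32-byte array. */
size_t stores_needed(size_t width)
{
    return sizeof(((struct x *)0)->array) / width;
}
```

With 4-byte stores the count is 8, which matches the movl $8, %ecx feeding rep stosl in the generated code. With 8-byte stores the count would be 4, which is what a rep stosq version would need.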