inefficient memset

ak at muc dot de Wed, 04 Jul 2007 17:22:01 -0700

gcc version 4.3.0 20070704 (experimental)

Test case derived from the Linux kernel:


#include <string.h>

struct x {
        long a,b,c,d,e,f;
        char array[32];
};

void f(struct x *p)
{
        p->a = 0;
        p->b = 0;
        p->c = 0;
        p->d = 0;
        p->e = 0;
        p->f = 0;
        memset(&p->array, 0, 32);
}

Compiled with -O2 or -Os, this gives:
        movq    $0, (%rdi)
        movq    $0, 8(%rdi)
        movl    $8, %ecx
        movq    $0, 16(%rdi)
        movq    $0, 24(%rdi)
        xorl    %eax, %eax
        movq    $0, 32(%rdi)
        movq    $0, 40(%rdi)
        addq    $48, %rdi
        rep
        stosl
        ret

This shows several problems:
- The zero in %eax should have been used for all the initializations,
replacing the immediate and giving shorter code [especially with -Os,
but it would have been a win even with -O2, e.g. on decode-limited CPUs].
In a more complex test case the xorl was even placed before all the field
initializations.

- The rep ; stosl should be rep ; stosq, because 8-byte alignment of the
array is guaranteed by the type.
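
For illustration only, here is a minimal sketch of the alignment argument and a possible source-level workaround. It is not part of the original report. The names f_single_memset and x_is_all_zero are hypothetical, and whether GCC generates better code for this variant has not been checked here. The static_assert spells out why the array is long-aligned: it directly follows six long members.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <string.h>   /* memset */

struct x {
        long a, b, c, d, e, f;
        char array[32];
};

/* The array starts right after six longs, so it inherits long alignment. */
static_assert(offsetof(struct x, array) % _Alignof(long) == 0,
              "array is expected to be long-aligned");

/* Hypothetical variant of f(): clear the whole object with one call
 * instead of six field stores plus a memset of the tail. */
void f_single_memset(struct x *p)
{
        memset(p, 0, sizeof(*p));
}

/* Hypothetical helper: returns 1 if every byte of *p is zero, else 0. */
int x_is_all_zero(const struct x *p)
{
        const unsigned char *bytes = (const unsigned char *)p;
        size_t i;

        for (i = 0; i < sizeof(*p); i++)
                if (bytes[i] != 0)
                        return 0;
        return 1;
}
```

This only sidesteps the pattern in the test case. The missed optimizations described above would still apply to code that mixes field stores with a partial memset.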


-- 
           Summary: missing CSE for constant in registers / inefficient
                    memset
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ak at muc dot de
  GCC host triplet: x86_64-linux
GCC target triplet: x86_64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32629

[Bug rtl-optimization/32629] New: missing CSE for constant in registers / inefficient memset
