http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57890

            Bug ID: 57890
           Summary: gcc 4.8.1 regression: loops become slower
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dushistov at mail dot ru

$ cat what_test.cpp
char c[100];

void f(void)
{
        for(int i=0; i < 100; ++i)
                c[i] = '0';
}

I time it with the following driver:
$ cat test.cpp
#include <cstddef>

extern void f();

int main()
{
    for (std::size_t i = 0; i < 100000000; ++i)
        f();
}

Both files are compiled with "-march=native -O3" on an Intel Core i7 in 64-bit mode.

Results:
test_loop47 (built with gcc 4.7): average 0.348000
test_loop481 (built with gcc 4.8.1): average 0.400000
That is, the gcc 4.8.1 build is about 15% slower.
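The report does not say how the averages were obtained (for example, how many runs were averaged). As a hedged alternative to timing the whole program externally, here is a sketch that times the calls in-process with std::chrono::steady_clock. The helper names time_calls and average_time are hypothetical, and __attribute__((noinline)) is an assumption standing in for the separate translation unit that keeps f() from being inlined into test.cpp, so the numbers may not be directly comparable with the ones above.

```cpp
#include <chrono>
#include <cstddef>

// Same test function as what_test.cpp. In the original setup f() lives in a
// separate translation unit; here the GCC/Clang noinline attribute is used
// instead so the call is not inlined into the timing loop (an assumption).
char c[100];

__attribute__((noinline)) void f()
{
    for (int i = 0; i < 100; ++i)
        c[i] = '0';
}

// Hypothetical helper: calls fn() `calls` times and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
double time_calls(void (*fn)(), std::size_t calls)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < calls; ++i)
        fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Hypothetical helper: mean of `runs` timings (runs must be > 0).
double average_time(void (*fn)(), std::size_t calls, int runs)
{
    double total = 0.0;
    for (int r = 0; r < runs; ++r)
        total += time_calls(fn, calls);
    return total / runs;
}
```

For example, average_time(f, 100000000, 5) mirrors the 100000000 calls made in test.cpp, averaged over five runs (the run count is arbitrary).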

Comparing the generated code: with gcc 4.7, function "f" is compiled to:
push   %rbp
vmovdqa 0x107(%rip),%ymm0
movb   $0x30,0x200aa0(%rip)
movb   $0x30,0x200a9a(%rip)
mov    %rsp,%rbp
vmovdqa %ymm0,0x200a2e(%rip)
...

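With gcc 4.8.1, the array is instead filled with 8-byte general-purpose register stores (mov %rax) plus a 4-byte movl, rather than 32-byte AVX stores (vmovdqa %ymm0):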

movabs $0x3030303030303030,%rax
movl   $0x30303030,0x200a9c(%rip)
mov    %rax,0x200a35(%rip)
mov    %rax,0x200a36(%rip)
...
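To illustrate how store width affects the number of store instructions, here is a simplified model. It is an assumption of this sketch, not GCC's actual expansion algorithm: fill the buffer with stores of a fixed width, then cover the tail with the largest power-of-two store that still fits. fill_with_stores is a hypothetical name. Under this model, filling c[100] takes 13 stores at 8-byte width (12 stores of 8 bytes plus one 4-byte tail, the same shape as the mov/movl stores visible in the 4.8.1 listing) and 4 stores at 32-byte width. The 4.7 listing actually uses single-byte movb stores for part of the array, so the model does not reproduce its exact instruction sequence.

```cpp
#include <cstddef>
#include <cstring>

// Simplified model (not GCC's real memset/loop expansion): fill n bytes of
// dst with value v using stores of `width` bytes, then cover any tail with
// the largest power-of-two store that still fits.
// Returns the number of modeled store operations.
// Precondition: width is a power of two no larger than 64.
std::size_t fill_with_stores(unsigned char* dst, std::size_t n,
                             unsigned char v, std::size_t width)
{
    unsigned char pattern[64];
    std::memset(pattern, v, sizeof pattern);
    if (width < 1)
        width = 1;
    if (width > sizeof pattern)
        width = sizeof pattern;

    std::size_t stores = 0;
    std::size_t off = 0;
    while (off < n) {
        std::size_t w = width;
        while (w > n - off)   // halve the store until it fits the tail
            w /= 2;
        std::memcpy(dst + off, pattern, w);  // models one w-byte store
        off += w;
        ++stores;
    }
    return stores;
}
```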


PS

I also checked whether
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55953
is fixed in gcc 4.8.1, and it is indeed "fixed": where gcc previously produced
optimal code for loops and suboptimal code for __builtin_memset, it now
produces the same suboptimal code for both cases.
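For reference, these are the two formulations being compared: an explicit loop and a call to the GCC builtin __builtin_memset. The function and array names are hypothetical; each function writes its own array so the results can be checked independently.

```cpp
// Explicit loop, same shape as f() in what_test.cpp.
char c_loop[100];

void f_loop()
{
    for (int i = 0; i < 100; ++i)
        c_loop[i] = '0';
}

// Same fill expressed with the GCC builtin (also accepted by Clang).
char c_memset[100];

void f_memset()
{
    __builtin_memset(c_memset, '0', sizeof c_memset);
}
```

Compiling this with "g++ -march=native -O3 -S" and comparing the assembly for f_loop and f_memset shows whether a given gcc version expands both the same way.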
