http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57890
Bug ID: 57890
Summary: gcc 4.8.1 regression: loops become slower
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: dushistov at mail dot ru
$ cat what_test.cpp
char c[100];
void f(void)
{
    for (int i = 0; i < 100; ++i)
        c[i] = '0';
}
I run this test with:
$ cat test.cpp
#include <cstddef>
extern void f();
int main()
{
    for (size_t i = 0; i < 100000000; ++i)
        f();
}
Compiled with "-march=native -O3" on an i7 in 64-bit mode.
Here are the results:
for test_loop47 we get average 0.348000
for test_loop481 we get average 0.400000
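The harness that produced these averages is not shown in this report. As a rough sketch only, numbers of this kind could be collected with something like the helper below. The name average_seconds and its parameters are hypothetical, and it assumes a C++11 compiler and runs > 0.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper, not part of the original test: calls fn() `calls`
// times, repeats that `runs` times, and returns the mean wall-clock time
// of one run in seconds. Assumes runs > 0.
double average_seconds(void (*fn)(), std::size_t calls, int runs)
{
    typedef std::chrono::steady_clock timer;
    double total = 0.0;
    for (int r = 0; r < runs; ++r) {
        timer::time_point start = timer::now();
        for (std::size_t i = 0; i < calls; ++i)
            fn();
        std::chrono::duration<double> elapsed = timer::now() - start;
        total += elapsed.count();
    }
    return total / runs;
}
```

As in test.cpp, the function being timed should live in a separate translation unit so the compiler cannot inline it into the timing loop.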
Comparing the generated code, on 4.7 the function "f" is compiled to:
push %rbp
vmovdqa 0x107(%rip),%ymm0
movb $0x30,0x200aa0(%rip)
movb $0x30,0x200a9a(%rip)
mov %rsp,%rbp
vmovdqa %ymm0,0x200a2e(%rip)
...
on gcc 4.8.1:
movabs $0x3030303030303030,%rax
movl $0x30303030,0x200a9c(%rip)
mov %rax,0x200a35(%rip)
mov %rax,0x200a36(%rip)
...
PS
I just checked whether
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55953
was fixed in gcc 4.8.1,
and yes, it is indeed "fixed": instead of optimal code for loops and
non-optimal code for builtin_memset, it now produces the same non-optimal code
for both cases.
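For reference, the two source forms compared in the PS might look like the sketch below. The names f_loop, f_memset and all_filled are made up for illustration. The memset variant is an assumed equivalent of the loop and is not taken from either bug report.

```cpp
#include <cstddef>
#include <cstring>

char c[100];  // same buffer as in what_test.cpp

// Loop form, as in the test case above.
void f_loop()
{
    for (int i = 0; i < 100; ++i)
        c[i] = '0';
}

// Assumed equivalent written with memset (GCC normally expands this
// call through __builtin_memset).
void f_memset()
{
    std::memset(c, '0', sizeof c);
}

// Returns true if every byte of c holds '0'.
bool all_filled()
{
    for (std::size_t i = 0; i < sizeof c; ++i)
        if (c[i] != '0')
            return false;
    return true;
}
```

Compiling both functions with the same flags and comparing the disassembly should show whether a given compiler version treats the two forms identically.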