http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57890
           Bug ID: 57890
          Summary: gcc 4.8.1 regression: loops become slower
          Product: gcc
          Version: 4.8.1
           Status: UNCONFIRMED
         Severity: major
         Priority: P3
        Component: c++
         Assignee: unassigned at gcc dot gnu.org
         Reporter: dushistov at mail dot ru

$ cat what_test.cpp
char c[100];

void f(void)
{
    for (int i = 0; i < 100; ++i)
        c[i] = '0';
}

I ran this test with:

$ cat test.cpp
#include <cstddef>

extern void f();

int main()
{
    for (std::size_t i = 0; i < 100000000; ++i)
        f();
}

Both files were compiled with "-march=native -O3" on an i7 in 64-bit mode. Results:

  test_loop47  (gcc 4.7):   average 0.348000
  test_loop481 (gcc 4.8.1): average 0.400000

Comparing the generated code, gcc 4.7 compiles "f" to:

    push    %rbp
    vmovdqa 0x107(%rip),%ymm0
    movb    $0x30,0x200aa0(%rip)
    movb    $0x30,0x200a9a(%rip)
    mov     %rsp,%rbp
    vmovdqa %ymm0,0x200a2e(%rip)
    ...

while gcc 4.8.1 produces:

    movabs  $0x3030303030303030,%rax
    movl    $0x30303030,0x200a9c(%rip)
    mov     %rax,0x200a35(%rip)
    mov     %rax,0x200a36(%rip)
    ...

That is, 4.7 fills the buffer with 32-byte AVX stores (vmovdqa from %ymm0), whereas 4.8.1 uses 8-byte stores from a general-purpose register (%rax holding eight '0' bytes, 0x30 each).

PS: I also checked whether http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55953 was fixed in gcc 4.8.1, and it is indeed "fixed": where gcc previously generated optimal code for loops and non-optimal code for __builtin_memset, it now generates the same non-optimal code for both cases.
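The report does not show how the averages were measured. A rough single-file sketch of such a measurement follows, timing the loop form and the memset form from the PS side by side. It assumes GCC's `noinline` attribute as a stand-in for the report's split into what_test.cpp and test.cpp (the point of that split is that the call to `f` cannot be inlined and hoisted out of the benchmark loop). The names `f_loop`, `f_memset` and `time_calls` are hypothetical and not from the report; `std::memset` stands in for `__builtin_memset`, since GCC normally treats a standard `memset` call as the builtin unless `-fno-builtin` is given.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>

// Same global buffer as in what_test.cpp from the report.
char c[100];

// Loop form from the report. noinline (GCC/Clang attribute) keeps each call
// a real call, approximating the separate translation unit in the report.
__attribute__((noinline)) void f_loop()
{
    for (int i = 0; i < 100; ++i)
        c[i] = '0';
}

// Hypothetical memset form, corresponding to the __builtin_memset case
// discussed in PR 55953.
__attribute__((noinline)) void f_memset()
{
    std::memset(c, '0', sizeof c);
}

// Hypothetical helper: calls fn `iterations` times and returns the
// elapsed wall-clock time in seconds, measured with a monotonic clock.
template <typename Fn>
double time_calls(Fn fn, std::size_t iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

To mirror the report, compile with "-march=native -O3" under each gcc version and compare, for example, `time_calls(f_loop, 100000000)` with `time_calls(f_memset, 100000000)`. Absolute numbers will depend on the machine; only the relative difference between compiler versions is meaningful.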