https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82625
Bug ID: 82625
Summary: lower-optimization are not inlined with symbol multiversioning
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: uzytkownik2 at gmail dot com
Target Milestone: ---

Consider the following toy example:

    #include <stddef.h>
    #include <stdint.h>

    __attribute__ ((target ("default")))
    static uint32_t foo(const char *buf, size_t size) {
        return 1;
    }

    __attribute__ ((target ("avx")))
    static uint32_t foo(const char *buf, size_t size) {
        return 2;
    }

    __attribute__ ((target ("default")))
    uint32_t bar() {
        char buf[4096];
        uint32_t acc = 0;
        for (int i = 0; i < sizeof(buf); i++) {
            acc += foo(&buf[i], 1);
        }
        return acc;
    }

    __attribute__ ((target ("avx")))
    uint32_t bar() {
        char buf[4096];
        uint32_t acc = 0;
        for (int i = 0; i < sizeof(buf); i++) {
            acc += foo(&buf[i], 1);
        }
        return acc;
    }

bar.avx is correctly optimized to a single mov:

    bar() [clone .avx]:
            movl    $8192, %eax
            ret

However, even though the default bar could be optimized to a mov as well, it goes through the loop and the ifunc dispatch:

    bar():
            pushq   %r12
            pushq   %rbp
            xorl    %ebp, %ebp
            pushq   %rbx
            subq    $4096, %rsp
            leaq    4096(%rsp), %r12
            movq    %rsp, %rbx
    .L10:
            movq    %rbx, %rdi
            movl    $1, %esi
            addq    $1, %rbx
            call    _ZL3fooPKcm._GLOBAL____tmp_compiler_explorer_compiler117919_59_b8onwy.b8iqhyqfr_example.cpp_00000000_0x82e640d209aabe90.ifunc(char const*, unsigned long)
            addl    %eax, %ebp
            cmpq    %r12, %rbx
            jne     .L10
            addq    $4096, %rsp
            movl    %ebp, %eax
            popq    %rbx
            popq    %rbp
            popq    %r12
            ret

Possibly overlapping with bug #71990.
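To make the expected result concrete, here is a minimal sketch of the constants each version of bar should fold to. It does not use multiversioning at all: foo_default, foo_avx, foo_fn and bar_with are hypothetical names introduced only for this illustration. It also assumes that the default bar only ever reaches the default foo, which is the premise of the report rather than something GCC guarantees.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the two versions of foo in the report.
constexpr std::uint32_t foo_default(const char *, std::size_t) { return 1; }
constexpr std::uint32_t foo_avx(const char *, std::size_t) { return 2; }

using foo_fn = std::uint32_t (*)(const char *, std::size_t);

// Same loop as bar() in the report, with the callee passed in explicitly
// instead of being selected by target multiversioning.
constexpr std::uint32_t bar_with(foo_fn foo) {
    char buf[4096] = {};
    std::uint32_t acc = 0;
    for (std::size_t i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}

// 4096 iterations * 2 matches the $8192 seen in bar.avx.
static_assert(bar_with(foo_avx) == 8192, "avx version folds to 8192");
// 4096 iterations * 1 is what the default version would fold to
// if the call to the default foo were inlined.
static_assert(bar_with(foo_default) == 4096, "default version folds to 4096");
```

Under that assumption, the default bar would ideally reduce to a single `movl $4096, %eax` followed by `ret`, mirroring the avx clone.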