http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54231
--- Comment #11 from Thiago Macieira <thiago at kde dot org> 2012-08-13 10:12:48 UTC ---
Attaching __attribute__((target("xxx"))) to the function does help.
It generates the following with the my_bzero function from comment 2:
00000000000002e0 <bzero_avx.2362>:
2e0: test %rsi,%rsi
2e3: vpxor %xmm0,%xmm0,%xmm0
2e7: je 2fe <bzero_avx.2362+0x1e>
2e9: nopl 0x0(%rax)
2f0: vmovntdq %xmm0,(%rdi)
2f4: add $0x10,%rdi
2f8: sub $0x1,%rsi
2fc: jne 2f0 <bzero_avx.2362+0x10>
2fe: repz retq

0000000000000300 <my_bzero>:
300: mov 0x200171(%rip),%rax # 200478 <my_bzero+0x200178>
307: mov (%rax),%eax
309: test %eax,%eax
30b: jne 330 <my_bzero+0x30>
30d: test %rsi,%rsi
310: pxor %xmm0,%xmm0
314: je 332 <my_bzero+0x32>
316: nopw %cs:0x0(%rax,%rax,1)
320: movntdq %xmm0,(%rdi)
324: add $0x10,%rdi
328: sub $0x1,%rsi
32c: jne 320 <my_bzero+0x20>
32e: repz retq
330: jmp 2e0 <bzero_avx.2362>
332: repz retq
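
For readers without comment 2 at hand, here is a rough sketch of source that
would be consistent with the listing above. It is not the actual code from
comment 2. The names have_avx and bzero_sse2 are made up for illustration, and
target("avx") is assumed for the "xxx" placeholder. The count n is taken to be
in 16-byte blocks and p is assumed to be 16-byte aligned.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_setzero_si128, _mm_stream_si128 */
#include <stddef.h>

/* Hypothetical runtime flag; real code would set it from a CPU feature check. */
int have_avx = 0;

/* Only this function is compiled as if -mavx had been given, so the
   compiler may use VEX-encoded instructions (vpxor, vmovntdq) here. */
__attribute__((target("avx")))
static void bzero_avx(void *p, size_t n)
{
    __m128i *dst = (__m128i *)p;
    const __m128i zero = _mm_setzero_si128();
    size_t i;
    for (i = 0; i < n; ++i)
        _mm_stream_si128(dst + i, zero);  /* non-temporal 16-byte store */
}

/* Baseline variant, built with the translation unit's default flags. */
static void bzero_sse2(void *p, size_t n)
{
    __m128i *dst = (__m128i *)p;
    const __m128i zero = _mm_setzero_si128();
    size_t i;
    for (i = 0; i < n; ++i)
        _mm_stream_si128(dst + i, zero);
}

/* Dispatcher: choose the variant at run time. The AVX variant stays a
   separate function because its target options differ from the caller's. */
void my_bzero(void *p, size_t n)
{
    if (have_avx)
        bzero_avx(p, n);
    else
        bzero_sse2(p, n);
}
```

With GCC 4.8 or later, the flag could be set once at startup from
__builtin_cpu_supports("avx"); any equivalent CPUID check would do.
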
This workaround might be useful for me in a few places where the inlining that
LTO provides is desired (even though, in this example, the AVX variant is
exactly what it would have been without LTO). But it does not scale: if I have
a file with 400+ functions to be compiled, plus possibly inline functions from
headers, applying it would require major changes to the code.