------- Comment #5 from hjl dot tools at gmail dot com 2009-04-21 20:34 -------
Created an attachment (id=17667)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17667&action=view)
An example

I am enclosing a modified example which can be compiled with both icc and
gcc. I have also included the assembly code generated by "icc -O2" and
"gcc -mavx -O2". Icc generates:

  54:   c5 ff 7c c8             vhaddps %ymm0,%ymm0,%ymm1
  58:   c5 f7 7c d1             vhaddps %ymm1,%ymm1,%ymm2
  5c:   c5 ef 7c da             vhaddps %ymm2,%ymm2,%ymm3
  60:   c5 fc 29 5c 24 e0       vmovaps %ymm3,-0x20(%rsp)
  66:   f3 0f 10 44 24 e0       movss  -0x20(%rsp),%xmm0

for

  if (has_avx ())
    {
      ...
    }

There is the non-VEX SSE instruction

        f3 0f 10 44 24 e0       movss  -0x20(%rsp),%xmm0

although this code will only run on AVX targets. Since we don't support
optimization at the basic block level, I don't see how we can avoid SSE
instructions in the AVX code path. The best option I can think of is
function level optimization. But as we all know, function level
optimization isn't usable, at least in this context.

I think we should go back and take another look at function level
optimization. We should do it right this time. I have some ideas in

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37565

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840
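
For illustration only, here is a minimal sketch of what per-function target selection with a runtime check can look like. It is not the attached example and not the proposal in PR 37565. It assumes a GCC-compatible compiler that provides the target function attribute and __builtin_cpu_supports (the latter appeared in GCC releases later than this comment). The names sum, sum_generic and sum_avx are hypothetical, and has_avx here is only a stand-in for the function of the same name in the attachment, whose body is not shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: dispatch is only attempted with a GCC-compatible compiler on x86. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Baseline version: compiled with the default target flags. */
static float sum_generic(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if HAVE_X86_DISPATCH
/* Same source, but AVX is enabled for this one function only, so the
   compiler is allowed to use VEX-encoded instructions in its body. */
static __attribute__((target("avx"))) float sum_avx(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#endif

/* Hypothetical stand-in for has_avx () from the attachment. */
int has_avx(void)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return 0;
#endif
}

/* Runtime dispatch: the AVX code lives in its own function rather than
   in a basic block of a function compiled for the baseline target. */
float sum(const float *v, size_t n)
{
    assert(v != NULL || n == 0);
#if HAVE_X86_DISPATCH
    if (has_avx())
        return sum_avx(v, n);
#endif
    return sum_generic(v, n);
}
```

The point of the split is that the target is chosen per function: everything inside sum_avx is compiled with AVX enabled, while sum itself stays at the baseline. Whether the generated code is free of legacy SSE encodings still depends on the compiler version and has to be checked in the disassembly.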