[Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics

hjl dot tools at gmail dot com Tue, 21 Apr 2009 13:34:43 -0700


------- Comment #5 from hjl dot tools at gmail dot com  2009-04-21 20:34 -------
Created an attachment (id=17667)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17667&action=view)
An example


I am enclosing a modified example which can be compiled with both
icc and gcc. I also included assembly codes generated by "icc -O2"
and "gcc -avx -O2". Icc generates:

  54:   c5 ff 7c c8             vhaddps %ymm0,%ymm0,%ymm1
  58:   c5 f7 7c d1             vhaddps %ymm1,%ymm1,%ymm2
  5c:   c5 ef 7c da             vhaddps %ymm2,%ymm2,%ymm3
  60:   c5 fc 29 5c 24 e0       vmovaps %ymm3,-0x20(%rsp)
  66:   f3 0f 10 44 24 e0       movss  -0x20(%rsp),%xmm0

for

if (has_avx ())
 {
   ...
 }

There is

f3 0f 10 44 24 e0       movss  -0x20(%rsp),%xmm0

although this code will only run on AVX targets. Since we don't
support basic block optimization, I don't see how we can avoid
SSE instructions in AVX code path. The best option I can think
of is function level optimization. But as we all know, function
level optimization isn't usable, as least in this context. I
think we should go back and another look at function level
optimization. We should do it right this time. I have some
ideas in

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37565


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840

[Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics

Reply via email to