------- Comment #2 from herumi at nifty dot com  2010-02-05 17:20 -------
>You should split your application into files that are compiled with either
>-msse2 or -msse4. Using -msse4, you will get what you asked for.

I see, but according to the Intel 64 and IA-32 Architectures Optimization
Reference Manual (http://www.intel.com/Assets/PDF/manual/248966.pdf),
the latency and throughput of pextrd and movd are as follows:

CPU1: 06_{1ah,1eh,1fh,2eh} family
CPU2: 06_{17h,1dh}
                         latency      throughput
                         CPU1  CPU2   CPU1  CPU2
pextrd reg, xmm1, imm    3     5      1     1      ; p.C-5
movd   r32, xmm          1     1      0.33  0.33   ; p.C-10
(see Table C-3 and Table C-6a in Appendix C)

movd has lower latency and better throughput than pextrd on both CPUs,
so I think gcc should use movd even when compiling with -msse4.
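
To illustrate the comparison, here is a minimal C sketch, assuming the case in
question is reading the low 32-bit element of an XMM register. The helper
names low32_movd and low32_pextrd are hypothetical and not taken from the
bug report. Both helpers return the same value; which instruction is
actually emitted for the second one depends on the compiler version and
flags.

```c
#include <emmintrin.h>  /* SSE2:   _mm_cvtsi128_si32, _mm_set_epi32 */
#include <smmintrin.h>  /* SSE4.1: _mm_extract_epi32 */

/* Hypothetical helper: read element 0 with the SSE2 intrinsic.
 * Compilers normally emit "movd r32, xmm" for this. */
static int low32_movd(__m128i v)
{
    return _mm_cvtsi128_si32(v);
}

/* Hypothetical helper: read element 0 with the SSE4.1 intrinsic.
 * With imm = 0 this may be emitted as "pextrd r32, xmm, 0" or as
 * "movd", depending on the compiler. The target attribute is an
 * assumption so the function builds without -msse4 on the command line. */
__attribute__((target("sse4.1")))
static int low32_pextrd(__m128i v)
{
    return _mm_extract_epi32(v, 0);
}
```

Comparing the assembly output (gcc -O2 -S) of the two helpers shows which
instruction a given compiler picks for each.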


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42968
