--- Comment #6 from ubizjak at gmail dot com 2008-03-21 20:58 ---
Fixed.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13958
--- Comment #5 from ubizjak at gmail dot com 2008-03-21 20:58 ---
The inner loop is compiled (-O2 -march=pentium4 -malign-double) to:
.L4:
movl %ecx, %eax
andl $1, %eax
movl a(,%eax,4), %eax
xorl %edx, %edx
(*)pushl %edx
(*)pushl %eax
--- Comment #4 from uros at gcc dot gnu dot org 2008-03-21 20:43 ---
Subject: Bug 13958
Author: uros
Date: Fri Mar 21 20:43:12 2008
New Revision: 133435
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=133435
Log:
PR target/13958
* config/i386/i386.md ("*floatunssi
--- Comment #3 from ubizjak at gmail dot com 2008-03-21 13:45 ---
This is due to a partial memory access penalty. For TARGET_INTER_UNIT_MOVES, we
can create:
movd a(,%eax,4), %xmm0
movq %xmm0, (%esp)
fildll (%esp)
instead of:
movl a(,%eax,4), %e
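
For reference, a minimal C sketch of what both sequences compute, assuming the
conversion in question is unsigned 32-bit to double: the value is zero-extended
to 64 bits and then converted as a signed 64-bit integer, which is what fildll
does. The function name is hypothetical and is not taken from the PR or from
i386.md.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only. */
double u32_to_double_via_i64(uint32_t u)
{
    /* Zero-extend: the high 32 bits are 0, the low 32 bits hold u.
       This corresponds to the 64-bit stack slot that fildll reads. */
    int64_t wide = (int64_t)(uint64_t)u;

    /* Signed 64-bit to double conversion. wide is never negative here,
       so the result equals the unsigned input exactly. */
    return (double)wide;
}
```

The result is the same either way; the two sequences differ only in how the
64-bit slot is written (two 32-bit stores versus a single 64-bit store) before
the 64-bit load.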