Redundant loads and stores are generated with the new -mtune=bdver1 target.
BDVER1 tuning generates packed single moves instead of packed double/integer
moves to save 1 byte of space.
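The 1-byte saving can be illustrated with the legacy (non-VEX) SSE encodings of a register-to-register move. The sketch below is only an illustration: the array and function names are made up for this example, and it does not cover the VEX-encoded forms that appear in the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Legacy (non-VEX) SSE encodings of a 128-bit move from xmm1 to xmm0.
   The ModRM byte 0xC1 selects reg = xmm0, rm = xmm1.
   Array names are hypothetical, chosen for this illustration only. */
static const unsigned char movaps_enc[] = { 0x0F, 0x28, 0xC1 };       /* movaps %xmm1, %xmm0 */
static const unsigned char movapd_enc[] = { 0x66, 0x0F, 0x28, 0xC1 }; /* movapd %xmm1, %xmm0 */
static const unsigned char movdqa_enc[] = { 0x66, 0x0F, 0x6F, 0xC1 }; /* movdqa %xmm1, %xmm0 */

/* Bytes saved by emitting movaps in place of a move that is
   other_len bytes long. */
static size_t bytes_saved_by_movaps(size_t other_len)
{
    assert(other_len >= sizeof movaps_enc);
    return other_len - sizeof movaps_enc;
}
```

Calling bytes_saved_by_movaps(sizeof movapd_enc) or bytes_saved_by_movaps(sizeof movdqa_enc) gives the difference that comes from the 0x66 operand-size prefix, which the packed single form does not need.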
Here is an excerpt from the asm dump for the ac.f90 benchmark in the Polyhedron
test suite. The complete asm dump, generated with -dP, is also attached.
vmovaps %xmm15, 304(%rsp) # 4985 *avx_movv4sf_internal/3 [length = 9]
#(insn 4987 4985 2838 ac.f90:503 (set (reg:V2DF 52 xmm15)
# (mem/c:V2DF (plus:DI (reg/f:DI 7 sp)
# (const_int 304 [0x130])) [16 %sfp+-37872 S16 A128])) 1031 {*avx_movv2df_internal} (nil))
vmovaps 304(%rsp), %xmm15 # 4987 *avx_movv2df_internal/2 [length = 9]
#(insn 2838 4987 4986 ac.f90:503 (set (reg:V2DF 52 xmm15)
# (div:V2DF (reg:V2DF 52 xmm15)
# (mem:V2DF (plus:DI (reg/f:DI 7 sp)
# (const_int 32432 [0x7eb0])) [2 *vect_pdclroo.541_5123+0 S16 A128]))) 1100 {avx_divv2df3} (nil))
Comments from Uros:
You are changing V4SFmode to V2DFmode. Since this combination is not
allowed by MODES_TIEABLE_P (and/or CANNOT_CHANGE_MODE_CLASS), the value
gets reloaded through memory. You can perhaps experiment with
these two macros a bit.
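As a rough illustration of the kind of experiment suggested here, the toy model below contrasts a strict tieability rule with a relaxed one that treats all 16-byte vector modes as interchangeable. It is not GCC source: the enum, the function names, and both rules are hypothetical stand-ins, and the real i386 backend logic is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC machine modes; not GCC's real enum. */
enum toy_mode { TOY_V4SF, TOY_V2DF, TOY_V2DI, TOY_DI, TOY_NUM_MODES };

/* Size in bytes of each toy mode. */
static unsigned toy_mode_size(enum toy_mode m)
{
    static const unsigned size[TOY_NUM_MODES] = { 16, 16, 16, 8 };
    assert((unsigned) m < TOY_NUM_MODES);
    return size[m];
}

/* Strict rule (assumed for illustration): only identical modes can share
   a register without a copy, so a V4SF store followed by a V2DF use
   cannot reuse the same value in place. */
static bool toy_tieable_strict(enum toy_mode a, enum toy_mode b)
{
    return a == b;
}

/* Relaxed rule (assumed for illustration): any two 16-byte vector modes
   are considered tieable, since they occupy the same full SSE register. */
static bool toy_tieable_relaxed(enum toy_mode a, enum toy_mode b)
{
    if (a == b)
        return true;
    return toy_mode_size(a) == 16 && toy_mode_size(b) == 16;
}
```

Under the strict rule, toy_tieable_strict(TOY_V4SF, TOY_V2DF) is false, which models the situation where the value is spilled and reloaded; under the relaxed rule the same pair is accepted, while a pair such as TOY_V4SF and TOY_DI is still rejected.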
--
Summary: Redundant loads and stores generated for AMD bdver1 target
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: harsha dot jagasia at amd dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44142