http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44141

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
                 CC|                            |ubizjak at gmail dot com,
                   |                            |uweigand at gcc dot
                   |                            |gnu.org, vmakarov at gcc
                   |                            |dot gnu.org
          Component|target                      |rtl-optimization

--- Comment #10 from Uros Bizjak <ubizjak at gmail dot com> 2012-03-27 17:02:18 UTC ---
(In reply to comment #9)
> (In reply to comment #8)
> > Created attachment 27013 [details]
> > Simplied test case form ac.f90
>
> GCC revision : 184502
> Command to reproduce: gfortran unoptimal_move.f90 -S -march=bdver1 -Ofast -dP
>
> Unoptimal move patterns can be found by grepping the assembly as follows:
>
> grep *movv4sf_internal unoptimal_move.s
>
> # (reg:V4SF 21 xmm0)) test.f90:16 1100 {*movv4sf_internal}
>         vmovaps %xmm0, (%rsp)   # 393   *movv4sf_internal/3     [length = 5]
> # (reg:V4SF 21 xmm0)) test.f90:16 1100 {*movv4sf_internal}
>         vmovaps %xmm0, 32(%rsp) # 396   *movv4sf_internal/3     [length = 6]
> # (reg:V4SF 21 xmm0)) test.f90:16 1100 {*movv4sf_internal}
>         vmovaps %xmm0, 64(%rsp) # 399   *movv4sf_internal/3     [length = 6]
> # (reg:V4SF 21 xmm0)) test.f90:16 1100 {*movv4sf_internal}
>         vmovaps %xmm0, 96(%rsp) # 402   *movv4sf_internal/3     [length = 6]

This is a register allocation / reload issue. MODES_TIEABLE_P and
CANNOT_CHANGE_MODE_CLASS are correct (so V4SF and V2DF should be
interchangeable).

There are plenty of examples where the mode changes without problems in
_.194r.reload, e.g.:

(insn 24 23 25 2 (set (reg:V4SF 51 xmm14 [282])
        (unspec:V4SF [
                (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
                        (const_int 1432 [0x598])) [3 MEM[(real(kind=8)[26] *)&dsroo + 8B]+0 S16 A64])
            ] UNSPEC_MOVU)) unoptimal_move.f90:16 1114 {*sse_movups}
     (nil))

(insn 25 24 348 2 (set (reg:V2DF 52 xmm15 [283])
        (div:V2DF (reg:V2DF 51 xmm14 [282])
            (mem/c:V2DF (plus:DI (reg/f:DI 7 sp)
                    (const_int 1424 [0x590])) [3 MEM[(real(kind=8)[26] *)&dsroo]+0 S16 A128]))) unoptimal_move.f90:16 1154 {sse2_divv2df3}
     (nil))

But when the input and output operands of the division match, the following
sequence is generated:

(insn 45 44 377 2 (set (reg:V4SF 21 xmm0)
        (unspec:V4SF [
                (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
                        (const_int 1544 [0x608])) [3 MEM[(real(kind=8)[26] *)&dsroo + 120B]+0 S16 A64])
            ] UNSPEC_MOVU)) unoptimal_move.f90:16 1114 {*sse_movups}
     (nil))

(insn 377 45 379 2 (set (mem/c:V4SF (reg/f:DI 7 sp) [10 %sfp+-2112 S16 A128])
        (reg:V4SF 21 xmm0)) unoptimal_move.f90:16 1108 {*movv4sf_internal}
     (nil))

(insn 379 377 46 2 (set (reg:V2DF 21 xmm0)
        (mem/c:V2DF (reg/f:DI 7 sp) [10 %sfp+-2112 S16 A128])) unoptimal_move.f90:16 1110 {*movv2df_internal}
     (nil))

(insn 46 379 378 2 (set (reg:V2DF 21 xmm0)
        (div:V2DF (reg:V2DF 21 xmm0)
            (mem/c:V2DF (plus:DI (reg/f:DI 7 sp)
                    (const_int 1536 [0x600])) [3 MEM[(real(kind=8)[26] *)&dsroo + 112B]+0 S16 A128]))) unoptimal_move.f90:16 1154 {sse2_divv2df3}
     (nil))

Both sequences have the same starting sequence, from _193r.ira:

(insn 24 23 25 2 (set (subreg:V4SF (reg:V2DF 282) 0)
        (unspec:V4SF [
                (mem/c:V4SF (plus:DI (reg/f:DI 20 frame)
                        (const_int -680 [0xfffffffffffffd58])) [3 MEM[(real(kind=8)[26] *)&dsroo + 8B]+0 S16 A64])
            ] UNSPEC_MOVU)) unoptimal_move.f90:16 1114 {*sse_movups}
     (nil))

(insn 25 24 348 2 (set (reg:V2DF 283)
        (div:V2DF (reg:V2DF 282)
            (mem/c:V2DF (plus:DI (reg/f:DI 20 frame)
                    (const_int -688 [0xfffffffffffffd50])) [3 MEM[(real(kind=8)[26] *)&dsroo]+0 S16 A128]))) unoptimal_move.f90:16 1154 {sse2_divv2df3}
     (nil))

The proposed fix would only sweep this issue under the rug.

Reconfirmed as an rtl-optimization problem.
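
For anyone who wants to count how often reload emits the store/reload round
trip seen in insns 377 and 379, the following is a rough Python sketch that
scans the generated assembly for a vector store to a stack slot immediately
followed by a vector load of the same slot back into the same register. The
function name find_round_trips is made up for illustration, and the mnemonic
of the reload (vmovapd is assumed in the sample) is a guess, since only the
vmovaps store appears in the output quoted in this report. Only simple
operands such as 32(%rsp) are handled.

```python
import re

# AT&T-syntax packed vector moves, e.g.
#   vmovaps %xmm0, 32(%rsp)    or    vmovapd 32(%rsp), %xmm0
# Assumption: operands are simple (no index registers with embedded commas).
MOVE_RE = re.compile(r"^\s*v?mov[au]p[sd]\s+([^,\s]+)\s*,\s*([^\s#]+)")


def find_round_trips(asm_text):
    """Return (line_no, register, slot) for each vector store that is
    immediately followed by a load of the same slot into the same register."""
    hits = []
    prev = None  # (line_no, src, dst) of the previous instruction, if a move
    for line_no, line in enumerate(asm_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and the RTL comments emitted by -dP
        m = MOVE_RE.match(line)
        cur = (line_no, m.group(1), m.group(2)) if m else None
        if prev and cur:
            _, p_src, p_dst = prev
            _, c_src, c_dst = cur
            is_store = p_src.startswith("%xmm") and "(" in p_dst
            if is_store and c_src == p_dst and c_dst == p_src:
                hits.append((prev[0], p_src, p_dst))
        prev = cur  # any non-move instruction or label resets the state
    return hits
```

It could be run on the output of the reproduction command with something like
find_round_trips(open("unoptimal_move.s").read()); each hit is a candidate
for the redundant mode-change spill discussed above, not proof of one.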