http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44141

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
                 CC|                            |ubizjak at gmail dot com,
                   |                            |uweigand at gcc dot
                   |                            |gnu.org, vmakarov at gcc
                   |                            |dot gnu.org
          Component|target                      |rtl-optimization

--- Comment #10 from Uros Bizjak <ubizjak at gmail dot com> 2012-03-27 17:02:18 UTC ---
(In reply to comment #9)
> (In reply to comment #8)
> > Created attachment 27013 [details]
> > Simplied test case form ac.f90
> 
> GCC revision : 184502
> Command to reproduce:  gfortran unoptimal_move.f90 -S -march=bdver1 -Ofast -dP
> 
> Unoptimal move patterns can be found by grepping the assembly as follows:
> 
> grep  *movv4sf_internal unoptimal_move.s
> 
> #        (reg:V4SF 21 xmm0)) test.f90:16 1100 {*movv4sf_internal}
>         vmovaps %xmm0, (%rsp)   # 393   *movv4sf_internal/3     [length = 5]
> #        (reg:V4SF 21 xmm0)) test.f90:16 1100 {*movv4sf_internal}
>         vmovaps %xmm0, 32(%rsp) # 396   *movv4sf_internal/3     [length = 6]
> #        (reg:V4SF 21 xmm0)) test.f90:16 1100 {*movv4sf_internal}
>         vmovaps %xmm0, 64(%rsp) # 399   *movv4sf_internal/3     [length = 6]
> #        (reg:V4SF 21 xmm0)) test.f90:16 1100 {*movv4sf_internal}
>         vmovaps %xmm0, 96(%rsp) # 402   *movv4sf_internal/3     [length = 6]
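
The grep above can also be done with a small script that tallies the
annotated stores per pattern. This is only a rough sketch in Python: it
assumes the trailing "# <uid> <pattern>/<n> [length = N]" comment format seen
in the quoted output (the /<n> suffix appears to be the matched alternative),
and the names count_patterns and SAMPLE_ASM are made up for illustration.

```python
import re
from collections import Counter

# Trailing comment on an instruction line, as seen in the quoted output:
#   "# <insn uid>   <pattern>/<n>     [length = N]"
ANNOTATION_RE = re.compile(r"#\s+(\d+)\s+(\S+)\s+\[length = (\d+)\]")

# Sample lines copied from the quoted assembly (without the "> " quote prefix).
SAMPLE_ASM = """\
#        (reg:V4SF 21 xmm0)) test.f90:16 1100 {*movv4sf_internal}
        vmovaps %xmm0, (%rsp)   # 393   *movv4sf_internal/3     [length = 5]
#        (reg:V4SF 21 xmm0)) test.f90:16 1100 {*movv4sf_internal}
        vmovaps %xmm0, 32(%rsp) # 396   *movv4sf_internal/3     [length = 6]
#        (reg:V4SF 21 xmm0)) test.f90:16 1100 {*movv4sf_internal}
        vmovaps %xmm0, 64(%rsp) # 399   *movv4sf_internal/3     [length = 6]
#        (reg:V4SF 21 xmm0)) test.f90:16 1100 {*movv4sf_internal}
        vmovaps %xmm0, 96(%rsp) # 402   *movv4sf_internal/3     [length = 6]
"""


def count_patterns(asm_text, name="*movv4sf_internal"):
    """Count annotated instructions whose pattern name equals `name`.

    Keys of the returned Counter keep the "/<n>" suffix so that the
    different forms of the same pattern are tallied separately.
    """
    counts = Counter()
    for line in asm_text.splitlines():
        m = ANNOTATION_RE.search(line)
        if m and m.group(2).split("/")[0] == name:
            counts[m.group(2)] += 1
    return counts


if __name__ == "__main__":
    for pattern, n in count_patterns(SAMPLE_ASM).items():
        print(pattern, n)
```

On a real run one would pass the contents of unoptimal_move.s instead of
SAMPLE_ASM.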

This is a register allocation / reload issue. MODES_TIEABLE_P and
CANNOT_CHANGE_MODE_CLASS are correct (so V4SF and V2DF should be
interchangeable). There are plenty of examples where the mode changes without
problems in _.194r.reload, e.g.:

(insn 24 23 25 2 (set (reg:V4SF 51 xmm14 [282])
        (unspec:V4SF [
                (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
                        (const_int 1432 [0x598])) [3 MEM[(real(kind=8)[26] *)&dsroo + 8B]+0 S16 A64])
            ] UNSPEC_MOVU)) unoptimal_move.f90:16 1114 {*sse_movups}
     (nil))

(insn 25 24 348 2 (set (reg:V2DF 52 xmm15 [283])
        (div:V2DF (reg:V2DF 51 xmm14 [282])
            (mem/c:V2DF (plus:DI (reg/f:DI 7 sp)
                    (const_int 1424 [0x590])) [3 MEM[(real(kind=8)[26] *)&dsroo]+0 S16 A128]))) unoptimal_move.f90:16 1154 {sse2_divv2df3}
     (nil))

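But when the input and output operands of the division match, the following
sequence is generated, in which xmm0 is stored to a stack slot as V4SF
(insn 377) and immediately reloaded from the same slot as V2DF (insn 379):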

(insn 45 44 377 2 (set (reg:V4SF 21 xmm0)
        (unspec:V4SF [
                (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
                        (const_int 1544 [0x608])) [3 MEM[(real(kind=8)[26] *)&dsroo + 120B]+0 S16 A64])
            ] UNSPEC_MOVU)) unoptimal_move.f90:16 1114 {*sse_movups}
     (nil))

(insn 377 45 379 2 (set (mem/c:V4SF (reg/f:DI 7 sp) [10 %sfp+-2112 S16 A128])
        (reg:V4SF 21 xmm0)) unoptimal_move.f90:16 1108 {*movv4sf_internal}
     (nil))

(insn 379 377 46 2 (set (reg:V2DF 21 xmm0)
        (mem/c:V2DF (reg/f:DI 7 sp) [10 %sfp+-2112 S16 A128])) unoptimal_move.f90:16 1110 {*movv2df_internal}
     (nil))

(insn 46 379 378 2 (set (reg:V2DF 21 xmm0)
        (div:V2DF (reg:V2DF 21 xmm0)
            (mem/c:V2DF (plus:DI (reg/f:DI 7 sp)
                    (const_int 1536 [0x600])) [3 MEM[(real(kind=8)[26] *)&dsroo + 112B]+0 S16 A128]))) unoptimal_move.f90:16 1154 {sse2_divv2df3}
     (nil))
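
The store/reload pair above (insn 377 followed by insn 379) can be searched
for mechanically. What follows is only a rough sketch in Python, assuming a
reload dump in the textual format shown here. The helper name
find_mode_change_spills and the regular expressions are illustrative and not
part of GCC, and the scan does not check whether the register is overwritten
between the store and the load.

```python
import re

# Register name plus the optional "[282]"-style suffix some dump entries carry.
_REG_TAIL = r"\w+(?: \[\d+\])?\)"

# Plain store of a register into a stack slot (the shape of insn 377).
STORE_RE = re.compile(
    r"\(insn (\d+) [\d ]+\(set \(mem/c:(\w+) .*?(%sfp\+-?\d+)[^\]]*\]\) "
    r"\(reg:\w+ (\d+) " + _REG_TAIL + r"\)"
)

# Plain load of a register from a stack slot (the shape of insn 379).
LOAD_RE = re.compile(
    r"\(insn (\d+) [\d ]+\(set \(reg:(\w+) (\d+) " + _REG_TAIL +
    r" \(mem/c:\w+ .*?(%sfp\+-?\d+)[^\]]*\]\)\)"
)

# The two insns from the dump above, used as sample input.
SAMPLE = """
(insn 377 45 379 2 (set (mem/c:V4SF (reg/f:DI 7 sp) [10 %sfp+-2112 S16 A128])
        (reg:V4SF 21 xmm0)) unoptimal_move.f90:16 1108 {*movv4sf_internal}
     (nil))

(insn 379 377 46 2 (set (reg:V2DF 21 xmm0)
        (mem/c:V2DF (reg/f:DI 7 sp) [10 %sfp+-2112 S16 A128]))
unoptimal_move.f90:16 1110 {*movv2df_internal}
     (nil))
"""


def find_mode_change_spills(dump):
    """Return (store_insn, load_insn, store_mode, load_mode, slot) tuples for
    a register stored to a stack slot in one mode and later loaded back from
    the same slot into the same hard register in a different mode."""
    flat = " ".join(dump.split())          # undo line wrapping in the dump
    last_store = {}                        # slot -> (insn, mode, regno)
    hits = []
    for chunk in re.split(r"(?=\(insn )", flat):
        m = STORE_RE.match(chunk)
        if m:
            insn, mode, slot, regno = m.groups()
            last_store[slot] = (int(insn), mode, int(regno))
            continue
        m = LOAD_RE.match(chunk)
        if m:
            insn, mode, regno, slot = m.groups()
            prev = last_store.get(slot)
            if prev and prev[2] == int(regno) and prev[1] != mode:
                hits.append((prev[0], int(insn), prev[1], mode, slot))
    return hits


if __name__ == "__main__":
    print(find_mode_change_spills(SAMPLE))
    # prints [(377, 379, 'V4SF', 'V2DF', '%sfp+-2112')]
```

Insns such as 24/25 and 45/46, which load through an unspec or perform the
division, match neither expression and are skipped.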

Both sequences start from the same form in _.193r.ira:

(insn 24 23 25 2 (set (subreg:V4SF (reg:V2DF 282) 0)
        (unspec:V4SF [
                (mem/c:V4SF (plus:DI (reg/f:DI 20 frame)
                        (const_int -680 [0xfffffffffffffd58])) [3 MEM[(real(kind=8)[26] *)&dsroo + 8B]+0 S16 A64])
            ] UNSPEC_MOVU)) unoptimal_move.f90:16 1114 {*sse_movups}
     (nil))

(insn 25 24 348 2 (set (reg:V2DF 283)
        (div:V2DF (reg:V2DF 282)
            (mem/c:V2DF (plus:DI (reg/f:DI 20 frame)
                    (const_int -688 [0xfffffffffffffd50])) [3 MEM[(real(kind=8)[26] *)&dsroo]+0 S16 A128]))) unoptimal_move.f90:16 1154 {sse2_divv2df3}
     (nil))

The proposed fix would only sweep this issue under the rug.

Reconfirmed as an rtl-optimization problem.
