On Wed, May 13, 2020 at 2:37 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Wed, May 13, 2020 at 5:04 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Wed, May 13, 2020 at 1:05 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > >
> > > On Tue, May 12, 2020 at 10:07 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > >
> > > > Update STV pass to properly count cost of XMM register push.  In 32-bit
> > > > mode, to convert XMM register push in DImode, we do an XMM store in
> > > > DImode, followed by 2 memory pushes in SImode, instead of 2 integer
> > > > register pushes in SImode.  To convert XM register push in SImode, we
> > > > do an XMM register to integer register move in SImode, followed an
> > > > integer register push in SImode, instead of an integer register push in
> > > > SImode.  In 64-bit mode, we do an XMM register to integer register move
> > > > in SImode or DImode, followed an integer register push in SImode or
> > > > DImode, instead of an integer register push SImode or DImode.
> > > >
> > > > Tested on Linux/x86 and Linux/x86-64.
> > >
> > > I think it is better to implement XMM register pushes, and split them
> > > after reload to a sequence of:
> > >
> > > (set (reg:P SP_REG) (plus:P SP_REG) (const_int -8)))
> > > (set (match_dup 0) (match_dup 1))
> > >
> > > This is definitely better than trips through memory to stack.
> >
> > Attached (untested patch) allows fake pushes from XMM registers, so
> > STV pass can allow pushes.
>
> The problem isn't STV pass.  The IRA pass won't assign hard register for

Please see the sequence before STV pass:

(insn 24 23 25 3 (set (reg/v:DI 85 [ target ])
        (mem:DI (reg/f:SI 86 [ target_p ]) [2 *target_p.0_1+0 S8
A32])) "pr95021.c":17:7 64 {*movdi_internal}
     (expr_list:REG_DEAD (reg/f:SI 86 [ target_p ])
        (nil)))

(insn 26 25 27 3 (set (mem:DI (reg/f:SI 87 [ c ]) [2 c.1_2->target+0 S8 A32])
        (reg/v:DI 85 [ target ])) "pr95021.c":18:15 64 {*movdi_internal}
     (expr_list:REG_DEAD (reg/f:SI 87 [ c ])
        (nil)))

(insn 28 27 29 3 (set (mem:DI (pre_dec:SI (reg/f:SI 7 sp)) [2  S8 A64])
        (reg/v:DI 85 [ target ])) "pr95021.c":19:5 40 {*pushdi2}
     (expr_list:REG_DEAD (reg/v:DI 85 [ target ])
        (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
            (nil))))

So, (reg 85) gets moved to a memory in (insn 26) and pushed to a stack
in (insn 28).

STV pass does this:

(insn 24 41 37 3 (set (subreg:V2DI (reg:DI 89) 0)
        (subreg:V2DI (reg:DI 91) 0)) "pr95021.c":17:7 1342 {movv2di_internal}
     (expr_list:REG_DEAD (reg/f:SI 86 [ target_p ])
        (nil)))

(insn 37 24 38 3 (set (reg:V2DI 90)
        (subreg:V2DI (reg:DI 89) 0)) "pr95021.c":17:7 -1
     (nil))
(insn 38 37 39 3 (set (subreg:SI (reg/v:DI 85 [ target ]) 0)
        (subreg:SI (reg:V2DI 90) 0)) "pr95021.c":17:7 -1
     (nil))
(insn 39 38 40 3 (set (reg:V2DI 90)
        (lshiftrt:V2DI (reg:V2DI 90)
            (const_int 32 [0x20]))) "pr95021.c":17:7 -1
     (nil))
(insn 40 39 25 3 (set (subreg:SI (reg/v:DI 85 [ target ]) 4)
        (subreg:SI (reg:V2DI 90) 0)) "pr95021.c":17:7 -1
     (nil))

(insn 26 25 27 3 (set (mem:DI (reg/f:SI 87 [ c ]) [2 c.1_2->target+0 S8 A32])
        (reg:DI 89)) "pr95021.c":18:15 64 {*movdi_internal}
     (expr_list:REG_DEAD (reg/f:SI 87 [ c ])
        (nil)))

(insn 28 27 29 3 (set (mem:DI (pre_dec:SI (reg/f:SI 7 sp)) [2  S8 A64])
        (reg/v:DI 85 [ target ])) "pr95021.c":19:5 40 {*pushdi2}
     (expr_list:REG_DEAD (reg/v:DI 85 [ target ])
        (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
            (nil))))

For some reason (reg 89) moves to (reg 85) via sequence (insn 37) to
(insn 40), splitting to SImode and reassembling back to (reg 85).

>
> (insn 28 27 29 3 (set (mem:DI (pre_dec:SI (reg/f:SI 7 sp)) [2  S8 A64])
>         (reg/v:DI 85 [ target ])) "x.i":19:5 40 {*pushdi2}
>      (expr_list:REG_DEAD (reg/v:DI 85 [ target ])
>         (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
>             (nil))))
>
> and the reload pass turns into
>
> ....
> (insn 28 27 29 3 (set (mem:DI (pre_dec:SI (reg/f:SI 7 sp)) [2  S8 A64])
>         (mem/c:DI (plus:SI (reg/f:SI 7 sp)
>                 (const_int 16 [0x10])) [8 %sfp+-8 S8 A64])) "x.i":19:5
> 40 {*pushdi2}
>      (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
>         (nil)))

This is an optimization of the reload pass, it figures out where the
values live and tries its best to assign hard regs and stack slots to
the above convoluted sequence. Note, that DImode push from integer
regs splits to SImode pushes after reload. This has nothing with STV
pass.

The question is, why STV pass creates its funny sequence? The original
sequence should be easily solved by storing DImode from XMM register
and (with patched gcc) pushing DImode value from the same XMM
register.

Uros.

Reply via email to