https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89654

            Bug ID: 89654
           Summary: Invalid reload with -march=skylake -m32
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

The following testcase:

--cut here--
unsigned long long
foo (unsigned long long i)
{
  return i << 3;
}
--cut here--

compiles with -O2 -march=skylake -m32 to:

        subl    $28, %esp
        movl    32(%esp), %eax
        movl    36(%esp), %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        vmovdqa (%esp), %xmm1        <--- here
        addl    $28, %esp
        vpsllq  $3, %xmm1, %xmm0
        vmovd   %xmm0, %eax
        vpextrd $1, %xmm0, %edx
        ret

Please note the 128-bit access (vmovdqa) to a 64-bit stack slot, in addition
to the unnecessary moves of the argument through the stack.
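
To illustrate the width mismatch, here is a minimal sketch in C with SSE2
intrinsics. It is not taken from GCC, and the function names are hypothetical.
It loads only the 8 bytes that the DImode value occupies, as a vmovq-style load
would, whereas the vmovdqa above reads 16 bytes. On targets without SSE2 the
sketch falls back to a plain shift.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, for illustration only.  */
unsigned long long
shl3_via_movq (unsigned long long i)
{
#if defined(__SSE2__)
  unsigned long long r;
  /* 64-bit load: reads exactly the 8 bytes of i and zeroes the upper lane.  */
  __m128i v = _mm_loadl_epi64 ((const __m128i *) &i);
  /* Shift each 64-bit lane left by 3 (psllq).  */
  v = _mm_slli_epi64 (v, 3);
  /* 64-bit store of the low lane only.  */
  _mm_storel_epi64 ((__m128i *) &r, v);
  return r;
#else
  /* Fallback for targets without SSE2.  */
  return i << 3;
#endif
}

/* Compare against the plain C shift from the testcase.  */
void
check_shl3_via_movq (void)
{
  assert (shl3_via_movq (1ULL) == 8ULL);
  assert (shl3_via_movq (0x2000000000000001ULL)
          == (0x2000000000000001ULL << 3));
}
```

The point of the sketch is only that an 8-byte load is sufficient for the
value; it does not reproduce the reload problem itself.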

In _.ira, we have:

(insn 2 4 3 2 (set (reg/v:DI 83 [ i ])
        (mem/c:DI (reg/f:SI 16 argp) [1 i+0 S8 A32])) "vshift.c":3:1 66 {*movdi_internal}
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 14 2 (set (subreg:V2DI (reg:DI 84) 0)
        (ashift:V2DI (subreg:V2DI (reg/v:DI 83 [ i ]) 0)
            (const_int 3 [0x3]))) "vshift.c":4:12 3353 {ashlv2di3}
     (expr_list:REG_DEAD (reg/v:DI 83 [ i ])
        (nil)))
...

and in _.reload:

(insn 2 4 19 2 (set (reg/v:DI 0 ax [orig:83 i ] [83])
        (mem/c:DI (plus:SI (reg/f:SI 7 sp)
                (const_int 32 [0x20])) [1 i+0 S8 A32])) "vshift.c":3:1 66 {*movdi_internal}
     (nil))
(insn 19 2 3 2 (set (mem/c:DI (reg/f:SI 7 sp) [2 %sfp+-16 S8 A128])
        (reg/v:DI 0 ax [orig:83 i ] [83])) "vshift.c":3:1 66 {*movdi_internal}
     (nil))
(note 3 19 20 2 NOTE_INSN_FUNCTION_BEG)
(insn 20 3 6 2 (set (reg:V2DI 21 xmm1 [89])
        (mem/c:V2DI (reg/f:SI 7 sp) [2 %sfp+-16 S16 A128])) "vshift.c":4:12 1211 {movv2di_internal}
     (nil))
(insn 6 20 14 2 (set (reg:V2DI 20 xmm0 [84])
        (ashift:V2DI (reg:V2DI 21 xmm1 [89])
            (const_int 3 [0x3]))) "vshift.c":4:12 3353 {ashlv2di3}
     (nil))
...

Please note (insn 19) and (insn 20): insn 19 stores the DImode value to an
8-byte stack slot (S8), and insn 20 then reloads it from the same address with
a 16-byte V2DImode load (S16).

Using -O2 -march=skylake-avx512 -m32, we get:

        subl    $28, %esp
        movl    32(%esp), %eax
        movl    36(%esp), %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        vpsllq  $3, (%esp), %xmm0
        addl    $28, %esp
        vmovd   %xmm0, %eax
        vpextrd $1, %xmm0, %edx
        ret

which is even more wrong, as the V2DI load is propagated into the shift insn
as a memory operand.

However, with -O2 -march=haswell -m32, everything works as expected:

        vmovq   4(%esp), %xmm0
        vpsllq  $3, %xmm0, %xmm0
        vmovd   %xmm0, %eax
        vpextrd $1, %xmm0, %edx
        ret
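
For reference, the value that the final vmovd/vpextrd pair leaves in %eax (low
half) and %edx (high half) can be written out in plain C. This is only a sketch
of the expected semantics; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair standing in for the %eax/%edx return registers.  */
struct halves
{
  uint32_t lo; /* value returned in %eax */
  uint32_t hi; /* value returned in %edx */
};

/* Compute (hi:lo) << 3 using 32-bit operations only.  */
struct halves
shl3_halves (uint32_t lo, uint32_t hi)
{
  struct halves r;
  r.lo = lo << 3;
  /* The top 3 bits of the low half move into the high half.  */
  r.hi = (hi << 3) | (lo >> 29);
  return r;
}

/* Cross-check against the 64-bit shift from the testcase.  */
void
check_shl3_halves (void)
{
  uint64_t i = 0x12345678C0000001ULL;
  uint64_t want = i << 3;
  struct halves got = shl3_halves ((uint32_t) i, (uint32_t) (i >> 32));
  assert (got.lo == (uint32_t) want);
  assert (got.hi == (uint32_t) (want >> 32));
}
```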
