https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89654
Bug ID: 89654
Summary: Invalid reload with -march=skylake -m32
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---

Following testcase:

--cut here--
unsigned long long
foo (unsigned long long i)
{
  return i << 3;
}
--cut here--

compiles with -O2 -march=skylake -m32 to:

        subl    $28, %esp
        movl    32(%esp), %eax
        movl    36(%esp), %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        vmovdqa (%esp), %xmm1      <--- here
        addl    $28, %esp
        vpsllq  $3, %xmm1, %xmm0
        vmovd   %xmm0, %eax
        vpextrd $1, %xmm0, %edx
        ret

Please note the 128-bit access to a 64-bit stack slot, in addition to the
unnecessary moves.

In _.ira, we have:

(insn 2 4 3 2 (set (reg/v:DI 83 [ i ])
        (mem/c:DI (reg/f:SI 16 argp) [1 i+0 S8 A32])) "vshift.c":3:1 66 {*movdi_internal}
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 14 2 (set (subreg:V2DI (reg:DI 84) 0)
        (ashift:V2DI (subreg:V2DI (reg/v:DI 83 [ i ]) 0)
            (const_int 3 [0x3]))) "vshift.c":4:12 3353 {ashlv2di3}
     (expr_list:REG_DEAD (reg/v:DI 83 [ i ])
        (nil)))
...

and in _.reload:

(insn 2 4 19 2 (set (reg/v:DI 0 ax [orig:83 i ] [83])
        (mem/c:DI (plus:SI (reg/f:SI 7 sp)
                (const_int 32 [0x20])) [1 i+0 S8 A32])) "vshift.c":3:1 66 {*movdi_internal}
     (nil))
(insn 19 2 3 2 (set (mem/c:DI (reg/f:SI 7 sp) [2 %sfp+-16 S8 A128])
        (reg/v:DI 0 ax [orig:83 i ] [83])) "vshift.c":3:1 66 {*movdi_internal}
     (nil))
(note 3 19 20 2 NOTE_INSN_FUNCTION_BEG)
(insn 20 3 6 2 (set (reg:V2DI 21 xmm1 [89])
        (mem/c:V2DI (reg/f:SI 7 sp) [2 %sfp+-16 S16 A128])) "vshift.c":4:12 1211 {movv2di_internal}
     (nil))
(insn 6 20 14 2 (set (reg:V2DI 20 xmm0 [84])
        (ashift:V2DI (reg:V2DI 21 xmm1 [89])
            (const_int 3 [0x3]))) "vshift.c":4:12 3353 {ashlv2di3}
     (nil))
...

Please note (insn 19) and (insn 20), where a DImode value in a DImode stack
slot is loaded using a V2DImode instruction: the store writes 8 bytes (S8),
but the load reads 16 bytes (S16) from the same address.
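The mismatch between (insn 19) and (insn 20) can be stated as a simple bounds check. The following is only an illustrative sketch: struct mem_ref and load_covered_by_store are hypothetical names invented here and are not GCC data structures or functions. The offsets and sizes mirror the %sfp+-16 S8 and %sfp+-16 S16 attributes in the dump above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a stack memory reference; not GCC's representation. */
struct mem_ref {
    long   offset;  /* byte offset from the frame base (the %sfp+N part) */
    size_t size;    /* access size in bytes (the S<n> attribute) */
};

/* Returns true when every byte read by `load` was written by `store`. */
bool load_covered_by_store(struct mem_ref store, struct mem_ref load)
{
    long store_end = store.offset + (long)store.size;
    long load_end  = load.offset + (long)load.size;

    return load.offset >= store.offset && load_end <= store_end;
}
```

With the values from the dump, the store is { -16, 8 } and the load is { -16, 16 }, so the check fails: bytes 8..15 of the load lie outside the slot that was written.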

Using -O2 -march=skylake-avx512 -m32, we get:

        subl    $28, %esp
        movl    32(%esp), %eax
        movl    36(%esp), %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        vpsllq  $3, (%esp), %xmm0
        addl    $28, %esp
        vmovd   %xmm0, %eax
        vpextrd $1, %xmm0, %edx
        ret

which is even more wrong, as the V2DI move is propagated into the shift insn.

However, with -O2 -march=haswell -m32, everything works as expected:

        vmovq   4(%esp), %xmm0
        vpsllq  $3, %xmm0, %xmm0
        vmovd   %xmm0, %eax
        vpextrd $1, %xmm0, %edx
        ret
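
For reference, the data flow of the expected (haswell) sequence can be modelled in portable C. This is a sketch under assumptions, not what GCC emits: struct reg_pair and shift_slot_by_3 are hypothetical names, and the eax/edx fields merely stand in for the -m32 return registers. The point is that only 8 bytes are read from the slot, as vmovq does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the %eax/%edx pair that returns a 64-bit
   value under -m32. */
struct reg_pair {
    uint32_t eax;  /* low 32 bits of the result */
    uint32_t edx;  /* high 32 bits of the result */
};

/* Models the haswell sequence: an 8-byte load, a 64-bit shift, and
   extraction of the two 32-bit halves. */
struct reg_pair shift_slot_by_3(const void *slot)
{
    uint64_t lane;
    struct reg_pair r;

    memcpy(&lane, slot, sizeof lane);  /* like vmovq: reads exactly 8 bytes */
    lane <<= 3;                        /* like vpsllq $3 on the low lane */
    r.eax = (uint32_t)lane;            /* like vmovd %xmm0, %eax */
    r.edx = (uint32_t)(lane >> 32);    /* like vpextrd $1, %xmm0, %edx */
    return r;
}
```

In the skylake output, the corresponding load reads 16 bytes from the same 8-byte slot, which has no equivalent in this model without reading past the object.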