https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92283
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
             Target|                            |x86_64-*-*
                 CC|                            |vmakarov at gcc dot gnu.org
          Component|tree-optimization           |rtl-optimization

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the bug is here:

# results.f:473: & vkl(3,1)*vkl(3,3)                      # results.f:473: & vkl(3,1)*vkl(3,3)
    vfmadd231sd 8(%rsp), %xmm0, %xmm2 # %sfp, preph     |     vfmadd231sd %xmm7, %xmm0, %xmm2 # pretmp_8926

where the good case (left) uses 8(%rsp) as the operand and the bad case (right) has a stray use of %xmm7. What's suspicious is that %xmm7 was previously moved to a GPR in the bad case (but that's actually correct AFAICS):

                                                        >     vmovq %xmm7, %rsi # _874, _878
    vmovsd %xmm7, 488(%rsp) # _878, eyy                       vmovsd %xmm7, 488(%rsp) # _878, eyy

More context (you can see 8(%rsp) used twice in the good case; the bad case instead uses 72(%rsp) for the first use, loaded into %xmm7, but that register is then clobbered by the following insn):

    vfmadd132sd %xmm6, %xmm12, %xmm7 # tmp2767, pr      |     vfmadd132sd %xmm6, %xmm12, %xmm7 # tmp2769, pr
                                                        >     vmovq %xmm7, %rsi # _874, _878
    vmovsd %xmm7, 488(%rsp) # _878, eyy                       vmovsd %xmm7, 488(%rsp) # _878, eyy
# results.f:469: ezz=ezz+(vkl(1,3)**2+vkl(2,3)            # results.f:469: ezz=ezz+(vkl(1,3)**2+vkl(2,3)
    vmovsd 144(%rsp), %xmm15 # %sfp, pretmp_8932              vmovsd 144(%rsp), %xmm15 # %sfp, pretmp_8932
    vmulsd %xmm15, %xmm15, %xmm5 # pretmp_8932, pretmp        vmulsd %xmm15, %xmm15, %xmm5 # pretmp_8932, pretmp
    vmovsd 8(%rsp), %xmm9 # %sfp, pretmp_8926           |     vmovsd 72(%rsp), %xmm7 # %sfp, pretmp_8926
    vfmadd132sd %xmm9, %xmm5, %xmm9 #, _4016, pre       |     vfmadd132sd %xmm7, %xmm5, %xmm7 # pretmp_8926
    vmovaps %xmm9, %xmm5 # pretmp_8926, _879            |     vmovaps %xmm7, %xmm5 # pretmp_8926, _879
    vfmadd231sd %xmm13, %xmm13, %xmm5 # pretmp_8918           vfmadd231sd %xmm13, %xmm13, %xmm5 # pretmp_8918
    vfmadd132sd %xmm6, %xmm13, %xmm5 # tmp2767, pr      |     vfmadd132sd %xmm6, %xmm13, %xmm5 # tmp2769, pr
    vmovsd %xmm5, 504(%rsp) # _884, ezz                       vmovsd %xmm5, 504(%rsp) # _884, ezz
# results.f:471: & vkl(3,1)*vkl(3,2)                      # results.f:471: & vkl(3,1)*vkl(3,2)
    vmovsd 80(%rsp), %xmm9 # %sfp, pretmp_8920          <
    vfmadd231sd %xmm9, %xmm0, %xmm4 # pretmp_8920             vfmadd231sd %xmm9, %xmm0, %xmm4 # pretmp_8920
    vfmadd231sd %xmm14, %xmm12, %xmm4 # pretmp_8922           vfmadd231sd %xmm14, %xmm12, %xmm4 # pretmp_8922
    vfmadd231sd %xmm11, %xmm10, %xmm4 # pretmp_8928           vfmadd231sd %xmm11, %xmm10, %xmm4 # pretmp_8928
    vmovsd %xmm4, 472(%rsp) # _8924, exy                      vmovsd %xmm4, 472(%rsp) # _8924, exy
# results.f:473: & vkl(3,1)*vkl(3,3)                      # results.f:473: & vkl(3,1)*vkl(3,3)
    vfmadd231sd 8(%rsp), %xmm0, %xmm2 # %sfp, preph     |     vfmadd231sd %xmm7, %xmm0, %xmm2 # pretmp_8926
    vfmadd231sd %xmm15, %xmm14, %xmm2 # pretmp_8932           vfmadd231sd %xmm15, %xmm14, %xmm2 # pretmp_8932
    vfmadd231sd %xmm13, %xmm11, %xmm2 # pretmp_8918           vfmadd231sd %xmm13, %xmm11, %xmm2 # pretmp_8918
    vmovsd %xmm2, 480(%rsp) # _8930, exz                      vmovsd %xmm2, 480(%rsp) # _8930, exz

Before IRA, the insns with the stack use and the later bogus register use are:

(insn 1815 1814 1816 176 (set (reg:DF 246 [ _879 ])
        (fma:DF (reg:DF 1447 [ pretmp_8926 ])
            (reg:DF 1447 [ pretmp_8926 ])
            (reg:DF 1018 [ _4016 ]))) "results.f":469:0 1960 {*fma_fmadd_df}
     (expr_list:REG_DEAD (reg:DF 1018 [ _4016 ])
        (nil)))

(insn 1825 1824 1826 176 (set (reg:DF 252 [ _902 ])
        (fma:DF (reg:DF 1440 [ prephitmp_8903 ])
            (reg:DF 1447 [ pretmp_8926 ])
            (reg:DF 1449 [ _8930 ]))) "results.f":473:0 1960 {*fma_fmadd_df}
     (expr_list:REG_DEAD (reg:DF 1449 [ _8930 ])
        (nil)))

and after reload it's broken:

(insn 10605 9622 1815 179 (set (reg:DF 27 xmm7 [orig:1447 pretmp_8926 ] [1447])
        (mem/c:DF (plus:DI (reg/f:DI 7 sp)
                (const_int 72 [0x48])) [22 %sfp+-6808 S8 A64])) "results.f":469:0 111 {*movdf_internal}
     (nil))
(insn 1815 10605 10530 179 (set (reg:DF 27 xmm7 [orig:1447 pretmp_8926 ] [1447])
        (fma:DF (reg:DF 27 xmm7 [orig:1447 pretmp_8926 ] [1447])
            (reg:DF 27 xmm7 [orig:1447 pretmp_8926 ] [1447])
            (reg:DF 25 xmm5 [orig:1018 _4016 ] [1018]))) "results.f":469:0 1960 {*fma_fmadd_df}
     (nil))

Oops, %xmm7 is clobbered by insn 1815, but it is re-used later:

(note 10338 1824 9630 179 NOTE_INSN_DELETED)
(note 9630 10338 1825 179 NOTE_INSN_DELETED)
(insn 1825 9630 1826 179 (set (reg:DF 22 xmm2 [orig:252 _902 ] [252])
        (fma:DF (reg:DF 20 xmm0 [orig:1440 prephitmp_8903 ] [1440])
            (reg:DF 27 xmm7 [orig:1447 pretmp_8926 ] [1447])
            (reg:DF 22 xmm2 [orig:1449 _8930 ] [1449]))) "results.f":473:0 1960 {*fma_fmadd_df}
     (nil))

It looks like 10338 and 9630 were inserted (reloads, maybe?) but then discarded:

      Use smallest class of ALL_SSE_REGS and SSE_REGS
      Creating newreg=4364 from oldreg=1447, assigning class ALL_SSE_REGS to inheritance r4364
    Original reg change 1447->4364 (bb176):
   9630: r3696:DF=r4364:DF
    Add inheritance<-original before:
  10338: r4364:DF=r1447:DF

    Inheritance reuse change 1447->4372 (bb176):
  10338: r4364:DF=r4372:DF

    Insn after restoring regs:
   9630: r3696:DF=r1447:DF

   Removing inheritance:
  10338: r4364:DF=r4372:DF
      REG_DEAD r4372:DF
    deleting insn with uid = 10338.

So it looks like a reload inheritance issue to me. Vladimir?

The testcase is results.f from 454.calculix; compile it with -O2 -mfma -mtune=znver2 -fdbg-cnt=ivopts_loop:66:67 -fno-schedule-insns2 -mno-stv -fno-tree-slsr -fno-tree-ter (-fdbg-cnt=ivopts_loop:66:66 yields correct code). The same issue probably appears when building it with -O2 -march=znver2, but the debugging above was done with the "reduced" flags.
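As a rough illustration of the clobber described above (insn 1815 overwriting %xmm7 while insn 1825 still expects pretmp_8926 there), the following Python sketch tracks which value each hard register or stack slot currently holds and flags any read whose location holds something other than what the insn expects. This is not GCC code and not how LRA verifies anything. The `Insn` model, the `find_stale_uses` helper, and the labels for insns without a known RTL uid (such as "vmovaps" and "load") are hypothetical. The traces are abridged by hand from the asm diff, keeping only the insns that touch %xmm7 (bad case) or the corresponding stack slot (good case), with value names taken from the asm comments and RTL dumps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Insn:
    """Hypothetical, simplified view of one insn after register allocation."""
    name: str                      # label, e.g. the RTL insn uid
    dest: Optional[str]            # hard register or stack slot written, if any
    dest_value: Optional[str]      # value (pseudo / SSA name) it holds afterwards
    uses: List[Tuple[str, str]] = field(default_factory=list)  # (location, expected value)


def find_stale_uses(insns: List[Insn]) -> List[Tuple[str, str, str, str]]:
    """Return (insn, location, expected, actual) for every read of a location
    whose current contents differ from the value the insn expects.

    Locations not written earlier in the trace are treated as live-in and
    are not checked.
    """
    holds: Dict[str, str] = {}  # location -> value it currently contains
    problems = []
    for insn in insns:
        # All operands are read before the destination is written.
        for loc, expected in insn.uses:
            actual = holds.get(loc)
            if actual is not None and actual != expected:
                problems.append((insn.name, loc, expected, actual))
        if insn.dest is not None:
            holds[insn.dest] = insn.dest_value
    return problems


# Bad case (abridged): pretmp_8926 is loaded into %xmm7, which insn 1815 then overwrites.
bad = [
    Insn("10605", "xmm7", "pretmp_8926", [("72(%rsp)", "pretmp_8926")]),  # vmovsd 72(%rsp), %xmm7
    Insn("1815", "xmm7", "_879",
         [("xmm7", "pretmp_8926"), ("xmm5", "_4016")]),                   # vfmadd132sd %xmm7, %xmm5, %xmm7
    Insn("vmovaps", "xmm5", "_879", [("xmm7", "_879")]),                  # vmovaps %xmm7, %xmm5
    Insn("1825", "xmm2", "_902",
         [("xmm0", "prephitmp_8903"), ("xmm7", "pretmp_8926"),
          ("xmm2", "_8930")]),                                            # vfmadd231sd %xmm7, %xmm0, %xmm2
]

# Good case (abridged): the second use reads pretmp_8926 from the untouched slot 8(%rsp).
good = [
    Insn("load", "xmm9", "pretmp_8926", [("8(%rsp)", "pretmp_8926")]),    # vmovsd 8(%rsp), %xmm9
    Insn("1815", "xmm9", "_879",
         [("xmm9", "pretmp_8926"), ("xmm5", "_4016")]),                   # vfmadd132sd %xmm9, %xmm5, %xmm9
    Insn("vmovaps", "xmm5", "_879", [("xmm9", "_879")]),                  # vmovaps %xmm9, %xmm5
    Insn("1825", "xmm2", "_902",
         [("xmm0", "prephitmp_8903"), ("8(%rsp)", "pretmp_8926"),
          ("xmm2", "_8930")]),                                            # vfmadd231sd 8(%rsp), %xmm0, %xmm2
]

if __name__ == "__main__":
    print(find_stale_uses(bad))   # [('1825', 'xmm7', 'pretmp_8926', '_879')]
    print(find_stale_uses(good))  # []
```

In the good case nothing in the trace overwrites 8(%rsp), so no read is flagged. The check only confirms the symptom on a hand-written trace; it says nothing about where in the inheritance and undo steps shown in the LRA log the problem originates.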